1. YouAreDeadException
FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=cloud13,60020,1348890729197, load=(requests=0, regions=375, usedHeap=2455, maxHeap=6035): Unhandled exception: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing cloud13,60020,1348890729197 as dead server
org.apache.hadoop.hbase.YouAreDeadException: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing cloud13,60020,1348890729197 as dead server
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:734)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:595)
    at java.lang.Thread.run(Thread.java:722)
Now look at the comment on YouAreDeadException:
/**
 * This exception is thrown by the master when a region server reports and is
 * already being processed as dead. This can happen when a region server loses
 * its session but didn't figure it yet.
 */
Clearly this is caused by a ZooKeeper session timeout. Say the timeout is 30s: if the region server makes no contact within those 30s, the master declares it dead, and when the RS reconnects it gets this exception. The most likely culprit is a long GC pause, so check the GC logs.
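A common first mitigation is to lengthen the session timeout and turn on GC logging so the pauses become visible. A minimal sketch follows; the 90-second value and the log path are illustrative assumptions, not recommendations, and the timeout must also fall within the ZK ensemble's minSessionTimeout/maxSessionTimeout range:

```xml
<!-- hbase-site.xml: illustrative value, must fit the ZK server's maxSessionTimeout -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>90000</value> <!-- milliseconds -->
</property>
```

```shell
# hbase-env.sh: make GC pauses visible (JDK 6/7-era HotSpot flags)
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -Xloggc:/var/log/hbase/gc.log"
```

With the GC log in hand, correlate any pause longer than the session timeout with the timestamp of the YouAreDeadException.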
--------------------------------------------------------------------------------
2. Got error for OP_READ_BLOCK
2012-10-09 02:22:41,788 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /10.0.1.170:50010 for file /hbase/pp_mac_all/784dcfc3fa060b66402a242080f5cd91/nf/5190449121954817199 for block blk_5558099265298248729_681382:java.io.IOException: Got error for OP_READ_BLOCK, self=/10.0.1.170:23458, remote=/10.0.1.170:50010, for file /hbase/pp_mac_all/784dcfc3fa060b66402a242080f5cd91/nf/5190449121954817199, for block 5558099265298248729_681382
    at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1476)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:1992)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2066)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2066)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
    at org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:101)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:113)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1094)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:1036)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.loadBlock(HFile.java:1442)
    at org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1299)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:136)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:96)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:77)
    at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1351)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.<init>(HRegion.java:2284)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateInternalScanner(HRegion.java:1135)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1127)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1111)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:3009)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2911)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1661)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2551)
    at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
Seen on its own, this warning is generally harmless. It occurs while reading a block from HDFS; look at this section of DFSClient:
while (true) {
  // Cached block locations may have been updated by chooseDataNode()
  // or fetchBlockAt(). Always get the latest list of locations at the
  // start of the loop.
  block = getBlockAt(block.getStartOffset(), false);
  DNAddrPair retval = chooseDataNode(block);
  DatanodeInfo chosenNode = retval.info;
  InetSocketAddress targetAddr = retval.addr;
  BlockReader reader = null;
  int len = (int) (end - start + 1);
  try {
    Token<BlockTokenIdentifier> accessToken = block.getBlockToken();
    // first try reading the block locally.
    if (shouldTryShortCircuitRead(targetAddr)) {
      try {
        reader = getLocalBlockReader(conf, src, block.getBlock(), accessToken,
            chosenNode, DFSClient.this.socketTimeout, start);
      } catch (AccessControlException ex) {
        LOG.warn("Short circuit access failed ", ex);
        // Disable short circuit reads
        shortCircuitLocalReads = false;
        continue;
      }
    } else {
      // go to the datanode
      dn = socketFactory.createSocket();
      NetUtils.connect(dn, targetAddr, socketTimeout);
      dn.setSoTimeout(socketTimeout);
      reader = BlockReader.newBlockReader(dn, src, block.getBlock().getBlockId(),
          accessToken, block.getBlock().getGenerationStamp(), start, len,
          buffersize, verifyChecksum, clientName);
    }
    int nread = reader.readAll(buf, offset, len);
    if (nread != len) {
      throw new IOException("truncated return from reader.read(): " +
          "expected " + len + ", got " + nread);
    }
    return;
  } catch (ChecksumException e) {
    LOG.warn("fetchBlockByteRange(). Got a checksum exception for " + src +
        " at " + block.getBlock() + ":" + e.getPos() + " from " +
        chosenNode.getName());
    reportChecksumFailure(src, block.getBlock(), chosenNode);
  } catch (IOException e) {
    if (refetchToken > 0 && tokenRefetchNeeded(e, targetAddr)) {
      refetchToken--;
      fetchBlockAt(block.getStartOffset());
      continue;
    } else {
      LOG.warn("Failed to connect to " + targetAddr + " for file " + src +
          " for block " + block.getBlock() + ":" + e);
      if (LOG.isDebugEnabled()) {
        LOG.debug("Connection failure ", e);
      }
    }
  } finally {
    IOUtils.closeStream(reader);
    IOUtils.closeSocket(dn);
  }
  // Put chosen node into dead list, continue
  addToDeadNodes(chosenNode);
}
Combining the code above with the exception message, we can conclude that HDFS hit a problem while reading a block. OP_READ_BLOCK is the read-block operation, and the final addToDeadNodes(chosenNode) does not put the DataNode on a cluster-wide dead list; it merely excludes that node for the remainder of this read.
Take a look at this comment:
/**
 * This variable tracks the number of failures since the start of the
 * most recent user-facing operation. That is to say, it should be reset
 * whenever the user makes a call on this stream, and if at any point
 * during the retry logic, the failure count exceeds a threshold,
 * the errors will be thrown back to the operation.
 *
 * Specifically this counts the number of times the client has gone
 * back to the namenode to get a new list of block locations, and is
 * capped at maxBlockAcquireFailures
 */
private int failures = 0;
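To make the "per-operation, not cluster-wide" point concrete, here is a small self-contained sketch of the retry pattern. The names chooseDataNode and MAX_BLOCK_ACQUIRE_FAILURES mirror DFSClient, but the class itself is hypothetical and greatly simplified, not the real Hadoop code:

```java
import java.util.*;

public class DeadNodeDemo {
    // Hypothetical sketch: each read operation keeps its OWN set of nodes to
    // skip, plus a failure counter capped at a maxBlockAcquireFailures-style
    // threshold, mirroring the comment quoted above.
    static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

    final Set<String> deadNodes = new HashSet<>(); // per-stream, not cluster-wide
    int failures = 0; // reset at the start of each user-facing operation

    String chooseDataNode(List<String> replicas) {
        for (String dn : replicas) {
            if (!deadNodes.contains(dn)) {
                return dn; // first replica not yet marked dead for this stream
            }
        }
        // every replica failed once: charge a failure against this operation
        if (++failures > MAX_BLOCK_ACQUIRE_FAILURES) {
            throw new RuntimeException("Could not obtain block: all replicas failed");
        }
        deadNodes.clear(); // start over, as DFSClient does after refetching locations
        return replicas.get(0);
    }

    public static void main(String[] args) {
        DeadNodeDemo stream = new DeadNodeDemo();
        List<String> replicas = Arrays.asList("dn1:50010", "dn2:50010");
        String first = stream.chooseDataNode(replicas);
        stream.deadNodes.add(first); // read failed: skip it for THIS stream only
        String second = stream.chooseDataNode(replicas);
        System.out.println(first + " -> " + second); // dn1:50010 -> dn2:50010

        // a brand-new stream starts with an empty dead list and happily
        // picks dn1 again: the node was never marked dead globally
        DeadNodeDemo other = new DeadNodeDemo();
        System.out.println(other.chooseDataNode(replicas)); // dn1:50010
    }
}
```

This is why the warning is usually harmless: one failed connection only shrinks the candidate list for the current read, and the next operation starts fresh.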