This page is OBSOLETE. See the Troubleshooting section in the HBase book (http://hbase.apache.org/book.html#trouble)

Contents

  1. Problem: Master initializes, but Region Servers do not

  2. Problem: Created Root Directory for HBase through Hadoop DFS

  3. Problem: On startup, Master says that you need to run the hbase migrations script

  4. Problem: "xceiverCount 258 exceeds the limit of concurrent xcievers 256"

  5. Problem: "No live nodes contain current block"

  6. Problem: DFS instability and/or regionserver lease timeouts

  7. Problem: Instability on Amazon EC2

  8. Problem: ZooKeeper SessionExpired events

  9. Problem: Could not find my address: xyz in list of ZooKeeper quorum servers

  10. Problem: ZooKeeper does not seem to work on Amazon EC2

  11. Problem: General operating environment issues -- ZooKeeper session timeouts, regionservers shutting down, etc.

  12. Problem: Scanner performance is low

  13. Problem: My shell or client application throws lots of scary exceptions during normal operation

  14. Problem: Running a Scan or a MapReduce job over a full table fails with "xceiverCount xx exceeds..." or OutOfMemoryErrors in the HDFS datanodes

  15. Problem: System instability, and the presence of "java.lang.OutOfMemoryError: unable to create new native thread" exceptions in HDFS datanode logs or that of any system daemon

1. Problem: Master initializes, but Region Servers do not

Causes

Resolution

2. Problem: Created Root Directory for HBase through Hadoop DFS

Causes

Resolution

3. Problem: On startup, Master says that you need to run the hbase migrations script

Causes

Resolution

4. Problem: "xceiverCount 258 exceeds the limit of concurrent xcievers 256"

5. Problem: "No live nodes contain current block"

6. Problem: DFS instability and/or regionserver lease timeouts

2009-02-24 10:01:33,516 WARN org.apache.hadoop.hbase.util.Sleeper: We slept xxx ms, ten times longer than scheduled: 10000
2009-02-24 10:01:33,516 WARN org.apache.hadoop.hbase.util.Sleeper: We slept xxx ms, ten times longer than scheduled: 15000
2009-02-24 10:01:36,472 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: unable to report to master for xxx milliseconds - retrying

Causes

Resolution

7. Problem: Instability on Amazon EC2

8. Problem: ZooKeeper SessionExpired events

WARN org.apache.zookeeper.ClientCnxn: Exception
closing session 0x278bd16a96000f to sun.nio.ch.SelectionKeyImpl@355811ec
java.io.IOException: TIMED OUT
       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:906)
WARN org.apache.hadoop.hbase.util.Sleeper: We slept 79410ms, ten times longer than scheduled: 5000
INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server hostname/IP:PORT
INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/IP:PORT remote=hostname/IP:PORT]
INFO org.apache.zookeeper.ClientCnxn: Server connection successful
WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x278bd16a96000d to sun.nio.ch.SelectionKeyImpl@3544d65e
java.io.IOException: Session Expired
       at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
       at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: ZooKeeper session expired

Causes

Resolution

Ask for a longer ZooKeeper session timeout in hbase-site.xml. The tickTime of the HBase-managed ZooKeeper ensemble is raised alongside it because ZooKeeper bounds the session timeouts it will grant relative to its tick, for example:

  <!-- Session timeout, in milliseconds, that HBase asks ZooKeeper for
       (here 1200000 ms = 20 minutes). -->
  <property>
    <name>zookeeper.session.timeout</name>
    <value>1200000</value>
  </property>
  <!-- Tick time, in milliseconds, used when HBase manages the ZooKeeper
       ensemble; ZooKeeper sizes the session timeouts it grants in ticks. -->
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>6000</value>
  </property>

9. Problem: Could not find my address: xyz in list of ZooKeeper quorum servers

Causes

Resolution

10. Problem: ZooKeeper does not seem to work on Amazon EC2

  2009-10-19 11:52:27,030 INFO org.apache.zookeeper.ClientCnxn: Attempting
  connection to server ec2-174-129-15-236.compute-1.amazonaws.com/10.244.9.171:2181
  2009-10-19 11:52:27,032 WARN org.apache.zookeeper.ClientCnxn: Exception
  closing session 0x0 to sun.nio.ch.SelectionKeyImpl@656dc861
  java.net.ConnectException: Connection refused

Causes

Resolution

11. Problem: General operating environment issues -- ZooKeeper session timeouts, regionservers shutting down, etc.

Causes

Resolution

See the ZooKeeper Operating Environment Troubleshooting page. It has suggestions and tools for checking disk and networking performance, i.e. the operating environment your ZooKeeper and HBase are running in. ZooKeeper is the cluster's "canary": it will be the first to notice issues, so making sure it is happy is the shortcut to a humming cluster.

12. Problem: Scanner performance is low

Causes

Default scanner caching (prefetching) is set to 1. The default is low because if a client takes too long processing each batch of rows, the scanner lease can time out, which causes unhappy jobs/people/emails. See item #10 above.

Resolution
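
One resolution, sketched here on the assumption of a 0.20-era client (the page's original advice is not preserved above), is to raise scanner caching so that each round trip to the region server fetches a batch of rows instead of a single row, while keeping the batch small enough that processing it finishes within the scanner lease period. The caching can be set per Scan with Scan.setCaching(int), or client-wide in hbase-site.xml:

  <!-- Rows fetched per scanner next() round trip (default of this era: 1).
       Example value only; tune it to your row size and per-row processing time. -->
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>100</value>
  </property>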

13. Problem: My shell or client application throws lots of scary exceptions during normal operation

Causes

Since 0.20.0 the default log level for org.apache.hadoop.hbase.* is DEBUG.

Resolution

On your clients, edit $HBASE_HOME/conf/log4j.properties and change log4j.logger.org.apache.hadoop.hbase=DEBUG to log4j.logger.org.apache.hadoop.hbase=INFO, or even log4j.logger.org.apache.hadoop.hbase=WARN, to quiet the client-side logging.

14. Problem: Running a Scan or a MapReduce job over a full table fails with "xceiverCount xx exceeds..." or OutOfMemoryErrors in the HDFS datanodes

Causes

This problem is generally a symptom of a mis-configured or underpowered cluster.

Resolution
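
The usual remedy of this era, sketched here rather than taken from the original page, was to raise the datanode xciever limit well above the old default of 256 (the property name's misspelling is historical) and restart the datanodes, together with a generous file-descriptor ulimit for the HDFS user:

  <!-- hdfs-site.xml on every datanode: cap on concurrent DataXceiver
       threads. 4096 is an example value commonly recommended for HBase. -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>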

15. Problem: System instability, and the presence of "java.lang.OutOfMemoryError: unable to create new native thread" exceptions in HDFS datanode logs or that of any system daemon

Causes

The user under which the daemons are running has an nproc (max user processes) limit set too low; the default on recent Linux distributions is 1024.

Resolution

See the HBase book http://hbase.apache.org/book.html on nproc configuration.
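
A sketch of the standard fix, assuming the HBase and HDFS daemons run as a user named hadoop (substitute your own daemon user), is to raise the limit in /etc/security/limits.conf and restart the daemons from a fresh login:

  # /etc/security/limits.conf -- raise the max processes/threads limit
  # for the daemon user ("hadoop" here is an assumption)
  hadoop  soft  nproc  32000
  hadoop  hard  nproc  32000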
