Meeting notes - ZooKeeper use in HBase

10/23/2009 phunt, mahadev, jdcryans, stack

Reviewed issues identified by the HBase team. Also discussed how they might better troubleshoot when problems occur. Mentioned that they (hbase) should feel free to ping the zk user list if they have questions. JIRAs are great too if they have specific ideas or concerns.

The HBase team is going to file JIRAs for the issues they identified (especially improved logging).

Phunt also asked the HBase team to write up some use cases for how they are currently using zk. This will allow us (zk) to better understand their usage, provide better advice, vet their choices, and in some cases even test to verify behavior under real conditions.

Some questions identified (answers from the zk side below):

zk: the biggest issues we typically see are: GC causing the VM to stall (client or server; unfortunately we have no control over the JVM, and in Hadoop I've heard they can see GC pauses of over 60 seconds), server disk I/O causing stalls (sequential consistency means that reads have to wait for writes to complete), and network connectivity problems. JVM tools such as GC monitoring and jvisualvm may help track these down. Overloading a host (running high-cpu/disk processes on the same box(es) as ZooKeeper) will obviously slow down processing. A GC logging sketch follows the link below.

also see: https://issues.apache.org/jira/browse/ZOOKEEPER-545
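As a concrete starting point for the GC monitoring mentioned above (an illustrative sketch, not something prescribed in the meeting; the log path and classpath are placeholders), the server JVM can be started with GC logging enabled and the resulting log checked for pauses approaching the session/tick timeouts:

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamPS \
         -Xloggc:/var/log/zookeeper/gc.log \
         -cp zookeeper.jar:lib/*:conf \
         org.apache.zookeeper.server.quorum.QuorumPeerMain conf/zoo.cfg

Long pauses reported there (relative to the configured session timeout) are a strong hint that the stalls are GC rather than network or disk.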

zk: Mahadev mentioned that the zk team commonly sees the types of issues HBase is experiencing, at least on first deploy, but that after tuning and research things settle down.

zk: performance numbers are easiest to see here: http://hadoop.apache.org/zookeeper/docs/current/zookeeperOver.html#Performance. However, those numbers are for optimum conditions (a dedicated server-class host with sufficient memory and a dedicated spindle for the log) and a large number of clients. With a smaller number of clients using synchronous operations you are probably limited more by network latency than anything else; a sketch of the sync/async difference follows.
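To illustrate the latency point (a minimal sketch, not from the meeting; the connect string, timeouts, counts, and paths are made up), synchronous creates pay one network round trip each, while the asynchronous API lets a single client pipeline many outstanding requests:

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.AsyncCallback.StringCallback;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.ZooKeeper;

    public class SyncVsAsyncSketch {
        public static void main(String[] args) throws Exception {
            // A real client would wait for the connection event before issuing requests.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);

            // Synchronous: each create blocks for a full round trip to the server,
            // so total time is roughly (ops x network latency).
            for (int i = 0; i < 1000; i++) {
                zk.create("/sync-node-" + i, new byte[0],
                        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            }

            // Asynchronous: requests are pipelined; completions arrive on the
            // event thread, so total time is dominated by server throughput.
            final CountDownLatch done = new CountDownLatch(1000);
            StringCallback cb = new StringCallback() {
                public void processResult(int rc, String path, Object ctx, String name) {
                    done.countDown();
                }
            };
            for (int i = 0; i < 1000; i++) {
                zk.create("/async-node-" + i, new byte[0],
                        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT, cb, null);
            }
            done.await();
            zk.close();
        }
    }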

zk: very inexpensive by design; the intent was to support large numbers of watchers. I've tested a single client creating 100k ephemeral nodes on session a, then session b setting sync watches on all the nodes, then session c setting async watches on all the nodes, then closing session a. This was all done on a 1-core, 5-year-old laptop, and the 200k watches were delivered in < 10 sec. Granted, that's all the zk cluster was doing, but it gives you some idea. A sketch of the watch-registration pattern follows.
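For reference, a rough sketch of the watch-registration pattern described above (not the actual test code; connect string, timeout, count, and node paths are illustrative). Each exists() call leaves a one-shot watch that fires when the corresponding ephemeral node is deleted as session a closes:

    import org.apache.zookeeper.AsyncCallback.StatCallback;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class WatchSketch {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, null);

            Watcher watcher = new Watcher() {
                public void process(WatchedEvent event) {
                    // Fires once per watched node, e.g. NodeDeleted when the
                    // ephemeral owner's session closes.
                    System.out.println(event.getType() + " " + event.getPath());
                }
            };

            // Sync registration: one round trip per watch (what session b did above).
            for (int i = 0; i < 100000; i++) {
                zk.exists("/ephemeral-" + i, watcher);
            }

            // Async registration: pipelined (what session c did above).
            StatCallback cb = new StatCallback() {
                public void processResult(int rc, String path, Object ctx, Stat stat) {
                    // Nothing to do; we only care that the watch is registered.
                }
            };
            for (int i = 0; i < 100000; i++) {
                zk.exists("/ephemeral-" + i, watcher, cb, null);
            }
        }
    }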

Background:

The material below is stale now; it is excerpted from another document. For the latest see http://wiki.apache.org/hadoop/Hbase/MasterRewrite

Below are some notes from our master rewrite design doc. It's from the section where we talk about moving all state and schemas to zk:
