HDFS Sync Support

Overview

To make edits durable, HBase requires an HDFS installation that supports the sync call. This call pushes pending data through the HDFS write pipeline and blocks until every node in the pipeline (three, with the default replication factor) has acknowledged it. HBase uses this call when writing edits to its write-ahead log (WAL) so that, if a region server dies, the edits can be recovered and replayed on other region servers.
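
The write-then-sync pattern described above can be illustrated with a local-file analogy. This is only a sketch: it uses an ordinary file and os.fsync rather than the HDFS write pipeline, and the function names append_edit and replay are hypothetical, not part of HBase or HDFS.

```python
import os
import tempfile


def append_edit(log_file, edit):
    """Append one edit and block until the OS reports it has been written out."""
    log_file.write(edit + "\n")
    log_file.flush()              # push Python's buffer down to the OS
    os.fsync(log_file.fileno())   # block until the OS confirms the write


def replay(path):
    """Read back every edit in the log, in the order it was appended."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def demo():
    """Log two edits, then recover them as a restarted server would."""
    path = os.path.join(tempfile.mkdtemp(), "wal.log")
    with open(path, "a", encoding="utf-8") as log_file:
        append_edit(log_file, "put row1 cf:a=1")
        append_edit(log_file, "put row2 cf:a=2")
    return replay(path)
```

The point of the analogy is the ordering: an edit counts as durable only after the blocking call returns, which is why a log written without a working sync cannot be relied on for recovery.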

What versions of HDFS support sync?

The necessary feature is available in the 0.20-append branch of HDFS, the unreleased 0.21 branch, and Cloudera's CDH3 release [https://docs.cloudera.com/display/DOC/HBase+Installation].

*NOTE:* Apache HDFS 0.20 does not support a working sync, even if the dfs.support.append flag is enabled. You *must* use one of the above versions of Hadoop to have durable edits in HBase.

How can I enable sync?

To enable sync, first make sure that you are running either a build of the 0.20-append branch from Apache or Cloudera's CDH3. Then set the dfs.support.append flag to true in hdfs-site.xml, in both the HDFS configuration and the HBase configuration.
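
As a rough illustration, the setting described above might look like the fragment embedded in the following sketch, which also checks whether a given hdfs-site.xml enables the flag. The helper name append_enabled is hypothetical, and treating the last occurrence of a repeated property as the effective one is an assumption of this sketch, not a statement about Hadoop's configuration loader.

```python
import xml.etree.ElementTree as ET

# Example fragment only; a real hdfs-site.xml usually contains more properties.
EXAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
"""


def append_enabled(xml_text):
    """Return True if dfs.support.append is set to true in the given XML text."""
    root = ET.fromstring(xml_text)
    enabled = False  # treated as disabled when the property is absent
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip().lower()
        if name == "dfs.support.append":
            # Assumption of this sketch: a later entry overrides an earlier one.
            enabled = value == "true"
    return enabled


def check_file(path):
    """Read one hdfs-site.xml from disk and report whether the flag is enabled."""
    with open(path, "r", encoding="utf-8") as f:
        return append_enabled(f.read())
```

Calling check_file once on the copy of hdfs-site.xml in the HDFS configuration directory and once on the copy in the HBase configuration directory would confirm that both agree. Note that this only inspects the flag; it cannot tell whether the installed Hadoop version actually implements a working sync.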