Migrating an Hbase instance to a new cluster

Note: these steps were first performed on a trivially small Hbase instance; results from the larger "real" migration are forthcoming.

Update: the large(r) Hbase instance worked just as well, roughly 350GB on HDFS. Not large by Hadoop standards, but non-trivial nonetheless.

This is an Hbase user's account of migrating data from one data center to another. The steps below were taken to successfully (and non-destructively) migrate a complete and working Hbase instance to a brand new, clean cluster in a different location (no shared hardware). The initial setup was as follows:

The steps:

  1. Get existing Hbase into a stable state
    1. Disable access to the existing Hbase, so that no data is changing (optionally disabling your tables just to be sure)
    2. Major compact all tables
    3. Flush all tables
    4. Shut down Hbase
  2. Get existing HDFS into a stable state
    1. Disable all HDFS access other than Hbase (which should now be shut down)
    2. Enter HDFS safe-mode
    3. Run fsck and verify that everything is ok

  3. Push Hbase data from the old cluster to the new one
    1. Use the distcp command to copy your Hbase root directory to the new cluster (distcp runs as a Map/Reduce job, so Map/Reduce must be available on the cluster where you run it)

    2. Note: when I tried copying the root of the entire HDFS tree I got a NullPointerException (NPE), but pushing a top-level directory worked fine

  4. Verify data in the new HDFS cluster
    1. I did a spot-check of file sizes and directories to my satisfaction -- depending on how important your data is you may want some kind of checksumming crawler for verification
    2. Run fsck to make sure HDFS is happy and does not think anything is wrong

  5. Fire up Hbase in the new cluster
    1. Make sure your configuration points to the new Hbase root directory in the new cluster and your new Zookeeper instance

    2. Enable all your tables if you disabled them originally
  6. Verify Hbase connectivity and, again to your satisfaction, verify the data accessible through your new Hbase instance
  7. Finito!
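
Steps 1 through 3 above could be scripted roughly as follows. This is only a sketch, not the exact commands used in the migration described here: the NameNode URIs, the /hbase root path and the table names are hypothetical placeholders, and the `run` and `hbase_cmd` helpers are made up for illustration. The commands use the `hadoop` launcher of Hadoop releases from that period; newer releases offer the same operations as `hdfs dfsadmin` and `hdfs fsck`. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of steps 1-3. Hosts, ports, the /hbase path and the table names
# are hypothetical placeholders, not values from the original setup.
OLD_NN="${OLD_NN:-hdfs://old-namenode:8020}"
NEW_NN="${NEW_NN:-hdfs://new-namenode:8020}"
ROOT_PATH="${ROOT_PATH:-/hbase}"
TABLES="${TABLES:-table1 table2}"

# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Send a single command to the Hbase shell via stdin.
hbase_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ hbase shell: $1"
  else
    echo "$1" | hbase shell
  fi
}

# Steps 1.2 and 1.3: request a major compaction and a flush per table.
# major_compact only queues the compaction, so wait for it to finish
# before calling stop_hbase.
compact_and_flush() {
  for t in $TABLES; do
    hbase_cmd "major_compact '$t'"
    hbase_cmd "flush '$t'"
  done
}

# Step 1.4: shut down Hbase (stop-hbase.sh lives in the Hbase bin directory).
stop_hbase() {
  run stop-hbase.sh
}

# Step 2: put HDFS into safe mode and check its health.
stabilize_hdfs() {
  run hadoop dfsadmin -safemode enter
  run hadoop fsck /
}

# Step 3: copy only the Hbase root directory, not the whole HDFS tree.
copy_hbase_root() {
  run hadoop distcp "$OLD_NN$ROOT_PATH" "$NEW_NN$ROOT_PATH"
}
```

Call the functions one at a time and in order (compact_and_flush, stop_hbase, stabilize_hdfs, copy_hbase_root), checking the result of each before moving on. Setting DRY_RUN=0 makes them execute the commands instead of printing them.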

I specifically did not attempt to migrate any Zookeeper data, and found that I did not need to -- I did not encounter any problems in skipping this but your mileage may vary.
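
The spot-check of file sizes in step 4 could be automated along these lines. This is a sketch under assumptions rather than the check actually used here: the NameNode URIs and the /hbase path are hypothetical, and the parsing assumes the usual eight-column output of a recursive listing (`hadoop fs -lsr`, or `hadoop fs -ls -R` on newer releases). It compares sizes only, so it is weaker than a real checksumming crawler.

```shell
#!/bin/sh
# Sketch of a size-based spot-check for step 4. The NameNode URIs and the
# /hbase path are hypothetical placeholders.
OLD_NN="${OLD_NN:-hdfs://old-namenode:8020}"
NEW_NN="${NEW_NN:-hdfs://new-namenode:8020}"
ROOT_PATH="${ROOT_PATH:-/hbase}"

# Reduce a recursive listing on stdin to sorted "size relative-path" lines.
# Assumes the usual 8-column layout:
#   perms replication owner group size date time path
# $1 is the copied root path; everything up to and including it is cut
# from each path so listings from both clusters are comparable.
size_listing() {
  awk -v root="$1" '
    $1 ~ /^-/ {                      # regular files only (directories start with "d")
      i = index($8, root)
      path = (i > 0) ? substr($8, i + length(root)) : $8
      print $5, path
    }' | sort -k2
}

# List both trees and diff them. Prints nothing and returns 0 when every
# file has the same relative path and size on both clusters.
compare_roots() {
  old_list=$(mktemp) && new_list=$(mktemp) || return 1
  hadoop fs -lsr "$OLD_NN$ROOT_PATH" | size_listing "$ROOT_PATH" > "$old_list"
  hadoop fs -lsr "$NEW_NN$ROOT_PATH" | size_listing "$ROOT_PATH" > "$new_list"
  if [ ! -s "$old_list" ]; then
    # An empty listing usually means the listing command itself failed.
    echo "old cluster listing is empty, nothing compared" >&2
    status=2
  else
    diff "$old_list" "$new_list"
    status=$?
  fi
  rm -f "$old_list" "$new_list"
  return $status
}
```

Running compare_roots from a machine that can reach both NameNodes prints any files that are missing or differ in size. An empty diff says nothing about file contents, so important data may still deserve a checksum pass.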

If something goes wrong

Your old cluster should still be intact, though in read-only mode. To bring it back online, make sure HDFS is out of safe mode (if you put it there), then ensure Hbase is running and your tables are enabled.
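
Bringing the old cluster back could look roughly like this. Again this is a sketch: the table names are hypothetical, the `run` and `hbase_cmd` helpers are made up for illustration, and the `hadoop dfsadmin` form matches Hadoop releases from that period (`hdfs dfsadmin` on newer ones). Commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of restoring the old cluster. Table names are hypothetical.
TABLES="${TABLES:-table1 table2}"

# Print commands by default; set DRY_RUN=0 to really execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Send a single command to the Hbase shell via stdin.
hbase_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ hbase shell: $1"
  else
    echo "$1" | hbase shell
  fi
}

# Take HDFS out of safe mode (only needed if you put it there),
# then start Hbase again.
restore_old_cluster() {
  run hadoop dfsadmin -safemode leave
  run start-hbase.sh
}

# Re-enable tables; only needed if you disabled them before the migration.
enable_tables() {
  for t in $TABLES; do
    hbase_cmd "enable '$t'"
  done
}
```

Run restore_old_cluster first, and enable_tables afterwards only if the tables were disabled.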

Hbase/MigrationToNewCluster (last edited 2010-07-20 19:32:51 by NatHarward)