Migration in General
In general, performing an HBase migration consists of the following steps:
- Shut down the old instance of HBase.
- If necessary, upgrade the underlying version of Hadoop to the version required by the new instance of HBase. Refer to the Hadoop Upgrade page.
- Optionally, back up your hbase.rootdir.
- Download and configure the new instance of HBase. Make sure you configure the hbase.rootdir of the new instance to be the same as that of the old instance.
- From the new instance of HBase, perform the HBase migration. Run {$HBASE_HOME}/bin/hbase migrate for usage. See the version-specific notes below for more information on this process.
- Start the new instance of HBase.
If you would like to learn more about the HBase Migration design, see HBase Migration.
Below are general migration notes followed by specifics on how to migrate between particular versions, newest to oldest.
General Migration Notes
Migration is only supported between the file system version of the previous release and the file system version of the current release. If the existing HBase installation has an older file system version, it will be necessary to install an HBase release which can perform the upgrade, run the migration tool, and then install the desired release and run its migration script. (Note that if the existing installation is several versions old, it may be necessary to repeat this process.)
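The one-release-at-a-time rule above can be sketched as a small planning helper. This is only an illustration: the function name migration_hops and the release list are hypothetical, and the sketch assumes that every release in the list changes the file system version.

```python
def migration_hops(releases, installed, target):
    """Return the (from, to) upgrade steps needed to get from installed to target.

    `releases` is ordered oldest first. Each hop moves exactly one release
    forward, because migration is only supported from the file system
    version of the previous release.
    """
    start = releases.index(installed)
    end = releases.index(target)
    if end < start:
        # This sketch only plans forward upgrades.
        raise ValueError("target is older than the installed release")
    return [(releases[i], releases[i + 1]) for i in range(start, end)]


# Hypothetical release sequence, for illustration only.
RELEASES = ["0.18", "0.19", "0.20"]
for old, new in migration_hops(RELEASES, "0.18", "0.20"):
    print(f"install {new}, then run its migration against the {old} data")
```

Each printed line corresponds to one install-and-migrate round as described in the note above.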
Version-Specific Migration Notes
From 0.20.x to 0.90.x (or to 0.89.x)
Nothing special is needed going between the two versions. You should be able to just shut down your 0.20.x cluster, install 0.90.x (or 0.89.x), REMOVE conf/hbase-default.xml (it is now inside the hbase jar, and the old 0.20.x hbase-default.xml actually contains configuration that is damaging to hbase 0.90.x), review hbase-site.xml so that it only references configuration that exists in 0.90.x (see the 0.90.x hbase-default.xml), and restart.
You cannot go back once you've moved to 0.90.x/0.89.x.
From 0.19.x to 0.20.x
Please read the below carefully.
You can only migrate to 0.20.x from 0.19.x. If you have an earlier hbase, you will need to install 0.19, migrate your old instance, and then install 0.20.x.
This migration rewrites all data. It will take a while.
Compression settings have changed in 0.20.x. By default, migration disables compression, setting all column families to no compression. You can re-enable compression manually after the migration. Beginning with HBase 0.20.2, you can set a "migrate.compression" property in hbase-site.xml to LZO or GZ to have all of your tables migrated to that compression setting automatically. See Using Lzo Compression for instructions to add LZO support to your cluster.
Preparing for Migration
You MUST do a few things before you can begin migration of either hadoop or hbase.
Update to the HEAD of the 0.19 branch
The HEAD of the 0.19 branch includes fixes made specifically to help the migration.
Major Compacting all Tables
Before you begin, you MUST run a major compaction on all tables, including the .META. table. A major compaction compacts all store files in a family together, dropping deleted and expired cells. Major compaction is necessary because the way deletes work changed in 0.20 hbase. Migration will not work unless you complete the major compaction. Use the shell to start major compactions. For example, the cluster below has only one user table, named 'a'. See how we run major_compact on it and on each catalog table:
stack@connelly:~/checkouts/hbase/branches/0.19$ ./bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.19.4, r781868, Tue Jul 14 11:27:58 PDT 2009
hbase(main):001:0> list
a
2 row(s) in 0.1251 seconds
hbase(main):002:0> major_compact 'a'
0 row(s) in 0.0400 seconds
hbase(main):003:0> major_compact '.META.'
0 row(s) in 0.0245 seconds
hbase(main):004:0> major_compact '-ROOT-'
0 row(s) in 0.0173 seconds
In the above, the compaction took no time. The case will likely be different for you if you have big tables.
The way to confirm that the major compaction completed is to list the hbase rootdir in hdfs. If major compaction succeeded, each store of each region on the filesystem should have only one mapfile. For example, below we list what's under the 'a' table directory under the hbase rootdir:
/tmp/hbase-stack/hbase/a
/tmp/hbase-stack/hbase/a/1833721875
/tmp/hbase-stack/hbase/a/1833721875/a
/tmp/hbase-stack/hbase/a/1833721875/a/info
/tmp/hbase-stack/hbase/a/1833721875/a/info/8167759949199600085
/tmp/hbase-stack/hbase/a/1833721875/a/info/.8167759949199600085.crc
/tmp/hbase-stack/hbase/a/1833721875/a/mapfiles
/tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085
/tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/data
/tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/.data.crc
/tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/.index.crc
/tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/index
There is one column family in this table named 'a' (unfortunately, since it muddles the example, the table name is also 'a'). The table has one region whose encoded name is 1833721875. Under this region directory, there are family directories -- in this case there is one for the 'a' family -- and under each family directory, there is the info -- for store file metadata -- and the mapfiles directories. There is only one mapfile in our case above, named 8167759949199600085 (MapFiles are made of data and index files).
You cannot migrate until every table has been major compacted.
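For a large rootdir, checking the listing by eye gets tedious. The following is a minimal sketch of the same check, assuming the directory layout shown in the listing above (region directory, then family directory, then a mapfiles directory). The function name stores_needing_compaction is hypothetical, and the sketch takes the last whitespace-separated token of each line as the path, so it accepts either bare paths or longer listing lines that end with the path.

```python
from collections import defaultdict


def stores_needing_compaction(listing_lines):
    """Return {(region, family): [mapfile ids]} for stores with more than one mapfile."""
    mapfiles = defaultdict(set)
    for line in listing_lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        parts = tokens[-1].strip("/").split("/")
        if "mapfiles" not in parts:
            continue
        i = parts.index("mapfiles")
        # Expect .../<region>/<family>/mapfiles/<mapfile id>[/data|/index|...]
        if i < 2 or i + 1 >= len(parts):
            continue
        region, family = parts[i - 2], parts[i - 1]
        mapfiles[(region, family)].add(parts[i + 1])
    return {store: sorted(ids) for store, ids in mapfiles.items() if len(ids) > 1}


listing = [
    "/tmp/hbase-stack/hbase/a/1833721875/a/mapfiles",
    "/tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085",
    "/tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/data",
    "/tmp/hbase-stack/hbase/a/1833721875/a/mapfiles/8167759949199600085/index",
]
leftover = stores_needing_compaction(listing)
print("all stores have a single mapfile" if not leftover else leftover)
```

An empty result means no store holds more than one mapfile. Any store that is reported still needs a major compaction.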
-ROOT- and .META. flush frequently, so they can mess up your nice and tidy single-file-per-store major compacted hbase layout. They won't flush if there have been no edits, so make sure your cluster is not taking writes, and hasn't been for a good while, before starting the major compaction process. Getting your cluster to shut down with only one file in -ROOT- and .META. may be a bit tough. To help, a facility has been added to the HEAD of the 0.19 branch that allows you to major compact catalog regions in a shut-down hbase. This facility only works on the -ROOT- and .META. catalog tables, not on user space tables. For usage, type:
./bin/hbase org.apache.hadoop.hbase.regionserver.HRegion
For example, to major compact the -ROOT-:
$ ./bin/hbase org.apache.hadoop.hbase.regionserver.HRegion hdfs://www.example.com:9000/hbasetrunk2/-ROOT- major_compact
Don't forget the 'major_compact' at the end of the command; without it, the tool just lists the content of the region.
I had to copy hadoop-site.xml to a location where the above script would pick it up (for example, from my hadoop 0.19 install to my $HBASE_HOME/conf) so that the script could find the right HDFS. Otherwise it ran against the local filesystem.
In the head of 0.19 is a tool that will tell you if you have successfully major compacted your 0.19 data:
$ ./bin/hbase org.apache.hadoop.hbase.util.FSUtils
It will print out:
majorCompacted=true
If it prints false, do a bin/hbase fs -lsr / on your hbase directory and figure out what hasn't been major compacted. If it is a catalog table (-ROOT- or .META.), use the above HRegion tool to compact it. Otherwise, start up 0.19 again, fire up the shell and try the major compaction again. If you have more regions on the filesystem (as reported in the Master's UI) than assigned regions, you have to use the tool attached here.
Migrating tableindexed tables (ITHBase)
The index schema location has changed formats from 0.19 to 0.20. To migrate indexed tables you must run a migration on 0.19. With hbase up, run:
$ ./bin/hbase org.apache.hadoop.hbase.client.tableindexed.migration.MoveIndexMetaData
This tool is in the head of the 0.19 branch. You should do this before you do the major compaction step above.
Can you back up your data?
Migration has been tested, but if you have sufficient space in hdfs to make a copy of your hbase rootdir, do so, just in case. Use hadoop distcp.
Migrating
- Migrate hadoop. Refer to the Hadoop Upgrade page.
- Migrate HBase. The bulk of the time involved in migration is the rewriting of the hbase storefiles from their 0.19 format into the new 0.20 format. Each rewrite takes about 6-10 seconds. In the filesystem, count roughly how many regions you have (or get the number off the UI), then multiply the number of regions by 10 seconds. If the migration will take longer than you are prepared to wait, there is a mapreduce job that does the file conversions only:
$ ./bin/hadoop jar hbase-0.20.x.jar hsf2sf
This job takes an empty input directory and an empty output directory. It first runs through your filesystem to find all mapfiles to convert, writes a file listing them to the input directory, and then starts up the mapreduce job to do the conversions.
Here is how you'd run a job that has /tmp/input and /tmp/output as input and output directories and that specifies a split size of about 1k per map:
$ ./bin/hadoop jar ../trunk/build/hbase-0.20.0-dev.jar hsf2sf -Dmapred.max.split.size=1024 /tmp/input /tmp/output
If you don't set mapred.max.split.size down from default, you'll likely end up with one map only doing all rewrites.
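To help decide whether the mapreduce job is worth running, the rough estimate described earlier (number of regions times the 6-10 seconds per rewrite) can be worked out as below. This is a back-of-the-envelope sketch only: the function name and the region count are hypothetical, and real rewrite times depend on your data and hardware.

```python
def estimate_rewrite_seconds(region_count, low=6, high=10):
    """Return (best_case, worst_case) seconds for rewriting storefiles in-line.

    Uses the 6-10 seconds per rewrite quoted in this guide, applied once
    per region as in the rough rule of thumb.
    """
    return region_count * low, region_count * high


regions = 500  # hypothetical region count, read off the filesystem or the UI
best, worst = estimate_rewrite_seconds(regions)
print(f"{regions} regions: roughly {best / 60:.0f} to {worst / 60:.0f} minutes")
```

If the worst-case figure is longer than you are prepared to wait, the mapreduce conversion described above is the alternative.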
Now, run the hbase migration script. If you have run the mapreduce job, the script will notice that all storefiles have been rewritten and will skip the rewrite step. Otherwise, the migration script does this conversion first.
BEFORE you start, set the following hbase-site.xml parameter, hbase.hregion.memstore.block.multiplier, to 100 (Don't forget to set it back down when the migration finishes).
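If you prefer to script this change rather than edit the file by hand, the sketch below sets a property in the text of an hbase-site.xml using Python's standard xml.etree.ElementTree module. The helper name set_property and the sample rootdir value are hypothetical. Note that parsing and re-serializing this way drops XML comments and the XML declaration, so treat it as an illustration rather than a drop-in tool.

```python
import xml.etree.ElementTree as ET


def set_property(site_xml, name, value):
    """Return site_xml with property `name` set to `value`, adding it if absent."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = str(value)
            break
    else:
        # No existing entry: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal hbase-site.xml content, for illustration only.
site = (
    "<configuration><property><name>hbase.rootdir</name>"
    "<value>hdfs://www.example.com:9000/hbase</value></property></configuration>"
)
print(set_property(site, "hbase.hregion.memstore.block.multiplier", 100))
```

The same helper can be used to set the value back down once the migration finishes.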
Run the migrate as follows:
$ ./bin/hbase migrate upgrade
Post-Migration
Make sure you replace all under $HBASE_HOME/conf with files from the new release. For example, be sure to replace your old hbase-default.xml with the version from the new hbase release.
Read the new 'Getting Started' carefully before starting up your cluster. Basic configuration properties have changed. For example, hbase.master and hbase.master.hostname are no longer used; they are replaced by hbase.cluster.distributed. See the 'Getting Started' for detail on how to set the new properties. While your cluster will likely come up on the old configuration settings, you should move to the new configuration.
From 0.1.x to 0.2.x or 0.18.x
The following are step-by-step instructions for migrating from HBase 0.1 to 0.2 or 0.18. Migration from 0.1 to 0.2 requires an upgrade from Hadoop 0.16 to 0.17, and migration from 0.1 to 0.18 requires an upgrade from Hadoop 0.16 to 0.18. The Hadoop Upgrade Instructions are slightly out-of-date (as of this writing, September 2008). As such, the below instructions also clarify the necessary steps for upgrading Hadoop.
Assume Hadoop 0.16 and HBase 0.1 are already running with data you wish to migrate to HBase 0.2.
- Stop HBase 0.1.
- From the Hadoop Upgrade Instructions, perform steps 1-4 and 9-10 (and optionally 5-8, 11-12) on your instance of Hadoop 0.16.
- Run {$HADOOP_HOME_0.17}/bin/start-dfs.sh -upgrade
- Perform Hadoop upgrade steps 16-19 on your instance of Hadoop 0.17.
- Run {$HADOOP_HOME_0.17}/bin/hadoop dfsadmin -finalizeUpgrade
- Download and configure HBase 0.2. Make sure hbase.rootdir is configured to be the same as it was in HBase 0.1.
- Run {$HBASE_HOME_0.2}/bin/hbase migrate upgrade
- Start HBase 0.2.
As you will notice, the Hadoop Upgrade Instructions (specifically steps 2-4 and 16-18) ask you to generate several logs to compare, to ensure that the upgrade ran correctly. I did notice some inconsistency in my logs between dfs-v-old-report-1.log and dfs-v-new-report-1.log; specifically, the Total effective bytes and Effective replication multiplier fields did not match (in the new log, the values reported were zero and infinity, respectively). Additionally, dfs-v-new-report-1.log claimed that the upgrade was not finalized. Running {$HADOOP_HOME}/bin/hadoop dfsadmin -finalizeUpgrade resolves the second issue, finalizing the upgrade as expected. I could not find a way to resolve the inconsistencies in the Total effective bytes and Effective replication multiplier fields. Nonetheless, I found no problems with the migration and the data appeared to be completely intact.
The API in 0.2 is not backward-compatible with hbase 0.1 versions. See API Changes for discussion of the main differences.