THIS DOCUMENT IS NOW OBSOLETE. MIGRATION HAS BEEN IMPLEMENTED. FOR HOW TO MIGRATE, SEE How to Migrate
A working document to figure out how migrations will work in hbase. The initial outline comes from a trawl of the content of HADOOP-2394. It does not consider hadoop migrations.
Assertions
- All hbase data and state is out on the file system: Moving from one version to another should be just a case of moving or rewriting files on the file system.
- Hbase cannot be running when a migration is run.
- This can be tricky to assert when the hbase versions differ to the extent that they are unable to talk to each other (the caller just hangs and eventually times out).
- Sometimes the amount of on-filesystem data that needs to be changed will be large, so migration will need to run a MapReduce (MR) job.
- The hbase FS image needs versioning. On startup, hbase will check the FS version. If the version is not the one hbase expects, hbase will shut itself down, emitting a "migration needed" message. Versions are finer-grained than release numbers.
- The commit of every incompatible change would be accompanied by a script that can move hbase across the incompatibility.
- A migration runs migration scripts in order, from oldest through to latest (Migration scripts are named in a manner that dictates an order -- or a catalog file lists the order in which scripts are run).
- Downtime must be minimal.
- A migration script will do no damage if run when there is nothing to migrate.
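The ordering and no-damage assertions above can be illustrated with a minimal Java sketch. It is not the actual hbase implementation: `MigrationStep` and `MigrationRunner` are hypothetical names, and the sketch assumes that each step moves the file system image forward by exactly one integer version.

```java
import java.util.HashMap;
import java.util.Map;

/** One step moves the FS image from version N to N + 1 (hypothetical interface). */
interface MigrationStep {
    void apply();
}

/** Runs registered steps oldest first; does nothing when nothing needs migrating. */
class MigrationRunner {
    // Keyed by the version a step migrates *from*.
    private final Map<Integer, MigrationStep> steps = new HashMap<>();

    void register(int fromVersion, MigrationStep step) {
        steps.put(fromVersion, step);
    }

    /** Applies every step between the two versions and returns the version reached. */
    int migrate(int currentVersion, int targetVersion) {
        int version = currentVersion;
        // The loop body never executes if the image is already at the target,
        // so running the migration a second time changes nothing.
        while (version < targetVersion) {
            MigrationStep step = steps.get(version);
            if (step == null) {
                throw new IllegalStateException("No migration step from version " + version);
            }
            step.apply();
            version++;
        }
        return version;
    }
}
```

In this sketch the order of execution comes from the version numbers rather than from script names or a catalog file; either of the schemes mentioned above would serve the same purpose.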
Prerequisites/Dependencies
Run a fast hbase backup before migration to protect against data loss. See ./bin/hadoop distcp.
Issues
Should hbase classes be versioned and know how to migrate themselves? This seems like excessive overhead, especially for smaller classes such as HStoreKey and its like. If not, how do we go between versions (how do we float two versions of the same class in the same job)?
Maybe the overhead wouldn't be that bad. See VersionedWritable. It uses single-byte versioning. HColumnDescriptor is already versioned.
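To make the overhead question concrete, here is a minimal sketch of single-byte versioning using only java.io. It is written in the spirit of VersionedWritable but does not use the Hadoop class: `VersionedKey` is a made-up record, not HStoreKey, and letting the reader accept older versions is an assumption of this sketch rather than something VersionedWritable is known to do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical record that prefixes its serialized form with one version byte. */
class VersionedKey {
    static final byte VERSION = 2;

    String row = "";
    long timestamp = 0L; // Field added in version 2 of this made-up format.

    /** Always writes the current format: version byte, row, timestamp. */
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeByte(VERSION);
        out.writeUTF(row);
        out.writeLong(timestamp);
        out.flush();
        return buffer.toByteArray();
    }

    /** Reads the current format or the older version 1, which had no timestamp. */
    static VersionedKey fromBytes(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        byte version = in.readByte();
        if (version < 1 || version > VERSION) {
            throw new IOException("Unsupported version " + version);
        }
        VersionedKey key = new VersionedKey();
        key.row = in.readUTF();
        key.timestamp = (version >= 2) ? in.readLong() : 0L;
        return key;
    }
}
```

The cost in this sketch is one extra byte per serialized record plus a branch on read, which is the overhead being weighed above for small classes.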
Implementation/Decisions
The migration script is named org.apache.hadoop.hbase.util.Migrate. Run it by invoking ${HBASE_HOME}/bin/hbase migrate.
Versions are explicit integers. The first version is 1.
The version of a particular hbase.rootdir install is recorded in a file named hbase.version at the top level of that directory.
- It's OK if a migration that spans versions A-D requires that you first install version C, migrate from A to C, then install D and finish the migration from C to D.
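The version file and the startup check can be sketched as follows. This is an illustration under stated assumptions, not the real code: it uses java.nio.file on a local directory in place of the Hadoop file system API, it assumes the file holds the integer as plain text, and it treats a missing file as version 0. `FsVersion` and its methods are hypothetical names; only the file name hbase.version comes from this document.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that records and checks the FS image version. */
class FsVersion {
    static final String VERSION_FILE = "hbase.version";

    /** Returns the recorded version, or 0 if no version file exists (assumption). */
    static int read(Path rootDir) throws IOException {
        Path file = rootDir.resolve(VERSION_FILE);
        if (!Files.exists(file)) {
            return 0;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return Integer.parseInt(text);
    }

    /** Records the version as plain text (assumed on-disk format). */
    static void write(Path rootDir, int version) throws IOException {
        byte[] bytes = Integer.toString(version).getBytes(StandardCharsets.UTF_8);
        Files.write(rootDir.resolve(VERSION_FILE), bytes);
    }

    /** Refuses to start when the image is not at the version the software expects. */
    static void checkOnStartup(Path rootDir, int expectedVersion) throws IOException {
        int found = read(rootDir);
        if (found != expectedVersion) {
            throw new IllegalStateException("Migration needed: file system is at version "
                + found + " but this install expects version " + expectedVersion);
        }
    }
}
```

A caller would invoke `FsVersion.checkOnStartup(rootDir, expectedVersion)` before doing anything else, and a migration would call `FsVersion.write` only after its last step succeeds, so that an interrupted run is still reported as needing migration.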