Glossary
| Term | Definition |
| --- | --- |
| Major version | First number in the version, to the left of the period. For example, in version 2.3 the major version is "2". |
| Minor version | Second number in the version, immediately to the right of the period. For example, in version 2.3 the minor version is "3". |
| Compatibility window | Range of consecutive major versions where compatibility between two entities is guaranteed. |
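The glossary terms can be made concrete with a small sketch. The `Version` class below is hypothetical (it is not an HBase class), and it assumes that a compatibility window of size N means two entities are compatible when their major versions differ by less than N.

```java
/** Hypothetical helper for illustration; not part of HBase. */
final class Version {
    final int major;
    final int minor;

    Version(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    /** Parses a string such as "2.3" into major = 2 and minor = 3. */
    static Version parse(String text) {
        String[] parts = text.trim().split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected <major>.<minor>: " + text);
        }
        return new Version(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    /**
     * Assumption: a window of size N covers N consecutive major versions,
     * so two versions fit in one window when their majors differ by less than N.
     */
    static boolean withinWindow(Version a, Version b, int windowSize) {
        return Math.abs(a.major - b.major) < windowSize;
    }
}
```

Under this reading, with a window of 4 a client at 2.3 and a server at 5.0 fall in the same window, while 1.0 and 5.0 do not.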
Motivation and Goals
The current lack of a concrete versioning story for HBase is limiting from both an operational and development perspective. We propose a "first-pass" versioning story (that can be expanded upon later) that addresses the following use cases and concerns:
Operations
Decouple client applications from HBase: HBase clients are part of a separate application and are often administered separately from the HBase cluster. Today, the application and cluster must be upgraded in lockstep. Clients should interoperate with HBase region servers (RSs) and masters that are running different major versions. This allows for the following operational improvements:
- Multiple pods: HBase clients may write to multiple HBase clusters / pods (sharded clusters) and the shards may be upgraded separately.
- Application-level replication: An HBase installation with active and standby clusters should be able to upgrade each cluster separately, while HBase clients continue to work with both.
No downtime for minor version upgrades
Development
Simplified support for bugfixes, upgrades, and testing - no need for specialized migration scripts
Higher developer cadence in the community - can add functionality and not worry about breaking version compatibility
Requirements
HBase server-server running different minor versions shall interoperate in an extensible manner.
HBase client-server running different major versions shall interoperate in an extensible manner.
- For example, when the client is running version A and the server is running version B, anything one side does not understand is ignored, given a default, or otherwise handled in an appropriate manner.
- Formats and protocols shall be extensible to allow for new functionality such as RPC tracing.
Developers shall be able to augment the RPC protocol with new methods within minor and major version upgrades.
Critical path operations (Get/Put) shall be no more than 10% slower than in the current 0.92 version on YCSB load tests (i.e. read/update/scan/insert should individually be no more than 10% slower).
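The requirement that unknown content is ignored and missing content is given a default can be sketched as follows. This is a minimal illustration only: the `GetRequest` class, the field numbers, and the default timeout are assumptions made for the example and are not taken from HBase.

```java
import java.util.Map;

/** Hypothetical request type for illustration; not an HBase class. */
final class GetRequest {
    // Field numbers known to this (older) reader. The numbering is an assumption.
    static final int FIELD_TABLE = 1;
    static final int FIELD_TIMEOUT_MS = 2;
    static final int DEFAULT_TIMEOUT_MS = 60000; // assumed default value

    final String table;
    final int timeoutMs;

    GetRequest(String table, int timeoutMs) {
        this.table = table;
        this.timeoutMs = timeoutMs;
    }

    /**
     * Builds a request from decoded (field number -> value) pairs.
     * Known fields are read, an absent optional field gets a default,
     * and any field number this reader does not know is simply ignored.
     */
    static GetRequest fromFields(Map<Integer, String> fields) {
        String table = fields.get(FIELD_TABLE);
        if (table == null) {
            throw new IllegalArgumentException("missing required field: table");
        }
        String timeout = fields.get(FIELD_TIMEOUT_MS);
        int timeoutMs = (timeout == null) ? DEFAULT_TIMEOUT_MS : Integer.parseInt(timeout);
        return new GetRequest(table, timeoutMs);
    }
}
```

A newer sender could add, say, a field 3 carrying an RPC trace id. The older reader above would still accept the message and fall back to the default timeout if field 2 were absent.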
Design
Wire format
Protobuf vs. Thrift vs. Avro
We propose to use protobuf for wire format. The primary reason is that the current HBase RPC engine (see HADOOP-7379) supports protobuf-encoded data, and protobuf is relatively more stable than the alternatives. In addition, Hadoop RPC uses protobuf, and the community may eventually want Hadoop and HBase to share the same RPC.
We also propose to change the HBase RPC connection header from Writable to protobuf so that the HBase RPC is programming language agnostic.
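To show why a tagged wire format makes the connection header extensible, the sketch below hand-rolls a simplified subset of protobuf-style encoding (varint tags plus varint and length-delimited values). It does not use the protobuf library, and the header layout (field 1 = protocol name, field 2 = client version, field 3 = trace id) is an assumption made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical codec for illustration; not the HBase or protobuf implementation. */
final class HeaderCodec {
    static final int WIRE_VARINT = 0; // value is a varint
    static final int WIRE_BYTES = 2;  // value is a length-prefixed byte string

    /** Writes value as a base-128 varint, least significant group first. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static long readVarint(ByteBuffer in) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = in.get();
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    static void writeInt(ByteArrayOutputStream out, int field, long value) {
        writeVarint(out, ((long) field << 3) | WIRE_VARINT);
        writeVarint(out, value);
    }

    static void writeString(ByteArrayOutputStream out, int field, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, ((long) field << 3) | WIRE_BYTES);
        writeVarint(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    /** An "old" reader that only knows field 1 and skips every other field. */
    static String readProtocol(byte[] header) {
        ByteBuffer in = ByteBuffer.wrap(header);
        String protocol = ""; // default when the field is absent
        while (in.hasRemaining()) {
            long tag = readVarint(in);
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (wireType == WIRE_VARINT) {
                readVarint(in); // consume and discard the value
            } else if (wireType == WIRE_BYTES) {
                byte[] bytes = new byte[(int) readVarint(in)];
                in.get(bytes);
                if (field == 1) {
                    protocol = new String(bytes, StandardCharsets.UTF_8);
                }
            } else {
                throw new IllegalStateException("unsupported wire type " + wireType);
            }
        }
        return protocol;
    }
}
```

Because every value carries its field number and wire type, the reader can step over fields it has never heard of. A Writable-style header with a fixed field order does not offer that without extra versioning logic.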
RPC
Currently, the HBase RPC engine does not support async IO or protocol negotiation. These features don't impact compatibility and therefore can evolve separately and are not in scope for this document.
Interfaces
1. Client talks to ZK to find the location of the master and the root region server.
2. Client applications talk to RS using HRegionInterface to read from, write to, and scan a table, and so on.
3. Client applications talk to master using HMasterInterface to dynamically create a table, add a column family, and so on.
4. Master talks to RS using HRegionInterface to open, close, move, split, and flush regions, and so on.
5. Master puts data in ZK to store the active master and root region server location, create log splitting tasks, track RS status, and so on.
6. RS reads data in ZK to track log splitting tasks and updates it to grab a task and report status, creates a node for the RS so that the master can track its status, tracks master location and cluster status, and so on.
7. RS talks to master using HMasterRegionInterface to report RS load, RS fatal errors, and RS start-up.
8. Occasionally, RS talks to the root region or meta region using HRegionInterface to check the status of a region, create new daughter regions during region splitting, and so on.
Phasing
The order of phases is based on priority. They can be done in parallel if there are enough resources.
Phase 0: HBASE-4403: Separate existing APIs into public and private interfaces
In order to define which APIs can be changed, we need to separate existing APIs into public and private.
Phase 1: Compatibility between client applications and HBase clusters
Goal:
- To make HBase client applications work properly with HBase clusters of different major and minor versions.
Note: this phase deals with interfaces 1, 2, and 3 in the interface graph (we get 8 "for free"). These tasks can be sub-tasks of HBASE-5305 (Improve cross-version compatibility & upgradeability) or HBASE-5306 (Add support for protocol buffer based RPC). HBASE-5306 can also include a new RPC engine (the latest Hadoop one). This plan focuses on data encoding and decoding.
Tasks:
- Replace RPC negotiation with extensible PB-based types
- Replace root and master address znodes in ZK with PB-enabled types (goal: client's ZK interactions become extensible) (1 in the graph)
- Replace existing HRegionInterface calls for reading from, writing to, and scanning a table (and so on) with PB-enabled types (goal: client->RS and RS->RS RPC becomes extensible) (2 in the graph)
- Replace existing HMasterInterface calls with PB-enabled types (goal: client->master RPC becomes extensible) (3 in the graph)
- Replace data stored in .META. and -ROOT- tables with PB-enabled types (goal: client can read from old and/or new .META. and -ROOT- tables) (2 in the graph)
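One way a client could read both old and new catalog values is to mark new-format values with a short prefix and fall back to the legacy decoder when the prefix is absent. The sketch below illustrates that idea only. The `CatalogValue` class and the choice of the marker bytes are assumptions for the example; this document does not specify how old and new encodings are told apart.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper for illustration; not an HBase class. */
final class CatalogValue {
    // Marker prefixed to values written in the new encoding (assumed for this sketch).
    static final byte[] MAGIC = "PBUF".getBytes(StandardCharsets.US_ASCII);

    /** True if the stored value starts with the new-format marker. */
    static boolean isNewFormat(byte[] value) {
        if (value == null || value.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(value, MAGIC.length), MAGIC);
    }

    /** Prefixes an encoded payload with the marker before it is stored. */
    static byte[] wrap(byte[] payload) {
        byte[] out = Arrays.copyOf(MAGIC, MAGIC.length + payload.length);
        System.arraycopy(payload, 0, out, MAGIC.length, payload.length);
        return out;
    }

    /**
     * Returns the bytes to hand to a decoder: the payload without the marker
     * for new-format values, or the value unchanged for legacy values.
     */
    static byte[] payload(byte[] value) {
        return isNewFormat(value)
                ? Arrays.copyOfRange(value, MAGIC.length, value.length)
                : value;
    }
}
```

A real scheme would also need to make sure that no legacy value can begin with the marker bytes by coincidence.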
Phase 2: HBase cluster rolling upgrade within same major version
Goal:
- To make an HBase cluster able to perform a rolling upgrade within the same major version.
Note: this phase deals with interfaces 4, 5, 6, and 7 in the interface graph.
Tasks:
- Replace existing HRegionInterface calls for opening, closing, moving, splitting, and flushing regions (and so on) with PB-enabled types (goal: master->RS RPC becomes extensible) (4 in the graph)
- Replace Writables used in ZK for communication between RS and master with PB-enabled types (goal: RS and master ZK interactions become extensible) (5, 6 in the graph)
- Replace existing HMasterRegionInterface calls with PB-enabled types (goal: RS->master RPC becomes extensible) (7 in the graph)
- Add version information to each server's ZK data (master and RSs) (goal: track live version numbers, which are used to automatically keep new features in persistent data formats switched off until all servers have reached the new version) (5, 6 in the graph)
- Add RS version information to the master status UI
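The version-tracking task can be illustrated with a small feature gate: a new persistent-format feature stays off until every live server reports a version at or above the one that introduced it. This is a sketch under assumptions. The `FeatureGate` class is hypothetical, versions are taken to be "<major>.<minor>" strings, and reading the live versions from ZK is left out.

```java
import java.util.Collection;

/** Hypothetical helper for illustration; not an HBase class. */
final class FeatureGate {
    /** Compares "<major>.<minor>" strings numerically, major first. */
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int byMajor = Integer.compare(Integer.parseInt(pa[0]), Integer.parseInt(pb[0]));
        if (byMajor != 0) {
            return byMajor;
        }
        return Integer.compare(Integer.parseInt(pa[1]), Integer.parseInt(pb[1]));
    }

    /**
     * Returns true only when every live server is at or above the version
     * that introduced the feature. An empty set of servers keeps it off.
     */
    static boolean featureEnabled(Collection<String> liveVersions, String introducedIn) {
        if (liveVersions.isEmpty()) {
            return false;
        }
        for (String version : liveVersions) {
            if (compare(version, introducedIn) < 0) {
                return false; // at least one server is still on an older version
            }
        }
        return true;
    }
}
```

During a rolling upgrade from 2.3 to 2.4, a feature introduced in 2.4 would stay off while any server still reports 2.3, and turn on once the last server has restarted on 2.4.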
Open questions
Technical
- How does ZK security and HBase RPC security play into this? Should be orthogonal?
- Should pluggable encodings (thrift/avro/pb/writable) be in scope?
- Should async IO servers and clients be in scope or not?
Policy
- What is the policy for existing versions (89, 90, 92, 94)? Do we support them, or require one major upgrade before they get this story?
- Developers should be able to remove deprecated methods or arguments to maintain flexibility, but can't do that within the compatibility window. What should be our compatibility window? 2 years (roughly 4 major versions)?
- What is the ZK version interoperability story?
- What is the HDFS version interoperability story?
- Should architectural-level changes require a major version bump?
Appendix
Future work (out of scope of this document)
- Possible to extend RPC with meta-data that can enable new functionality like RPC tracing
- Unify this with Hadoop RPC
- Online rolling upgrade of single cluster between major versions: Today, major version upgrades of a single cluster require downtime to upgrade all services in lockstep, while some minor version updates can be applied via the rolling-restart script. HBase should remain available through this process.
- Partial rollout: HBase clusters should allow some nodes to "try" a newer version for testing purposes. Today, this is a manual process and is possible only within minor versions. (This is likely possible; we would like to not exclude it.)
- Cluster configuration changes: HBase should remain available as configuration changes (hbase-site.xml) or hotfixes are applied. Today, the rolling-restart script can be used to perform this operation.
- Replication across different versions
- Disaster recovery: Operators should be able to smoke test a new version during the rolling upgrade before turning on the new features for general use. If anything goes wrong during the rolling upgrade, operators should be able to roll back.
- ZK wire compatibility: This is necessary for RPCs between different versions of HBase and ZK. Currently ZK supports backward compatibility for one version only. Different versions of HBase could support different ZK versions.
- HDFS wire compatibility
- Data format changes may prevent minor or major version roll-back.
- Security RPC data compression/encryption changes may prevent minor or major version roll-back
- Persistent data is stored in version-specific formats in HDFS (xml configs, regioninfo, tableinfo). Some of these data encodings and formats are directly exposed, while others are not; for example, ZK is not exposed as an API.
References
Cross version upgrade and compatibility: https://issues.apache.org/jira/browse/HBASE-5305
Add protobuf based RPC to HBase: https://issues.apache.org/jira/browse/HBASE-5306
Redo IPC/RPC: https://issues.apache.org/jira/browse/HBASE-2182
HDFS wire compatibility: HADOOP-7347
HDFS client wire compatibility: HDFS-2060
HDFS data protocol wire compatibility: HDFS-2058
Use protobuf objects in existing IPC: HADOOP-7379