Glossary


Term: Definition

Major version: The first number in the version, to the left of the period. For example, in version 2.3 the major version is "2".

Minor version: The second number in the version, immediately to the right of the period. For example, in version 2.3 the minor version is "3".

Compatibility window: The range of consecutive major versions within which compatibility between two entities is guaranteed.
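The glossary terms can be made concrete with a minimal sketch. The names `Version`, `parse_version`, and `within_window`, and the idea of expressing the window as a count of consecutive major versions, are assumptions for illustration. This document does not fix a window size.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int  # number to the left of the period
    minor: int  # number immediately to the right of the period


def parse_version(text: str) -> Version:
    """Parse a 'major.minor' string such as '2.3' (hypothetical helper)."""
    parts = text.split(".")
    if len(parts) < 2:
        raise ValueError("expected 'major.minor', got %r" % text)
    return Version(int(parts[0]), int(parts[1]))


def within_window(a: Version, b: Version, window: int) -> bool:
    """True if both major versions fit inside `window` consecutive majors.

    `window` is an assumed policy parameter, not a value from this document.
    """
    return abs(a.major - b.major) < window


client = parse_version("2.3")
server = parse_version("3.0")
print(within_window(client, server, window=2))  # majors 2 and 3 are consecutive
```

With a window of 2, a 2.x client and a 3.x server fall inside the window, while a 1.x client and a 3.x server do not.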

Motivation and Goals


The current lack of a concrete versioning story for HBase is limiting from both an operational and development perspective. We propose a "first-pass" versioning story (that can be expanded upon later) that addresses the following use cases and concerns:

Operations

Development

Requirements


Design


Wire format

Protobuf vs. Thrift vs. Avro

We propose to use protobuf for the wire format. The primary reason is that the current HBase RPC engine (see HADOOP-7379) supports protobuf-encoded data, and protobuf is relatively more stable than Thrift and Avro. In addition, Hadoop RPC uses protobuf, and the community may eventually want Hadoop and HBase to share the same RPC.

We also propose to change the HBase RPC connection header from Writable to protobuf so that HBase RPC is programming-language agnostic.
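To illustrate why a protobuf-encoded header is language agnostic, here is a rough sketch that hand-encodes a header with two string fields using the standard protobuf wire format (varints and length-delimited fields). The field numbers, the field meanings (`protocol`, `user`), and the length-prefixed framing are assumptions, not the actual HBase header layout. Real code would use protoc-generated classes rather than a hand-rolled encoder.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, high bit clear
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one length-delimited (wire type 2) string field."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_header(protocol: str, user: str) -> bytes:
    """Hypothetical header: field 1 = protocol name, field 2 = user name.

    The varint length prefix in front of the message is an assumed framing.
    """
    body = encode_string_field(1, protocol) + encode_string_field(2, user)
    return encode_varint(len(body)) + body


print(encode_header("HRegionInterface", "alice").hex())
```

Any client that implements these few byte-level rules can produce or parse such a header, whatever its language. A Writable-encoded header does not offer this, because it is tied to Java serialization code.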

RPC

Currently, the HBase RPC engine does not support async IO or protocol negotiation. These features do not affect compatibility, so they can evolve separately and are out of scope for this document.

Interfaces

Interface graph: http://docs.google.com/a/cloudera.com/leaf?id=0BzYqRa05S66NMDcxMjUyYTMtZWE2Yy00ZmIyLThiMjgtMjJkNGU0NGU5OTg1

  1. Client talks to ZK to find out the location of the master and the root region server.
  2. Client applications talk to RS using HRegionInterface to read from, write to, or scan a table, and so on.
  3. Client applications talk to master using HMasterInterface to dynamically create a table, add a column family, and so on.
  4. Master talks to RS using HRegionInterface to open/close/move/split/flush regions, and so on.
  5. Master puts data in ZK to store the active master and root region server location, create log splitting tasks, track the status of each RS, and so on.
  6. RS reads data in ZK to track log splitting tasks and updates it to grab a task and report status, creates a node for the RS so that master can track the status of this RS, tracks master location and cluster status, and so on.
  7. RS talks to master using HMasterRegionInterface to report RS load, RS fatal errors, and RS start-up.
  8. Occasionally, RS talks to the root region or meta region using HRegionInterface to check the status of a region, create new daughter regions during region splitting, and so on.
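The eight numbered interactions above can be written down as a small table of edges, which makes it easy to see which interactions share an RPC interface. This is only a sketch: the `Edge` type, the short node names, and the use of `None` for ZooKeeper-based interactions are illustrative choices, not part of HBase.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    source: str
    target: str
    interface: Optional[str]  # None means the edge goes through ZooKeeper


# Keys are the item numbers from the list above.
INTERFACE_GRAPH = {
    1: Edge("client", "zk", None),
    2: Edge("client", "rs", "HRegionInterface"),
    3: Edge("client", "master", "HMasterInterface"),
    4: Edge("master", "rs", "HRegionInterface"),
    5: Edge("master", "zk", None),
    6: Edge("rs", "zk", None),
    7: Edge("rs", "master", "HMasterRegionInterface"),
    8: Edge("rs", "rs", "HRegionInterface"),  # RS hosting root or meta region
}


def edges_using(interface: Optional[str]) -> list:
    """Return the item numbers whose interaction uses the given interface."""
    return sorted(n for n, e in INTERFACE_GRAPH.items() if e.interface == interface)


print(edges_using("HRegionInterface"))  # [2, 4, 8]
print(edges_using(None))                # [1, 5, 6]
```

Grouping by interface shows that a change to the encoding of HRegionInterface affects items 2, 4, and 8 together, while items 1, 5, and 6 depend on the data stored in ZK rather than on an RPC interface.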

Phasing


The order of phases is based on priority. They can be done in parallel if there are enough resources.

Phase 0: HBASE-4403: Separate existing APIs into public and private interfaces

In order to define which APIs can be changed, we need to separate existing APIs into public and private.

Phase 1: Compatibility between client applications and HBase clusters

Goal:

Note: This phase deals with interfaces 1, 2, and 3 in the interface graph. We get 8 "for free", since it uses the same HRegionInterface as 2. These tasks can be sub-tasks of HBASE-5305 (Improve cross-version compatibility & upgradeability) or HBASE-5306 (Add support for protocol buffer based RPC). HBASE-5306 can also include a new RPC engine (the latest Hadoop one). This plan focuses on data encoding and decoding.

Tasks:

Phase 2: HBase cluster rolling upgrade within same major version

Goal:

Note: This phase deals with interfaces 4, 5, 6, and 7 in the interface graph.
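As a quick sanity check on the phasing, the sketch below records which interface-graph items each phase handles and confirms that none of the eight is left out. The dictionary layout and the `uncovered` helper are hypothetical. The item numbers come from the notes for Phase 1 and Phase 2.

```python
ALL_INTERFACES = set(range(1, 9))  # items 1..8 in the interface graph

PHASES = {
    # "free" holds items covered as a side effect of the phase's direct work.
    "Phase 1": {"direct": {1, 2, 3}, "free": {8}},
    "Phase 2": {"direct": {4, 5, 6, 7}, "free": set()},
}


def uncovered(phases: dict) -> set:
    """Return interface-graph items that no phase addresses."""
    covered = set()
    for phase in phases.values():
        covered |= phase["direct"] | phase["free"]
    return ALL_INTERFACES - covered


print(uncovered(PHASES))  # set()
```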

Tasks:

Open questions


Technical

Policy

Appendix


Future work (out of scope of this document)

References


Meeting notes

* HBaseWireCompatibility20120221

Hbase/HBaseWireCompatibility (last edited 2012-02-21 22:50:16 by ToddLipcon)