Thrift API

This page discusses the Thrift client API for HBase. Thrift is cross-platform and, for many operations, more lightweight than REST.

The latest version of the HBase Thrift API is described by Hbase.thrift.

Using the API

Generating a Thrift client package

Once Thrift is installed, use:

thrift --gen [lang] [hbase-root]/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift

Replace [lang] with one of java, cpp, rb, py, perl, or another language listed in Hbase.thrift.

This produces a directory named after the language (gen-py, gen-rb, etc.) containing the generated client code.
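For Python, the generated directory has to be on the import path before the hbase package can be imported. A minimal sketch, assuming the generator was run with py and wrote gen-py into the current directory; the helper name add_generated_package is illustrative and not part of Thrift or HBase:

```python
import os
import sys


def add_generated_package(gen_dir="gen-py"):
    """Put the Thrift output directory on sys.path so `hbase` can be imported.

    gen_dir is an assumption: the directory that the thrift compiler wrote.
    """
    path = os.path.abspath(gen_dir)
    if not os.path.isdir(path):
        raise FileNotFoundError("generated code not found: " + path)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

The Thrift Python runtime (the thrift package) must be installed as well, because the generated code imports it.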

Starting the Thrift server

The Thrift server can be started with:

[hbase-root]/bin/hbase thrift start
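To confirm that the server came up, a client can check that the port accepts TCP connections. The sketch below assumes the server listens on localhost port 9090 (the usual default, adjust if your configuration differs); thrift_server_reachable is a hypothetical helper name:

```python
import socket


def thrift_server_reachable(host="localhost", port=9090, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This only checks that something is listening; it does not speak Thrift.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```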

Using with Python

See Yann's tutorial (July 2008).

To acquire a Thrift client instance:

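# Requires the Thrift Python runtime and the generated gen-py directory on sys.path.
from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

# Address of the Thrift server; 9090 is the usual default port.
host = "localhost"
port = 9090

transport = TBufferedTransport(TSocket(host, port))
transport.open()
protocol = TBinaryProtocol.TBinaryProtocol(transport)

client = Hbase.Client(protocol)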

Use help(client) to view the Python API. Call transport.close() when you are finished with the client.
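Because the transport holds an open socket, it helps to guarantee that it is closed even when a call raises. One way to do that, sketched here with a hypothetical helper named opened that works with any object exposing open() and close():

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open a Thrift transport and guarantee it is closed afterwards."""
    transport.open()
    try:
        yield transport
    finally:
        transport.close()


# Usage with the classes imported in the snippet above (needs a running server):
#
#   with opened(TBufferedTransport(TSocket(host, port))) as transport:
#       client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
#       print(client.getTableNames())
```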

Data Type Spec

See Hbase.thrift for an up-to-date API and a complete definition of all methods and data types.

This section contains some definitions of Thrift data types needed for communication.

ColumnDescriptor

Used by getColumnDescriptors.

struct ColumnDescriptor {
  1:Text name,
  2:i32 maxVersions = 3,
  3:string compression = "NONE",
  4:bool inMemory = 0,
  5:string bloomFilterType = "NONE",
  6:i32 bloomFilterVectorSize = 0,
  7:i32 bloomFilterNbHashes = 0,
  8:bool blockCacheEnabled = 0,
  9:i32 timeToLive = -1
}

TRegionInfo

Used by getTableRegions.

struct TRegionInfo {
  1:Text startKey,
  2:Text endKey,
  3:i64 id,
  4:Text name,
  5:byte version,
  6:Text serverName,
  7:i32 port
}

Mutation

Used when performing batch update operations. Set isDelete to true to delete the cell in column instead of writing value to it.

struct Mutation {
  1:bool isDelete = 0,
  2:Text column,
  3:Text value,
  4:bool writeToWAL = 1
}

TRowResult

Returned by getRow and the other row-retrieval methods. The TCell type used for the column values is defined in Hbase.thrift.

struct TRowResult {
  1:Text row,
  2:map<Text, TCell> columns
}
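In Python, the columns map arrives as a dict of column name to cell object. A small sketch of flattening one result, assuming each TCell exposes value and timestamp attributes (TCell is not reproduced on this page, so check Hbase.thrift); row_to_dict is an illustrative name:

```python
def row_to_dict(row_result):
    """Flatten one TRowResult into {column: (value, timestamp)}.

    Assumes each cell object has `value` and `timestamp` attributes.
    """
    return {
        column: (cell.value, cell.timestamp)
        for column, cell in row_result.columns.items()
    }
```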

Method Spec

This section contains definitions of some of the methods exposed to clients. To propose a new method, file a JIRA issue with a comment explaining why the method is needed.

Meta-info methods

Get Table Names

Returns a list of table names.

list<Text> getTableNames()

Get Column Descriptors

Return a map from column family name to ColumnDescriptor for a given table.

map<Text,ColumnDescriptor> getColumnDescriptors(Text tableName)
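As an illustration of reading the returned map from Python, the sketch below summarizes a few ColumnDescriptor fields per family. The function name family_settings is hypothetical, and client is assumed to be an Hbase.Client connected as shown earlier:

```python
def family_settings(client, table_name):
    """Return {family name: (maxVersions, compression, timeToLive)} for a table."""
    descriptors = client.getColumnDescriptors(table_name)
    return {
        name: (d.maxVersions, d.compression, d.timeToLive)
        for name, d in descriptors.items()
    }
```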

Get Table Regions

Return a list of the regions that make up a table.

list<TRegionInfo> getTableRegions(Text tableName)
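A possible use of this call is listing which server hosts each region. The sketch assumes the TRegionInfo fields shown above and that Text values may arrive as bytes in Python 3; region_servers and _text are illustrative names:

```python
def _text(value):
    """Decode bytes to str; leave str values untouched."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def region_servers(client, table_name):
    """Return [(region name, "serverName:port")] for each region of a table."""
    return [
        (_text(region.name), "%s:%d" % (_text(region.serverName), region.port))
        for region in client.getTableRegions(table_name)
    ]
```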

Row methods

Get Row

Retrieve a list<TRowResult> for a given row. Variants of this method accept further options such as a timestamp or a selection of columns (for example getRowTs and getRowWithColumns); see Hbase.thrift for the full set.

list<TRowResult> getRow(Text tableName, Text row, map<Text, Text> attributes)
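Since the call returns a list even for a single row key, callers usually unwrap it. A minimal sketch, assuming an empty list means the row does not exist and that an empty dict is acceptable for attributes; fetch_row is a hypothetical helper:

```python
def fetch_row(client, table_name, row_key):
    """Return the single TRowResult for row_key, or None if the row is absent.

    Assumes getRow returns an empty list when the row does not exist.
    """
    results = client.getRow(table_name, row_key, {})  # {} = no attributes
    return results[0] if results else None
```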

Mutate Row (Put)

Apply a series of mutations (puts and deletes) to a single row of the table.

void mutateRow(Text tableName, Text row, list<Mutation> mutations, map<Text, Text> attributes)
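The sketch below shows one way to combine puts and deletes in a single call. It takes the Mutation class as a parameter so it does not depend on where the generated code lives; in the Python output this is expected to be hbase.ttypes.Mutation. The name write_cells is illustrative:

```python
def write_cells(client, mutation_cls, table_name, row_key, puts, deletes=()):
    """Apply puts and deletes to one row with a single mutateRow call.

    mutation_cls: the generated Mutation class (assumed: hbase.ttypes.Mutation).
    puts: dict mapping column name -> value to write.
    deletes: iterable of column names whose cells should be deleted.
    """
    mutations = [mutation_cls(column=c, value=v) for c, v in puts.items()]
    mutations += [mutation_cls(isDelete=True, column=c) for c in deletes]
    client.mutateRow(table_name, row_key, mutations, {})  # {} = no attributes
    return mutations


# Usage (needs a connected client and the generated package):
#
#   from hbase.ttypes import Mutation
#   write_cells(client, Mutation, b"t1", b"row1", {b"cf:a": b"1"}, [b"cf:old"])
```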

Delete Row

Delete an entire row.

void deleteAllRow(Text tableName, Text row, map<Text, Text> attributes)

Scanner methods

Scanner methods use a scanner ID that is maintained on the Thrift server. The ScannerID type is defined as:

typedef i32 ScannerID

An i32 is Thrift's signed 32-bit integer type. The value identifies the scanner object on the Thrift server.

Open Scanner

Open a scanner on a table, starting at startRow and returning the given columns.

ScannerID scannerOpen(Text tableName, Text startRow, list<Text> columns, map<Text, Text> attributes)

Get Scanner Results

Retrieve up to numberOfRows rows from the scanner in a single call.

list<TRowResult> scannerGetList(ScannerID id, i32 numberOfRows)

Close Scanner

Close a scanner.

void scannerClose(ScannerID id)
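The three scanner calls are typically used together: open, fetch in batches, close. The generator below is a sketch under two assumptions that should be checked against your server version: an empty list from scannerGetList means the scan is exhausted, and an empty columns list selects all columns. The name scan_rows is hypothetical:

```python
def scan_rows(client, table_name, start_row=b"", columns=None, batch_size=100):
    """Yield TRowResult objects from start_row onward.

    The scanner is closed on the server when the scan ends, when the caller
    stops iterating and the generator is closed, or when an error occurs.
    """
    scanner_id = client.scannerOpen(table_name, start_row, columns or [], {})
    try:
        while True:
            batch = client.scannerGetList(scanner_id, batch_size)
            if not batch:  # assumed: empty list means no more rows
                break
            for row in batch:
                yield row
    finally:
        client.scannerClose(scanner_id)
```

Fetching in batches keeps the number of round trips down, and the try/finally block avoids leaving scanners open on the Thrift server.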