Thrift API
This page discusses the Thrift client API for Hbase. Thrift is both cross-platform and more lightweight than REST for many operations.
The latest version of the Hbase Thrift API is described by Hbase.thrift.
Using the API
Generating a Thrift client package
Once Thrift is installed, use:
thrift --gen [lang] [hbase-root]/src/java/org/apache/hadoop/hbase/thrift/Hbase.thrift
lang should be one of java, cpp, rb, py, perl or another language listed in Hbase.thrift.
This will produce a directory called gen-py, gen-rb, etc. containing the generated client module for that language.
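If client generation is part of a build script, the command above can be assembled programmatically. The following is a minimal sketch that only builds the argument list and does not run it. The helper names thrift_gen_command and output_dir are hypothetical, and the path to Hbase.thrift is the one quoted above, which may differ between HBase versions.

```python
from pathlib import PurePosixPath

# Location of Hbase.thrift relative to the HBase root, as quoted above.
THRIFT_IDL = PurePosixPath("src/java/org/apache/hadoop/hbase/thrift/Hbase.thrift")


def thrift_gen_command(lang, hbase_root):
    """Build the argv list for generating a client (hypothetical helper)."""
    return ["thrift", "--gen", lang, str(PurePosixPath(hbase_root) / THRIFT_IDL)]


def output_dir(lang):
    """Directory name the compiler is expected to create, e.g. gen-py."""
    return "gen-" + lang
```

The returned list can be handed to subprocess.run(cmd, check=True) once Thrift is installed.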
Starting the Thrift server
The Thrift server can be started with:
[hbase-root]/bin/hbase thrift start
Using with Python
See Yann's tutorial (July 2008).
To acquire a Thrift client instance:
from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

transport = TBufferedTransport(TSocket(host, port))
transport.open()
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
Use help(client) to view the Python API.
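The snippet above opens the transport but never closes it. One way to make sure the connection is released is a small context manager, sketched below. The name opened is hypothetical; it relies only on the transport exposing open() and close(), which Thrift transports do.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open a transport and guarantee it is closed afterwards.

    Works with any object that has open() and close() methods,
    such as a TBufferedTransport.
    """
    transport.open()
    try:
        yield transport
    finally:
        # Runs even if the body of the with-block raises.
        transport.close()
```

With this helper, the client would be built inside a block such as "with opened(TBufferedTransport(TSocket(host, port))) as transport:", and the socket is closed when the block exits.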
Data Type Spec
This design documentation is now outdated. See Hbase.thrift for an up-to-date API.
This section contains the definitions of Thrift data types needed for communication.
columnDescriptor
Used by getColumnDescriptors. How much information should we expose here?
struct columnDescriptor {
1:string name,
2:i32 maxVersions,
3:bool compression
}
regionDescriptor
Used by getTableRegions.
struct regionDescriptor {
1:string startKey,
2:string host
}
mutation
Used when performing batch update operations. isDelete is the switch you flip when you want to delete a cell.
struct mutation {
1:bool isDelete = false,
2:string columnName,
3:string value
}
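To make the role of isDelete concrete, here is a rough Python model of the mutation struct and of applying a batch to a single row. It is illustrative only: the Mutation class and apply_mutations function are hypothetical, and the assumptions that mutations apply in list order and that deleting a missing cell is a no-op are not stated anywhere in this spec.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Mutation:
    """Python mirror of the mutation struct (illustrative only)."""
    column_name: str
    value: str = ""
    is_delete: bool = False


def apply_mutations(row: Dict[str, str], mutations: List[Mutation]) -> Dict[str, str]:
    """Return a copy of row with the mutations applied in list order."""
    result = dict(row)
    for m in mutations:
        if m.is_delete:
            # ASSUMPTION: deleting a cell that does not exist is a no-op.
            result.pop(m.column_name, None)
        else:
            result[m.column_name] = m.value
    return result
```

A batch such as [Mutation("info:name", "b"), Mutation("info:age", is_delete=True)] would overwrite one cell and remove another in a single call.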
Method Spec
This section contains the definition of the methods we want to expose to clients. If you have a method to propose, add it to the appropriate subsection below along with a comment explaining why we should want such a method.
Meta-info methods
Get Table Names
Returns a list of table names.
list<string> getTableNames()
Get Column Descriptors
Return a list of column descriptors for a given table.
list<columnDescriptor> getColumnDescriptors(string tableName)
Get Table Regions
Return a list of the region and host tuples that make up a table.
list<regionDescriptor> getTableRegions(string tableName)
Row methods
Get Row
Retrieve a map<col name, value> for a given row, optionally restricted to a timestamp, a selected list of columns, or both.
map<string, string> getRow(string tableName, string row)
map<string, string> getRow(string tableName, string row, i64 timestamp)
map<string, string> getRow(string tableName, string row, list<string> columns)
map<string, string> getRow(string tableName, string row, list<string> columns, i64 timestamp)
Mutate Row (Put)
Send a series of mutation commands (put, delete) to the table.
void mutateRow(string tableName, string row, list<mutation> mutations)
void mutateRow(string tableName, string row, list<mutation> mutations, i64 timestamp)
Delete Row
Delete an entire row.
void deleteRow(string tableName, string row)
void deleteRow(string tableName, string row, i64 timestamp)
Scanner methods
Open Scanner
Create a scanner for a table with some options.
i32 openScanner(string tableName, string startRow)
i32 openScanner(string tableName, string startRow, string stopRow)
i32 openScanner(string tableName, string startRow, string stopRow, list<string> columns)
Get Scanner Results
Retrieve one or more records from the scanner at once.
map<string, string> getScannerResult(i32 scannerID)
Close Scanner
Close a scanner.
void closeScanner(i32 scannerID)
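The three scanner calls are meant to be used together: open, fetch repeatedly, then close. A minimal Python sketch of that lifecycle follows. The function scan_rows is hypothetical, the method names are the ones from this draft spec (the current Hbase.thrift may use different names), and treating an empty map as the end of the scan is an assumption, since the spec does not say how exhaustion is signalled.

```python
def scan_rows(client, table_name, start_row):
    """Yield each result map from a scanner, closing the scanner when done.

    ASSUMPTION: an empty map from getScannerResult means the scanner
    is exhausted; the draft spec does not define this.
    """
    scanner_id = client.openScanner(table_name, start_row)
    try:
        while True:
            result = client.getScannerResult(scanner_id)
            if not result:
                break
            yield result
    finally:
        # Release the server-side scanner even if the caller stops early.
        client.closeScanner(scanner_id)
```

Wrapping the loop in try/finally means the scanner is closed whether iteration finishes normally, raises, or is abandoned partway through.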
Comments on API design
Bryan, I think you may want to use the "binary" type instead of "string" to avoid any possible encoding issues. "binary" is a raw byte[] in Java. -- Chad
Bryan, I've been working on various Thrift servers and clients (mainly C++ and Ruby) at Powerset and will be taking a look at creating a Thrift server implementation of this API. Is this something on which you are actively working? If not, I'll take a look at the REST server code as a model for hooking up the Thrift API to Hbase. Other than that, without knowing a bit more about the Hbase API, the Thrift API looks good. One thing that we'll need to add to it is Exception declarations. -- DavidSimpson
Using the thrift API presumes that the system is already running the thrift servlet. How does one get that started? -- JimRWilson 2008-04-01 15:56:33
Jim, to start a thrift server, do ${HBASE_HOME}/bin/hbase thrift start or ${HBASE_HOME}/bin/start-daemon.sh start thrift if you want logs captured and a pid file written (FYI, this kind of question belongs over in the hbase mailing list -- no one reads the wiki; smile).