libhdfs
libhdfs is a JNI-based C API for Hadoop's DFS. It provides a small set of C functions for manipulating DFS files and the filesystem. libhdfs is available for download as part of Hadoop itself. The source for libhdfs lives in the src/c++/libhdfs directory of the Hadoop source tree.
Overview
Here is an overview of Hadoop's DFS. The javadocs for DFS are available here and here. libhdfs is a simple JNI-based C API for accessing and manipulating Hadoop's DFS from native code. It exposes a subset of the functionality of the Java DFS API.
Setup
Hadoop's DFS itself must be set up first. Information on setting up Hadoop is available here. Once you have a working setup, change into the src/c++/libhdfs directory and use the Makefile to build libhdfs (in case of issues, use this). Once libhdfs has built successfully, you can link it into your programs.
APIs
This section describes the APIs provided by libhdfs to manipulate the DFS. They are grouped into APIs which manipulate individual files and those which manipulate the filesystem itself. (Please see the doxygen documentation for details of individual APIs.)
libhdfs provides APIs both for generic manipulation of the filesystem (creating directories, copying and moving files, etc.) and for some DFS-specific functionality (getting information on file replication, etc.).
Before any operation can be performed on files or the filesystem, connect to the DFS with the hdfsConnect API; the matching hdfsDisconnect performs a clean teardown of the connection.
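The connect/disconnect pairing described above can be sketched as follows. This is a minimal sketch, not a definitive program: it assumes hdfs.h is on the include path, that the program is linked against libhdfs and a JVM, and that the Hadoop configuration is reachable through the CLASSPATH. Passing "default" and port 0 is assumed to select the filesystem named in that configuration.

```c
#include <stdio.h>
#include <stdlib.h>
#include "hdfs.h"   /* header shipped with libhdfs */

int main(void) {
    /* Assumption: "default" with port 0 selects the filesystem named in
       the Hadoop configuration found on the CLASSPATH. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "hdfsConnect failed\n");
        return EXIT_FAILURE;
    }

    /* ... file and filesystem operations go here ... */

    /* Tear the connection down cleanly once all work is done. */
    if (hdfsDisconnect(fs) != 0) {
        fprintf(stderr, "hdfsDisconnect failed\n");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

Every later operation takes the hdfsFS handle returned here, so a failed connect should stop the program before anything else is attempted.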
Generic operations:
hdfsCopy (also across filesystems)
hdfsMove (also across filesystems)
hdfsRename
hdfsDelete
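As an illustration of the generic operations listed above, the sketch below renames a file and then copies it within the same filesystem. The paths are hypothetical and the file is assumed to exist already. hdfsDelete is deliberately left out because its parameter list differs between libhdfs releases; check the hdfs.h that ships with your build before calling it.

```c
#include <stdio.h>
#include <stdlib.h>
#include "hdfs.h"

int main(void) {
    /* Hypothetical paths, used only for illustration. */
    const char *src     = "/tmp/example/input.txt";
    const char *renamed = "/tmp/example/input.old";
    const char *copy    = "/tmp/example/input.copy";
    int status = EXIT_SUCCESS;

    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "hdfsConnect failed\n");
        return EXIT_FAILURE;
    }

    /* Rename within one filesystem. */
    if (hdfsRename(fs, src, renamed) != 0) {
        fprintf(stderr, "hdfsRename failed\n");
        status = EXIT_FAILURE;
    /* Copy takes a source and a destination filesystem handle; passing
       the same handle twice copies within one filesystem. */
    } else if (hdfsCopy(fs, renamed, fs, copy) != 0) {
        fprintf(stderr, "hdfsCopy failed\n");
        status = EXIT_FAILURE;
    }

    hdfsDisconnect(fs);
    return status;
}
```

To copy or move across filesystems, connect to each one separately and pass the two distinct handles to hdfsCopy or hdfsMove.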
libhdfs also provides APIs for manipulating directories on the DFS:
hdfsCreateDirectory
hdfsSetWorkingDirectory
hdfsGetWorkingDirectory
hdfsListDirectory / hdfsGetPathInfo / hdfsFreeFileInfo
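The directory APIs above might be combined as in the following sketch, which creates a directory, lists its parent, and releases the listing. The directory names are hypothetical. Only the mKind, mName and mSize fields of hdfsFileInfo are used here; other fields vary between releases.

```c
#include <stdio.h>
#include <stdlib.h>
#include "hdfs.h"

int main(void) {
    const char *parent = "/tmp/example";          /* hypothetical */
    const char *child  = "/tmp/example/reports";  /* hypothetical */
    int numEntries = 0;
    int i;

    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "hdfsConnect failed\n");
        return EXIT_FAILURE;
    }

    if (hdfsCreateDirectory(fs, child) != 0) {
        fprintf(stderr, "hdfsCreateDirectory failed\n");
        hdfsDisconnect(fs);
        return EXIT_FAILURE;
    }

    /* The returned array is owned by libhdfs and must be released with
       hdfsFreeFileInfo, passing the same entry count. */
    hdfsFileInfo *entries = hdfsListDirectory(fs, parent, &numEntries);
    if (entries == NULL) {
        fprintf(stderr, "nothing listed (error or empty directory)\n");
    } else {
        for (i = 0; i < numEntries; i++) {
            printf("%c %12lld %s\n",
                   entries[i].mKind == kObjectKindDirectory ? 'd' : 'f',
                   (long long)entries[i].mSize,
                   entries[i].mName);
        }
        hdfsFreeFileInfo(entries, numEntries);
    }

    hdfsDisconnect(fs);
    return EXIT_SUCCESS;
}
```

hdfsGetPathInfo returns a single hdfsFileInfo for one path; it is released the same way, with an entry count of 1.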
The APIs to query the filesystem for various properties:
hdfsGetHosts
hdfsGetDefaultBlockSize
hdfsGetUsed / hdfsGetCapacity
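A short sketch of the property queries above follows. It assumes each call returns -1 on failure, which is how hdfs.h documents them in the releases we are aware of. hdfsGetHosts is omitted because the way its result is released differs between releases.

```c
#include <stdio.h>
#include <stdlib.h>
#include "hdfs.h"

int main(void) {
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "hdfsConnect failed\n");
        return EXIT_FAILURE;
    }

    /* tOffset is a 64-bit integer type; cast for portable printing. */
    tOffset blockSize = hdfsGetDefaultBlockSize(fs);
    tOffset capacity  = hdfsGetCapacity(fs);
    tOffset used      = hdfsGetUsed(fs);

    if (blockSize == -1 || capacity == -1 || used == -1) {
        fprintf(stderr, "one or more filesystem queries failed\n");
    } else {
        printf("default block size: %lld bytes\n", (long long)blockSize);
        printf("raw capacity:       %lld bytes\n", (long long)capacity);
        printf("bytes used:         %lld bytes\n", (long long)used);
    }

    hdfsDisconnect(fs);
    return EXIT_SUCCESS;
}
```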
libhdfs provides POSIX-like APIs to manipulate individual files (create, read/write, query, etc.), listed below:
hdfsOpenFile / hdfsCloseFile
hdfsRead / hdfsWrite
hdfsTell / hdfsSeek
hdfsFlush
hdfsAvailable
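The file APIs above follow the familiar open/write/close and open/read/close pattern. The sketch below writes a short message and reads it back. The path is hypothetical, and the zero values passed to hdfsOpenFile for buffer size, replication and block size are assumed to select the configured defaults.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hdfs.h"

int main(void) {
    const char *path    = "/tmp/libhdfs_example.txt";  /* hypothetical */
    const char *message = "Hello, DFS!\n";
    tSize length = (tSize)strlen(message);
    char buffer[64];
    tSize n = -1;
    int status = EXIT_FAILURE;
    hdfsFile out, in;

    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "hdfsConnect failed\n");
        return EXIT_FAILURE;
    }

    /* Open for writing; zeros are assumed to mean "use the defaults". */
    out = hdfsOpenFile(fs, path, O_WRONLY, 0, 0, 0);
    if (out == NULL) {
        fprintf(stderr, "open for write failed\n");
        goto done;
    }
    if (hdfsWrite(fs, out, message, length) != length ||
        hdfsFlush(fs, out) != 0) {
        fprintf(stderr, "write or flush failed\n");
        hdfsCloseFile(fs, out);
        goto done;
    }
    if (hdfsCloseFile(fs, out) != 0) {
        fprintf(stderr, "close after write failed\n");
        goto done;
    }

    /* Reopen for reading and copy the contents to stdout. */
    in = hdfsOpenFile(fs, path, O_RDONLY, 0, 0, 0);
    if (in == NULL) {
        fprintf(stderr, "open for read failed\n");
        goto done;
    }
    /* hdfsRead may return fewer bytes than requested; loop until it
       reports end of file (0) or an error (-1). */
    while ((n = hdfsRead(fs, in, buffer, (tSize)sizeof buffer)) > 0) {
        fwrite(buffer, 1, (size_t)n, stdout);
    }
    fprintf(stderr, "final offset: %lld\n", (long long)hdfsTell(fs, in));
    hdfsCloseFile(fs, in);
    if (n == 0) {
        status = EXIT_SUCCESS;
    }

done:
    hdfsDisconnect(fs);
    return status;
}
```

hdfsSeek can reposition a file opened for reading before the next hdfsRead, and hdfsAvailable reports how many bytes can be read without blocking.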
Using libhdfs in Threaded Applications
libhdfs can be used in threaded applications that use POSIX threads. However, to interact correctly with JNI's global and local references, the user has to explicitly call the hdfsConvertToGlobalRef / hdfsDeleteGlobalRef APIs.
Examples
The test cases for libhdfs provide some good examples of how to use libhdfs.
Contact Information
Please drop us an email at hadoop-users@lucene.apache.org if you have any questions or suggestions. Use Jira (component: dfs) to report bugs.
Conclusion
Thank you for your interest in Hadoop and libhdfs!