This document describes how to configure Nutch to use HBase as a backend for GORA. It is based on revision 993857 of the Nutch trunk.
Install and configure HBase 0.20.6. You can check it out from here (N.B. it is important that you use HBase version 0.20.6, as this is the version supported by Gora).
- Specify the GORA backend in nutch-site.xml
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
  <description>Default class for storing data</description>
</property>
Note: HBaseStore is NOT YET THREAD-SAFE, so all processes should use single-threaded settings (for example, set the number of fetchers to 1). Work to make it thread-safe is in progress.
Compile Nutch by running: ant runtime
Make sure HBase is started and working properly as per the quick start tutorial here
You should then be able to use it. Try going to $NUTCH_HOME/runtime/local/bin and running:
nutch inject /someseedDir
nutch readdb
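The seed directory passed to inject is expected to contain plain-text files listing one URL per line. As a minimal sketch, such a directory could be prepared with a small shell helper. The function name make_seed_dir and the file name seed.txt are hypothetical choices made for illustration and are not part of Nutch.

```shell
# Hypothetical helper (not part of Nutch): create a seed directory for "nutch inject".
# Usage: make_seed_dir DIR URL [URL...]
make_seed_dir() {
  # Require a target directory and at least one URL.
  [ "$#" -ge 2 ] || { echo "usage: make_seed_dir DIR URL [URL...]" >&2; return 1; }
  seed_dir="$1"
  shift
  mkdir -p "$seed_dir" || return 1
  # Write one URL per line; the file name seed.txt is an arbitrary choice.
  printf '%s\n' "$@" > "$seed_dir/seed.txt"
}
```

For example, make_seed_dir /someseedDir http://nutch.apache.org/ would create /someseedDir/seed.txt with a single URL, after which the inject command can be pointed at /someseedDir.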
You should find more details in the log file at $NUTCH_HOME/runtime/local/logs/hadoop.log
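If a command fails, scanning that log for errors is usually the quickest first step. The sketch below assumes the log uses the usual log4j level names (ERROR, FATAL) and that Java stack traces contain the word "Exception". The function name show_nutch_errors is hypothetical and not part of Nutch.

```shell
# Hypothetical helper (not part of Nutch): print the last 20 log lines that
# mention ERROR, FATAL or Exception.
# Usage: show_nutch_errors [LOG_FILE]
show_nutch_errors() {
  # Default to the log location named above; NUTCH_HOME must be set for the default to resolve.
  log_file="${1:-${NUTCH_HOME:-.}/runtime/local/logs/hadoop.log}"
  if [ ! -r "$log_file" ]; then
    echo "log file not found: $log_file" >&2
    return 1
  fi
  grep -E 'ERROR|FATAL|Exception' "$log_file" | tail -n 20
}
```

Calling show_nutch_errors with no argument reads $NUTCH_HOME/runtime/local/logs/hadoop.log. Passing a path as the first argument inspects a different file.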