Nutch's plugin system is based on the one used in Eclipse 2.x. Plugins are central to how Nutch works: all of the parsing, indexing and searching that Nutch does is accomplished by plugins.

When you write a plugin, you provide one or more extensions of existing extension points. The core Nutch extension points are themselves defined in a plugin, NutchExtensionPoints, and are listed in that plugin's plugin.xml file. Each extension point defines an interface that its extensions must implement. The core extension points are:

Updated to Nutch apidocs version 1.18
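To make the relationship between an extension point and its extensions concrete, here is a minimal, self-contained Java sketch. ExampleUrlFilter, HttpOnlyFilter, RejectPdfFilter and ExtensionPointSketch are hypothetical names invented for illustration. The interface is loosely modeled on Nutch's URLFilter extension point, whose filter() returns the URL to keep it or null to reject it, but it leaves out the Pluggable and Configurable interfaces that, as we understand it, a real Nutch extension point extends, and it does not use Nutch's plugin loading at all. Each "plugin" supplies one implementation, and the caller runs every active implementation in turn.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an extension-point interface.
// (Nutch's real URLFilter also extends Pluggable and Configurable; omitted here.)
interface ExampleUrlFilter {
    // Return the (possibly rewritten) URL to keep it, or null to reject it.
    String filter(String url);
}

// Hypothetical extension: keeps only http and https URLs.
class HttpOnlyFilter implements ExampleUrlFilter {
    @Override
    public String filter(String url) {
        return (url.startsWith("http://") || url.startsWith("https://")) ? url : null;
    }
}

// Hypothetical extension: rejects URLs that point at PDF files.
class RejectPdfFilter implements ExampleUrlFilter {
    @Override
    public String filter(String url) {
        return url.toLowerCase().endsWith(".pdf") ? null : url;
    }
}

public class ExtensionPointSketch {
    // Run every active extension in order; the first null rejects the URL.
    static String applyAll(List<ExampleUrlFilter> filters, String url) {
        String current = url;
        for (ExampleUrlFilter f : filters) {
            if (current == null) {
                break;
            }
            current = f.filter(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Built by hand here; in Nutch the plugin system finds the extensions.
        List<ExampleUrlFilter> filters = new ArrayList<>();
        filters.add(new HttpOnlyFilter());
        filters.add(new RejectPdfFilter());

        System.out.println(applyAll(filters, "https://nutch.apache.org/")); // prints https://nutch.apache.org/
        System.out.println(applyAll(filters, "https://example.com/a.pdf")); // prints null
        System.out.println(applyAll(filters, "ftp://example.com/"));        // prints null
    }
}
```

In Nutch itself the list of active filters is not assembled by hand: the plugin system discovers the extensions each plugin declares in its plugin.xml and instantiates them.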

Source Files

You'll find the following inside a plugin source directory:

Getting Nutch to Use a Plugin

To get Nutch to use a given plugin, edit your conf/nutch-site.xml file and add the plugin's ID to the plugin.includes property, whose value is a regular expression matching the plugins to load. For a plugin you have written yourself, you also need to add its build configuration to build.xml in the plugin directory.
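plugin.includes is set as a <property> in conf/nutch-site.xml, and its value is a regular expression over plugin IDs rather than a plain list. The Java sketch below approximates how such a value selects plugins, assuming that an ID is active when the whole ID matches the plugin.includes pattern and does not match the plugin.excludes pattern. It is a simplification under that assumption, not Nutch's PluginRepository code, and it ignores dependencies between plugins. The class name PluginIncludesSketch and the plugin IDs (including the hypothetical my-plugin) are example values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludesSketch {
    // Keep IDs that fully match `includes` and do not fully match `excludes`.
    // Assumption: this mirrors Nutch's include/exclude check only approximately.
    static List<String> activePlugins(List<String> available, String includes, String excludes) {
        Pattern inc = Pattern.compile(includes);
        Pattern exc = Pattern.compile(excludes);
        List<String> active = new ArrayList<>();
        for (String id : available) {
            if (inc.matcher(id).matches() && !exc.matcher(id).matches()) {
                active.add(id);
            }
        }
        return active;
    }

    public static void main(String[] args) {
        // Hypothetical set of installed plugins; "my-plugin" is the one you wrote.
        List<String> available = List.of(
                "protocol-http", "protocol-ftp", "parse-html", "parse-tika",
                "urlfilter-regex", "index-basic", "my-plugin");

        // Example plugin.includes value with the new plugin added as one more alternative.
        String includes = "protocol-http|parse-(html|tika)|urlfilter-regex|index-basic|my-plugin";

        // An empty excludes pattern matches no non-empty plugin ID.
        System.out.println(activePlugins(available, includes, ""));
        // prints [protocol-http, parse-html, parse-tika, urlfilter-regex, index-basic, my-plugin]
    }
}
```

Because the value is a regular expression, a new plugin has to be added as another | alternative (or be covered by an existing group such as parse-(html|tika)); otherwise it is not activated.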

Using a Plugin From The Command Line

Nutch ships with a number of plugins that include a main() method and sample code to illustrate their use. Running these plugins from the command line is a good way to start exploring the internal workings of each one.

To do so, run the bin/nutch script from the $NUTCH_HOME directory. Invoked without arguments, bin/nutch plugin prints its usage:

$ bin/nutch plugin
Usage: [PluginRepository] pluginId className [arg1 arg2 ...]

For example, to run the HtmlParser class from the parse-html plugin on a local file:

$ bin/nutch plugin parse-html org.apache.nutch.parse.html.HtmlParser filename.html

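Here pluginId is the ID of the plugin itself (parse-html in this example), and className is the fully qualified name of the class whose main() method should be run. PluginRepository in the usage message is not an argument you supply: it names org.apache.nutch.plugin.PluginRepository, the class that bin/nutch plugin runs to load the plugin and call that main() method.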
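Conceptually, bin/nutch plugin loads the named plugin, loads className through that plugin's class loader, and invokes its static main(String[]) with the remaining arguments. The sketch below shows that reflective call in plain Java; it is our approximation of the mechanism, not PluginRepository's actual source. PluginMainSketch and EchoTool are hypothetical names, EchoTool stands in for a plugin class such as HtmlParser, and the sketch uses the application class loader where Nutch would use the plugin's own loader.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

public class PluginMainSketch {
    // Hypothetical stand-in for a plugin class that ships a main() method.
    public static class EchoTool {
        static String[] lastArgs; // recorded so the call can be checked

        public static void main(String[] args) {
            lastArgs = args;
            System.out.println("EchoTool got " + Arrays.toString(args));
        }
    }

    // Load `className` with `loader` and call its static main(String[]).
    static void runMain(ClassLoader loader, String className, String[] args) throws Exception {
        Class<?> clazz = Class.forName(className, true, loader);
        Method main = clazz.getMethod("main", String[].class);
        if (!Modifier.isStatic(main.getModifiers())) {
            throw new IllegalArgumentException(className + ".main is not static");
        }
        // Cast to Object so the array is passed as the single argument, not spread as varargs.
        main.invoke(null, (Object) args);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: Nutch would pass the plugin's class loader here instead.
        runMain(PluginMainSketch.class.getClassLoader(),
                EchoTool.class.getName(),
                new String[] {"filename.html"});
        // prints EchoTool got [filename.html]
    }
}
```

Loading the class through the plugin's own class loader is what lets a plugin bring its own JARs without putting them on Nutch's main classpath.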

See also: WritingPluginExample

See also: HowToContribute