CreateNewFilter

How to add category metadata to your index and make it searchable. For this, you need to write two plugins: an indexing filter and a query filter.

Indexing your custom metadata

For the indexing filter, copy the index-more plugin and change the names, directories, and build files appropriately. The main thing to change is the filter method:

     public Document filter(Document doc, Parse parse, FetcherOutput fo)

In it, you can add your own fields. To add a category field with the value "puppies", the call will look something like this:

     doc.add(new Field("category", "puppies", false, true, false));

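The three booleans are arguments to the Lucene Field constructor (store, index, and tokenize, in that order), so here the value is indexed but neither stored nor tokenized. See the Field and Document.add API for more info.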
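To make the effect of that call concrete, here is a minimal sketch that compiles without Nutch or Lucene on the classpath. The Field and Document classes below are simplified stand-ins written for this example, not the real Lucene classes, and CategoryIndexingFilter is a hypothetical name. The real filter method also receives the Parse and FetcherOutput arguments shown above, which are left out here. The order of the booleans (store, index, tokenize) is assumed from the five-argument Field constructor of the Lucene release that shipped with Nutch at the time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Small driver so the sketch can be run on its own.
class CategoryIndexingDemo {
    public static void main(String[] args) {
        Document doc = new CategoryIndexingFilter().filter(new Document());
        Field f = doc.getField("category");
        System.out.println(f.name + "=" + f.value
                + " store=" + f.store + " index=" + f.index + " tokenize=" + f.tokenize);
    }
}

// Stand-in for the Lucene Field class (NOT the real API). It mirrors only the
// five-argument form used in the text: name, value, store, index, tokenize.
class Field {
    final String name;
    final String value;
    final boolean store;    // keep the raw value so it can be returned with hits
    final boolean index;    // make the value searchable
    final boolean tokenize; // run the value through the analyzer before indexing

    Field(String name, String value, boolean store, boolean index, boolean tokenize) {
        this.name = name;
        this.value = value;
        this.store = store;
        this.index = index;
        this.tokenize = tokenize;
    }
}

// Stand-in for the Lucene Document class: a simple collection of fields by name.
class Document {
    private final Map<String, Field> fields = new LinkedHashMap<>();

    void add(Field field) {
        fields.put(field.name, field);
    }

    Field getField(String name) {
        return fields.get(name);
    }
}

// Hypothetical indexing filter. The real plugin method would be
// filter(Document doc, Parse parse, FetcherOutput fo).
class CategoryIndexingFilter {
    Document filter(Document doc) {
        // Indexed but not stored and not tokenized: the category is matched
        // as one exact term.
        doc.add(new Field("category", "puppies", false, true, false));
        return doc;
    }
}
```

In a real plugin the value would normally come from the parsed page or its metadata rather than a hard-coded string; "puppies" is kept here only to match the example above.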

That's pretty much it for indexing.

Searching your metadata

To search for this, you need to create a query filter. Copy the query-site plugin. Again, change file names, directories, and build files as needed. The main Java file is very simple: just change the string in the line with "super". Instead of:

     super("site");

you would have:

     super("category");
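The following sketch shows roughly what the renamed class amounts to. It assumes, as the super("site") call suggests, that the query-site class extends a base class whose constructor takes the field name. FieldQueryFilterBase is a hypothetical stand-in for that base class so the example compiles without Nutch, and its getField and accepts methods are invented for illustration only.

```java
// Small driver so the sketch can be run on its own.
class CategoryQueryDemo {
    public static void main(String[] args) {
        CategoryQueryFilter filter = new CategoryQueryFilter();
        System.out.println("filter handles field: " + filter.getField());
    }
}

// Hypothetical stand-in for the base class that the query-site filter extends.
// The real base class lives in Nutch; only the constructor shape is assumed.
abstract class FieldQueryFilterBase {
    private final String field;

    protected FieldQueryFilterBase(String field) {
        this.field = field;
    }

    // Illustrative accessor, not part of any real API.
    String getField() {
        return field;
    }

    // Illustrative check: does this filter handle a clause on the given field?
    boolean accepts(String clauseField) {
        return field.equals(clauseField);
    }
}

// The only change compared with the copied site filter is the field name.
class CategoryQueryFilter extends FieldQueryFilterBase {
    CategoryQueryFilter() {
        super("category");
    }
}
```

The field name passed to super must match the name used when the field was added at indexing time, otherwise queries on it will not find anything.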

Make sure that you list your new index-category and query-category plugins in your nutch-default.xml file (in the plugin.includes property). Don't forget to check that the updated file is in your WEB-INF/classes directory too.

Credits: HowieWang Thread: http://www.nabble.com/index-search-filtering-by-category-tf2136864.html
