Lucene Jumping Off Point

This page links to the Lucene-related pages in this wiki. More information about Lucene can be found on the Lucene website.

Note that, rather than using Lucene in-process, the preferred solution nowadays is to run a separate Solr server.

Nutch used to be a Lucene subproject but became a top-level project in 2010.

Lucene dynamic attributes

Torsten Krah asks:

> pre 1.0 Days, it was possible to have dynamic attributes in lucene, because
> the API let you do such things (Lucene document access).
> How to do the same in 1.0? - using 1.1 the API the NutchDocument does only
> know name and value, but if i don't know the name (dynamic attribute via
> HtmlParser, meta tags indexing) - how can i still index them? Or is this
> impossible with the lucene backend now?

Andrzej Bialecki replies:

It's still possible to do this, but it's undocumented...

Here's a quick howto: in your IndexingFilter, whenever you want to add a previously undeclared field you need to declare its Lucene options on a per-document level like this:

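       String fieldName = "myMetaField";
       String value = "undeclared meta value";
       // Per-document metadata carries the Lucene options for undeclared fields
       Metadata meta = nutchDocument.getDocumentMeta();
       // NOTE: the option arguments were cut off in the source; STORE_YES and
       // INDEX_TOKENIZED are assumed here as typical choices
       meta.add(LuceneConstants.FIELD_PREFIX + fieldName,
                LuceneConstants.STORE_YES);
       meta.add(LuceneConstants.FIELD_PREFIX + fieldName,
                LuceneConstants.INDEX_TOKENIZED);
       //... etc, add those field options that you want
       // and add the field value
       nutchDocument.add(fieldName, value);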

