Lucene Jumping Off Point

This page provides links to the Lucene pages in this wiki. More information about Lucene is available on the project website: http://lucene.apache.org

Note that the preferred solution nowadays is to use a separate Solr server rather than running Lucene in-process.

Nutch used to be a Lucene subproject but became a top-level project in 2010.

Lucene dynamic attributes

Torsten Krah asks:

> In the pre-1.0 days it was possible to have dynamic attributes in Lucene, because
> the API let you do such things (Lucene document access).
>
> How do I do the same in 1.0 and later? Using the 1.1 API, the NutchDocument only
> knows name and value, but if I don't know the name (dynamic attribute via
> HtmlParser, meta tags indexing), how can I still index them? Or is this
> impossible with the Lucene backend now?

Andrzej Bialecki replies:

It's still possible to do this, but it's undocumented...

Here's a quick howto: in your IndexingFilter, whenever you want to add a previously undeclared field you need to declare its Lucene options on a per-document level like this:

       String fieldName = "myMetaField";
       String value = "undeclared meta value";
       Metadata meta = nutchDocument.getDocumentMeta();
       meta.add(LuceneConstants.FIELD_PREFIX + fieldName,
                LuceneConstants.STORE_YES);
       meta.add(LuceneConstants.FIELD_PREFIX + fieldName,
                LuceneConstants.INDEX_TOKENIZED);
       // ... etc., add those field options that you want
       // and add the field value
       nutchDocument.add(fieldName, value);
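
The question above is about field names that are not known in advance, for example meta tags collected by a parser. The following is a minimal, self-contained sketch of how the same pattern might be applied in a loop over such names. It does not use the Nutch classes: `Meta` and `Doc` are hypothetical stand-ins for `Metadata` and `NutchDocument`, and the three constant values are placeholders, not the real values of `LuceneConstants`. Declaring the options only once per field is a choice made for this sketch, not something the reply above prescribes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DynamicFieldSketch {

    // Placeholder values (assumptions); the real ones live in LuceneConstants.
    static final String FIELD_PREFIX = "field.";
    static final String STORE_YES = "store.yes";
    static final String INDEX_TOKENIZED = "index.tokenized";

    /** Hypothetical stand-in for a multi-valued metadata container. */
    static class Meta {
        private final Map<String, List<String>> values = new LinkedHashMap<>();

        void add(String name, String value) {
            values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        List<String> getValues(String name) {
            List<String> found = values.get(name);
            return found == null ? Collections.<String>emptyList() : found;
        }
    }

    /** Hypothetical stand-in for a document with fields and document metadata. */
    static class Doc {
        final Meta documentMeta = new Meta();
        final Map<String, List<String>> fields = new LinkedHashMap<>();

        void add(String name, String value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    /** For each meta tag, declares the field options once, then adds the value. */
    static void addDynamicFields(Doc doc, Map<String, String> metaTags) {
        for (Map.Entry<String, String> tag : metaTags.entrySet()) {
            String fieldName = tag.getKey();
            String optionsKey = FIELD_PREFIX + fieldName;
            // Declare the options only if this document has none for the field yet.
            if (doc.documentMeta.getValues(optionsKey).isEmpty()) {
                doc.documentMeta.add(optionsKey, STORE_YES);
                doc.documentMeta.add(optionsKey, INDEX_TOKENIZED);
            }
            // Add the field value itself.
            doc.add(fieldName, tag.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("author", "Jane Doe");
        tags.put("keywords", "lucene, nutch");

        Doc doc = new Doc();
        addDynamicFields(doc, tags);

        System.out.println(doc.fields);
        System.out.println(doc.documentMeta.getValues(FIELD_PREFIX + "author"));
    }
}
```

In a real IndexingFilter the loop body would presumably use `nutchDocument.getDocumentMeta()` and the `LuceneConstants` values shown in the snippet above, with the tag names taken from the parse metadata.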