The current Nutch documentation (Tutorial, FAQ) lacks a list of features. This wiki page could fill that gap if someone who knows the answers edits it.
(Please reformat this text and divide it into feature lists, questions without answers, and questions with answers.)
Features
Questions and Answers
What kinds of searches does Nutch support (quoted, nested, truncation, wildcarding [and where], Boolean)?
Nutch supports "...." (phrase search?), + (what is this for?), - (negation), and fieldname:term. There is no "AND" or "OR" operator; AND logic is implied.
Is stemming an option?
According to the book Lucene in Action (page 329): "Nutch does not use stemming or term aliasing of any kind. Search engines have not historically done much stemming, but it is a question that comes up regularly."
What kind of stemming does Nutch use? (and can you add exceptions/changes?)
See the previous answer.
Does Nutch support Boolean operators? (can you use Google-like plus or minus or are you stuck with 1990s terms?)
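No, there are no explicit "AND" or "OR" operators. The + and - prefixes are accepted, and AND logic is implied between terms.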
How does the search engine handle punctuation and special characters? (and what's configurable?)
They are treated like a space.
Which document formats are supported?
Judging from the names of the available parser plugins, the list below is probably complete. However, only plain text and HTML parsing are enabled by default. To handle other document types, edit conf/nutch-site.xml and change the value of the plugin.includes property to include the corresponding plugins:
Plain Text (plugin: parse-text)
HTML (parse-html)
JavaScript (for extracting links only?) (parse-js)
Microsoft PowerPoint, .ppt files (parse-mspowerpoint)
Microsoft Word, .doc files (parse-msword)
Adobe PDF (parse-pdf)
RSS (parse-rss)
RTF (parse-rtf)
MP3 (?) Is there any text in MP3? (parse-mp3) (JR: Sure, the MP3 file itself contains ID3v1 or ID3v2 tags, which hold song information such as title, artist, album, and comments. This is the useful information needed to search MP3s.)
ZIP (?) This seems to expand the zip of plain text files and return the concatenated text. (parse-zip)
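As an illustration of the plugin.includes change described above, here is a sketch of a property entry that would enable the PDF and Word parsers alongside the default text and HTML parsers. The non-parser plugin names in the value (protocol-http, urlfilter-regex, index-basic, query-(basic|site|url)) are assumptions about a typical default and may differ between Nutch versions. Copy the actual value from your own conf/nutch-default.xml and extend only the parse-(...) group. The entry belongs inside the existing root element of conf/nutch-site.xml.

```xml
<!-- Sketch only: overrides plugin.includes to add the PDF and Word parsers. -->
<!-- Everything outside the parse-(...) group is an assumed default; check conf/nutch-default.xml. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|pdf|msword)|index-basic|query-(basic|site|url)</value>
</property>
```

The value is a regular expression matched against plugin names, so further parsers from the list above can be added by appending their suffix to the parse-(...) alternation, for example parse-(text|html|pdf|msword|rtf).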
Questions without Answers
Does Nutch support weighted field searching or synonyms?
What kinds of indexes does Nutch build? (multi-format indexing, incremental indexing, spell-check support, thesaurus support, fielded searching, rank-by-reputation?)
What post-coordination options are available? (hey Karen, what does this mean?)
How easy is Nutch to configure?
How transparent is its configuration to a working organization: does it require geeky command-line work, or can a knowledgeable manager use a web or software interface to view or modify settings?
How are results sorted?
Does Nutch support deduping?
Can one tinker with the relevance algorithms?
Are there ranking overrides?