-- Main.LukeBaker - 23 Jan 2005
This page lists development tasks and suggestions for Nutch.
Fetching
HTTP Improvements
HTTP Authentication support
HTTP Cookie support
These two could come from the Jakarta HTTPClient work started by AndyHedges. I've implemented NTLM authentication, and it won't be hard to add Basic authentication as well -- I just have to figure out the best way to store credentials in Nutch's XML config files.
I've modified Hedges' code to use a single HTTPClient object with multiple connection objects, so cookies should work fine. I'll check whether Last-Modified can also be handled from the client, but wouldn't that need changes to the fetcher too?
-- Main.KenMeltsner - 04 Feb 2005
HTTP Last-Modified support
Support for Microsoft's annoying CIFS file service protocol (e.g. file://server/share or perhaps cifs://server/share)
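On the HTTP Last-Modified item and the question of whether the fetcher would need changes, here is a minimal sketch of a conditional GET. It uses only java.net from the JDK, not the Jakarta HTTPClient code discussed above, and the names ConditionalFetch, needsRefetch and lastFetchMillis are hypothetical, not part of Nutch.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for illustration only; not an existing Nutch class.
final class ConditionalFetch {

    /** True when the status code says the stored copy is still current (304). */
    static boolean isNotModified(int statusCode) {
        return statusCode == HttpURLConnection.HTTP_NOT_MODIFIED;
    }

    /**
     * Asks the server whether the page changed since lastFetchMillis
     * (milliseconds since the epoch, assumed to be stored by the caller).
     * Assumes an http/https URL; other schemes would fail the cast below.
     */
    static boolean needsRefetch(URL url, long lastFetchMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Sends an If-Modified-Since header built from the stored time.
        conn.setIfModifiedSince(lastFetchMillis);
        try {
            return !isNotModified(conn.getResponseCode());
        } finally {
            conn.disconnect();
        }
    }
}
```

The caller has to supply the time of the previous fetch, which suggests the fetcher would need to record that value per page somewhere. Where it would be stored is left open here.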
Parsing Improvements
See the ParserFactoryImprovementProposal
Analyzing
Add multi-lingual support (see MultiLingualSupport)
Searching
Add multi-lingual support (see MultiLingualSupport)
Result Serving
Include a search.jsp version that returns XHTML or XML
A templating system
Allow authenticated users to edit/upload templates for search results
search.jsp accepts template ID
Maybe use Velocity?
Cache text taken from non-HTML docs to provide a rough preview of Office, PDF, etc. docs.
Add a toHtml method to the Content object for prettier previews of proprietary-format docs
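To make the last two items more concrete, here is a rough sketch of what a toHtml-style preview could do with text already extracted from an Office or PDF document. It is a standalone helper under assumed names (PreviewHtml, escape, toHtml with a title and text argument); it does not use or reflect the actual Content class API.

```java
// Hypothetical standalone helper; the real Content object is not modelled here.
final class PreviewHtml {

    /** Escapes the characters that would otherwise be read as HTML markup. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Wraps extracted plain text in a minimal page; <pre> keeps line breaks. */
    static String toHtml(String title, String extractedText) {
        return "<html><head><title>" + escape(title) + "</title></head>"
             + "<body><pre>" + escape(extractedText) + "</pre></body></html>";
    }
}
```

This only gives the rough preview mentioned above. A prettier version would presumably need per-format logic from each parser, which this sketch does not attempt.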