-- Main.LukeBaker - 23 Jan 2005
This page lists proposed development tasks and suggestions for Nutch.
NutchAdministrationUserInterface
Fetching
HTTP Improvements
- HTTP Authentication support
- HTTP Cookie support
These two could come from the Jakarta HttpClient work started by AndyHedges. I've implemented NTLM authentication, and adding Basic authentication shouldn't be hard; I just have to figure out the best way to store credentials in Nutch's XML config files.
I've modified Hedges' code to use a single HttpClient object with multiple connection objects, so cookies should work fine. I'll also check whether the client can handle Last-Modified, but wouldn't that need changes to the fetcher as well?
-- Main.KenMeltsner - 04 Feb 2005
- HTTP Last-Modified support
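Basic authentication and Last-Modified support both come down to extra request headers. The sketch below uses only a modern JDK, not the Jakarta HttpClient code discussed above. It reads credentials from XML in the `<property><name>…</name><value>…</value></property>` layout, builds a Basic `Authorization` value, and formats an `If-Modified-Since` value from a stored fetch time. The class name and the property names `http.auth.basic.username` / `http.auth.basic.password` are hypothetical, not existing Nutch settings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Base64;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of Nutch or Jakarta HttpClient.
class HttpFetchHeaders {
    // HTTP-date format, e.g. "Sun, 06 Nov 1994 08:49:37 GMT" (always GMT, English names).
    private static final DateTimeFormatter HTTP_DATE =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                         .withZone(ZoneOffset.UTC);

    // Collects name/value pairs from every <property> element in the config XML.
    static Map<String, String> readProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("unreadable config XML", e);
        }
    }

    // Authorization header value for HTTP Basic authentication: base64("user:password").
    static String basicAuth(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder()
            .encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    // If-Modified-Since header value from the time the page was last fetched.
    static String ifModifiedSince(long lastFetchMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(lastFetchMillis));
    }

    // A 304 Not Modified reply means the stored copy is still current.
    static boolean keepStoredCopy(int statusCode) {
        return statusCode == 304;
    }

    public static void main(String[] args) {
        String conf = "<nutch-conf>"
            + "<property><name>http.auth.basic.username</name><value>Aladdin</value></property>"
            + "<property><name>http.auth.basic.password</name><value>open sesame</value></property>"
            + "</nutch-conf>";
        Map<String, String> props = readProperties(conf);
        System.out.println("Authorization: " + basicAuth(
            props.get("http.auth.basic.username"), props.get("http.auth.basic.password")));
        System.out.println("If-Modified-Since: " + ifModifiedSince(784111777000L));
    }
}
```

On the fetcher question above: the request side is just a header, but the fetcher would also need the previous fetch time for each URL and a way to keep the old content when the server answers 304.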
Support for Microsoft's annoying CIFS file-sharing protocol (e.g. file://server/share, or perhaps cifs://server/share)
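In both URL forms, the server sits in the host part and the share is the first path segment. The following hedged sketch maps either form to a Windows UNC path. The class and method names are hypothetical, and a real protocol plugin would still need an SMB/CIFS client library to read the file. Hostnames with underscores (common for NetBIOS names) are not valid URI hosts, so `java.net.URI` reports them only through `getAuthority()`, which the code falls back to.

```java
import java.net.URI;

// Hypothetical helper for a CIFS protocol plugin; not an existing Nutch class.
class CifsUrls {
    // Maps cifs://server/share/path or file://server/share/path to \\server\share\path.
    // Returns null when the URL names no server (e.g. file:///tmp/x is a local file).
    static String toUncPath(String url) {
        URI uri = URI.create(url);
        String scheme = uri.getScheme();
        if (!"cifs".equalsIgnoreCase(scheme) && !"file".equalsIgnoreCase(scheme)) {
            throw new IllegalArgumentException("not a cifs or file URL: " + url);
        }
        // getHost() is null for names like file_server; fall back to the raw authority.
        String host = uri.getHost() != null ? uri.getHost() : uri.getAuthority();
        if (host == null || host.isEmpty()) {
            return null; // local file, not a network share
        }
        // getPath() is already percent-decoded and starts with "/share/...".
        String path = uri.getPath() == null ? "" : uri.getPath();
        return "\\\\" + host + path.replace('/', '\\');
    }

    public static void main(String[] args) {
        System.out.println(toUncPath("cifs://server/share/docs/report.doc"));
        System.out.println(toUncPath("file://server/share/"));
    }
}
```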
Parsing Improvements
See the ParserFactoryImprovementProposal
Analyzing
Add multi-lingual support (see MultiLingualSupport)
Searching
Add multi-lingual support (see MultiLingualSupport)
Result Serving
- Include a version of search.jsp that returns XHTML or XML
- A templating system for search results (perhaps using Velocity):
  - allow authenticated users to edit/upload templates for search results
  - have search.jsp accept a template ID
- Cache the text extracted from non-HTML documents to provide a rough preview of Office, PDF, and other documents.
- Add a toHtml method to the Content object for prettier previews of proprietary-format documents.
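An XML-returning search.jsp and a toHtml-style preview both depend on escaping document text before writing it into markup. What follows is a minimal JDK-only sketch under stated assumptions. The `<hit>` element layout is hypothetical, not a schema Nutch defines. The preview takes already-extracted plain text rather than Nutch's actual `Content` object.

```java
// Hypothetical helpers for the result-serving ideas above; the <hit> layout
// and method names are illustrative, not an existing Nutch API.
class ResultRendering {
    // Escapes the five characters that are special in XML/HTML text and attributes.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // One search hit as an XML fragment, as an XML-returning search.jsp might emit it.
    static String hitToXml(String url, String title, String summary) {
        return "<hit><url>" + escape(url) + "</url><title>" + escape(title)
            + "</title><summary>" + escape(summary) + "</summary></hit>";
    }

    // Rough HTML preview of text extracted from a PDF/Office document, cut to
    // maxCodePoints characters (counted by code point so surrogate pairs stay whole).
    static String toHtmlPreview(String title, String extractedText, int maxCodePoints) {
        String text = extractedText;
        int total = extractedText.codePointCount(0, extractedText.length());
        if (total > maxCodePoints) {
            int end = extractedText.offsetByCodePoints(0, maxCodePoints);
            text = extractedText.substring(0, end) + "...";
        }
        return "<html><head><title>" + escape(title) + "</title></head><body><pre>"
            + escape(text) + "</pre></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(hitToXml("http://example.com/?a=1&b=2", "Q&A", "x < y"));
        System.out.println(toHtmlPreview("report.pdf", "First line\nSecond line", 10));
    }
}
```

Wrapping the extracted text in `<pre>` keeps the line breaks a PDF or Office parser produces, which is enough for a rough preview. A dedicated toHtml could do better by carrying headings or tables over from the original format.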