Differences between revisions 2 and 3
Revision 2 as of 2004-04-13 21:38:50
Size: 458
Comment: Added attachment
Revision 3 as of 2009-09-20 21:47:46
Size: 464
Editor: localhost
Comment: converted to 1.6 markup

Description

  • SpideringDemo.java is an example of how to use httpunit (http://httpunit.sourceforge.net/) to fetch web pages and index them with Lucene. There is no guarantee on how well this works. Its document parsing is limited to HTML pages and is inspired by the Lucene demo (see lucene-demos.jar).
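
The general shape of such a spider can be sketched with the JDK alone. The following is not the attached source, only a minimal illustration of a breadth-first crawl loop. The names SpiderSketch, Page, Fetcher and Indexer are hypothetical. In a real version, the Fetcher would presumably wrap httpunit (for example WebConversation.getResponse(...) and WebResponse.getLinks()) and the Indexer would wrap a Lucene IndexWriter.addDocument(...) call; here both are replaced by in-memory stand-ins so the sketch runs without any of the required libraries.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SpiderSketch {

    /** Hypothetical fetched page: its text plus the URLs it links to. */
    static final class Page {
        final String text;
        final List<String> links;

        Page(String text, List<String> links) {
            this.text = text;
            this.links = links;
        }
    }

    /** Hypothetical seam where an httpunit-based fetcher would plug in. */
    interface Fetcher {
        Page fetch(String url); // returns null if the page cannot be loaded
    }

    /** Hypothetical seam where a Lucene-based indexer would plug in. */
    interface Indexer {
        void index(String url, String text);
    }

    /** Breadth-first crawl from startUrl; returns the indexed URLs in visit order. */
    static List<String> crawl(String startUrl, int maxPages, Fetcher fetcher, Indexer indexer) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        List<String> indexed = new ArrayList<>();
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty() && indexed.size() < maxPages) {
            String url = queue.remove();
            Page page = fetcher.fetch(url);
            if (page == null) {
                continue; // skip pages that could not be fetched
            }
            indexer.index(url, page.text);
            indexed.add(url);
            for (String link : page.links) {
                if (seen.add(link)) {
                    queue.add(link); // enqueue each URL only once
                }
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        // Tiny in-memory "site" standing in for real HTTP responses.
        Map<String, Page> site = new LinkedHashMap<>();
        site.put("/", new Page("home", Arrays.asList("/a", "/b")));
        site.put("/a", new Page("page a", Arrays.asList("/", "/b")));
        site.put("/b", new Page("page b", Arrays.asList("/missing")));

        // A plain map stands in for the search index.
        Map<String, String> index = new LinkedHashMap<>();
        List<String> visited = crawl("/", 10, site::get, index::put);
        System.out.println(visited); // prints [/, /a, /b]
    }
}
```

The visited set keeps the spider from looping on pages that link back to each other, and the maxPages limit bounds the crawl on large sites. Whether the attached demo takes the same precautions is not stated here.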

Required Libraries

  • httpunit-1.5.X.jar
  • lucene-1.X.jar
  • lucene-demos-1.X.jar
  • Tidy.jar (should be with httpunit)

Source

  • attachment:SpiderDemo.java.ksh

HttpUnitExample (last edited 2009-09-20 21:47:46 by localhost)