HTMLTidy is a tool that takes badly formed HTML markup and generates well-formed XHTML.

It is available as a command-line utility and, through the JTidy port, as a Java API.

This tool is vital if you want to 'screen scrape' data from HTML pages, because much real-world HTML is not well-formed and XML tools cannot parse it directly. Cocoon provides HTML Tidy as a Generator.
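
To illustrate why well-formed output matters for screen scraping, here is a minimal sketch of the step that follows tidying: extracting data with the standard Java XML and XPath APIs. It assumes the tidied XHTML is already held in a string. The class name ScrapeSketch, the helper extractLinks, and the sample markup are hypothetical and are not part of HTML Tidy or Cocoon. The sample omits the DOCTYPE that real Tidy output usually carries, so the parser does not try to fetch a DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ScrapeSketch {

    // Hypothetical helper: returns the href value of every <a> element
    // found in a well-formed XHTML string.
    static List<String> extractLinks(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

        // local-name() matches <a> regardless of the XHTML namespace,
        // which avoids registering a namespace context for this sketch.
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//*[local-name()='a']/@href", doc, XPathConstants.NODESET);

        List<String> links = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            links.add(nodes.item(i).getNodeValue());
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input standing in for tidied output.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>Sample</title></head>"
                + "<body><p><a href=\"/one\">One</a></p>"
                + "<p><a href=\"/two\">Two</a></p></body></html>";
        System.out.println(extractLinks(xhtml)); // prints [/one, /two]
    }
}
```

The same parse call throws a SAXParseException on markup with unclosed or mis-nested tags, which is the gap that tidying the page first is meant to close.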


HTMLTidy (last edited 2009-09-20 23:40:27 by localhost)