This page was created mainly to answer the FAQ "How do I handle bad HTML content?".
To allow HTML documents to be used as input to pipelines, Cocoon bases its HTMLGenerator on JTidy. JTidy parses the HTML, so many less-than-perfect HTML documents can be converted to XML (technically, SAX events).
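As a rough sketch of how this looks in a sitemap, the fragment below feeds an HTML page into a pipeline and serializes the result as XML. The match pattern and the source URL are placeholders, and the fragment assumes that the generator is declared under the name `html` and that an `xml` serializer is available, as in a typical default sitemap.

```xml
<!-- Hypothetical pipeline: pattern and src are placeholders. -->
<map:match pattern="cleaned-page.xml">
  <!-- HTMLGenerator (assumed to be declared as "html") parses the page with JTidy
       and emits SAX events into the pipeline. -->
  <map:generate type="html" src="http://www.example.org/page.html"/>
  <!-- Serialize the SAX events as XML so the cleaned-up markup can be inspected. -->
  <map:serialize type="xml"/>
</map:match>
```

Requesting the matched URI should then return the tidied document, which makes it easier to see what later transformers in the pipeline will actually receive.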
Before release 2.0.4, the HTMLGenerator did not allow all JTidy options to be set.
From 2.0.4, the "jtidy-config" configuration element of the HTMLGenerator points to a properties file that can be used to set all JTidy options, giving better control over the processing of HTML input (thanks, Sylvain Wallez!).
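A minimal sketch of such a configuration follows. The location `WEB-INF/jtidy.properties` is an assumption chosen for illustration, as are the `name` and `label` attributes and the option values shown in the comment. The option names themselves are standard Tidy settings, but check them against the JTidy version bundled with your Cocoon release.

```xml
<!-- Generator declaration inside <map:generators> of the sitemap. -->
<map:generator name="html"
               src="org.apache.cocoon.generation.HTMLGenerator"
               label="content">
  <!-- Path to the JTidy properties file (hypothetical location). -->
  <jtidy-config>WEB-INF/jtidy.properties</jtidy-config>
</map:generator>

<!--
  Possible contents of the properties file (illustrative values only):

    show-warnings=no
    numeric-entities=yes
    quote-nbsp=yes
    char-encoding=latin1
-->
```

Keeping the options in a separate properties file means the parsing behaviour can be tuned without touching the pipelines that use the generator.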
HTMLGenerator: [http://xml.apache.org/cocoon/userdocs/generators/html-generator.html official documentation].