HTML Serializer

The secret to generating compliant HTML 4.01 Strict from Cocoon is to declare the correct doctype for the serializer in the sitemap and to ensure that the preceding transformer does not declare a default namespace.

If the stylesheet used by the transformer that feeds the HTMLSerializer has a root element similar to this:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml">

Then you must remove the default namespace declaration (the xmlns="http://www.w3.org/1999/xhtml" attribute), like so:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

If you do not do this, empty elements such as <br> and <link> will be written with a surplus '/' character appended, and the <html> tag will carry a namespace attribute that is illegal in HTML 4.01.
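As an illustration, a complete stylesheet without a default namespace might look like the sketch below. The input element names (page, title, content) and the overall document shape are assumptions made for this example, not part of the original post. It also assumes that the input elements copied through are themselves in no namespace.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the input vocabulary (page, title, content) is hypothetical. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No xmlns="..." is declared above, so the literal result
       elements below (html, head, body) are in no namespace. -->
  <xsl:template match="/page">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <xsl:apply-templates select="content/node()"/>
      </body>
    </html>
  </xsl:template>

  <!-- Copy all other nodes and attributes through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the html, head and body elements are written without any namespace, the serializer has no namespace declaration to emit on the html tag.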

Note also that the HTMLSerializer makes no attempt to clean up illegal tags or attributes that are fed to it. For example, if you feed it <img align="left" src="picture.jpg"> then the 'align' attribute is passed through unchanged, even though it is a deprecated presentational attribute that the HTML 4.01 Strict DTD does not allow.
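Since the serializer will not do this clean-up, one option is to remove such attributes in a transformation step before serialization. The following is a minimal sketch of that idea, not something taken from the original post: an identity transform with one empty template that drops the 'align' attribute from img elements. It assumes the elements reaching it are in no namespace.

```xml
<?xml version="1.0"?>
<!-- Sketch only: drops img/@align and copies everything else unchanged. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: copy every node and attribute as-is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- An empty template produces no output, so the attribute is removed. -->
  <xsl:template match="img/@align"/>

</xsl:stylesheet>
```

Other disallowed attributes or elements could be handled with further empty templates of the same form.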

Example of an HTML 4.01 strict serializer

 <map:serializer logger="sitemap.serializer.html"
      mime-type="text/html"
      name="html"
      src="org.apache.cocoon.serialization.HTMLSerializer">
   <doctype-public>-//W3C//DTD HTML 4.01//EN</doctype-public>
   <doctype-system>http://www.w3.org/TR/html4/strict.dtd</doctype-system>
 </map:serializer>
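The serializer declaration above belongs in the map:serializers section of the sitemap components. A pipeline can then refer to it by name. The fragment below is a sketch of such a pipeline match. The URL pattern and the file paths (content/{1}.xml, stylesheets/page2html.xsl) are hypothetical and would need to match your own application.

```xml
<!-- Sketch only: pattern and src paths are hypothetical. -->
<map:match pattern="*.html">
  <map:generate src="content/{1}.xml"/>
  <!-- This stylesheet must not declare a default namespace. -->
  <map:transform src="stylesheets/page2html.xsl"/>
  <!-- type="html" selects the serializer declared with name="html". -->
  <map:serialize type="html"/>
</map:match>
```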

Page output should now be HTML 4.01 Strict, with no namespace declarations and no surplus '/' characters in empty tags such as <br>.

(From a post by David Legg, 2005/11/10)
