This Sitemap excerpt shows how to configure the character encoding of documents generated by the HTML and XML Serializers:

  <!-- these definitions go into the map:serializers element -->

  <!-- configure the XML serializer to use iso-8859-1 encoding -->
  <map:serializer
    name="xml"
    mime-type="text/xml; charset=iso-8859-1"
    src="org.apache.cocoon.serialization.XMLSerializer"
    pool-max="32"
    pool-min="16"
    pool-grow="4"
  >
     <encoding>iso-8859-1</encoding>
  </map:serializer>

  <!-- configure the HTML serializer to use iso-8859-1 encoding -->
  <map:serializer
    name="html"
    mime-type="text/html"
    src="org.apache.cocoon.serialization.HTMLSerializer"
  >
     <encoding>iso-8859-1</encoding>
  </map:serializer>

  <!-- alternatively, configure the XML serializer to supply XHTML as "text/html" in UTF-8;
       use this in place of the HTML serializer above, since both are named "html" -->
  <map:serializer name="html"
                  mime-type="text/html; charset=utf-8"
                  logger="sitemap.serializer.html"
                  pool-grow="2" pool-max="64" pool-min="2"
                  src="org.apache.cocoon.serialization.XMLSerializer">
    <doctype-public>-//W3C//DTD XHTML 1.0 Strict//EN</doctype-public>
    <doctype-system>http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd</doctype-system>
    <!-- Omit the XML declaration so that Internet Explorer renders in standards-compliant mode -->
    <omit-xml-declaration>yes</omit-xml-declaration>
    <omit-namespaces>yes</omit-namespaces>
    <encoding>UTF-8</encoding>
    <indent>yes</indent>
  </map:serializer>
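For comparison, here is a minimal sketch (assuming Cocoon 2.x sitemap syntax) of an HTML serializer declared for UTF-8, with the same charset stated in its mime-type, plus a pipeline that selects it by name via map:serialize. The serializer name html-utf8, the match pattern, and the file paths are hypothetical placeholders. The mime-type value is presumably what Cocoon passes to the servlet container as the Content-Type, so the idea is to keep its charset in step with the encoding element.

```xml
<!-- hypothetical: an HTML serializer declared for UTF-8; goes into map:serializers -->
<map:serializer
  name="html-utf8"
  mime-type="text/html; charset=utf-8"
  src="org.apache.cocoon.serialization.HTMLSerializer"
>
  <encoding>UTF-8</encoding>
</map:serializer>

<!-- hypothetical pipeline selecting that serializer by name; goes into map:pipelines -->
<map:pipeline>
  <map:match pattern="*.html">
    <!-- {1} is replaced by whatever the * in the pattern matched -->
    <map:generate src="content/{1}.xml"/>
    <map:transform src="stylesheets/page2html.xsl"/>
    <map:serialize type="html-utf8"/>
  </map:match>
</map:pipeline>
```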

Well, I'm sure that example works just fine, since ISO-8859-1 seems to be the default anyway! But my attempts to persuade Cocoon (via Jetty) to label its output as UTF-8 have all been in vain. Just to clarify, Cocoon is correctly generating UTF-8 encoded characters, but something is slapping

 Content-Type: text/html; charset=ISO-8859-1 

in the HTTP headers. Needless to say, this combination induces browser indigestion. Any clues? – TimGoodwin
