How to convert HTML or XHTML to PDF

Apache FOP is an XSL-FO processor, so it cannot read HTML directly. To produce a PDF from HTML, you first have to convert the HTML to XSL-FO, which FOP can then render. There are several possible approaches:

  1. If the original data is available as XML, the best approach is probably to start from that XML and write a separate XSLT stylesheet that converts it to XSL-FO.
  2. Convert the HTML to XHTML (e.g. with JTidy), then convert the XHTML to XSL-FO using XSLT (e.g. with the Xalan included in the FOP distribution). This requires an XSLT stylesheet that transforms XHTML to XSL-FO. Antenna House provides such a stylesheet, but it is probably not completely compatible with FOP.

    • Warning: FOP currently doesn't support automatic table layout. Column widths have to be specified explicitly.

  3. Convert the HTML to XSL-FO directly using a specialized tool called html2fo (check out the Tools section below). This approach is easy, but it offers little or no control over the design of the PDF output.

  4. A better long-term solution for extensive documentation is to convert the HTML to DocBook XML, then use the conventional DocBook-to-HTML and DocBook-to-PDF conversions. For some ideas, start with the html2docbook stylesheet.
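
As an illustration of approach 2, here is a minimal sketch of the XHTML to XSL-FO step using only the JAXP API bundled with the JDK. The class name XhtmlToFoSketch, the method toFo, and the embedded stylesheet are hypothetical and exist only for this example. The stylesheet handles just h1, p, and simple tables made of tr/td, so a real document would need a far more complete one. Because of the table-layout limitation noted above, the table template emits one fo:table-column with an explicit width per cell of the first row. It assumes the input is well-formed XHTML in the XHTML namespace with no DOCTYPE declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XhtmlToFoSketch {

    // Hypothetical, deliberately tiny stylesheet: covers only h1, p and tr/td tables.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:fo='http://www.w3.org/1999/XSL/Format'",
        "    xmlns:h='http://www.w3.org/1999/xhtml'",
        "    exclude-result-prefixes='h'>",
        "  <xsl:output method='xml' indent='yes'/>",
        "  <xsl:template match='/h:html'>",
        "    <fo:root>",
        "      <fo:layout-master-set>",
        "        <fo:simple-page-master master-name='page'",
        "            page-height='297mm' page-width='210mm' margin='20mm'>",
        "          <fo:region-body/>",
        "        </fo:simple-page-master>",
        "      </fo:layout-master-set>",
        "      <fo:page-sequence master-reference='page'>",
        "        <fo:flow flow-name='xsl-region-body'>",
        "          <xsl:apply-templates select='h:body/*'/>",
        "        </fo:flow>",
        "      </fo:page-sequence>",
        "    </fo:root>",
        "  </xsl:template>",
        "  <xsl:template match='h:h1'>",
        "    <fo:block font-size='18pt' font-weight='bold' space-after='6pt'>",
        "      <xsl:value-of select='.'/>",
        "    </fo:block>",
        "  </xsl:template>",
        "  <xsl:template match='h:p'>",
        "    <fo:block space-after='4pt'><xsl:value-of select='.'/></fo:block>",
        "  </xsl:template>",
        "  <xsl:template match='h:table'>",
        "    <fo:table table-layout='fixed' width='100%'>",
        "      <!-- One explicit column width per cell of the first row. -->",
        "      <xsl:for-each select='h:tr[1]/h:td'>",
        "        <fo:table-column column-width='proportional-column-width(1)'/>",
        "      </xsl:for-each>",
        "      <fo:table-body>",
        "        <xsl:apply-templates select='h:tr'/>",
        "      </fo:table-body>",
        "    </fo:table>",
        "  </xsl:template>",
        "  <xsl:template match='h:tr'>",
        "    <fo:table-row><xsl:apply-templates select='h:td'/></fo:table-row>",
        "  </xsl:template>",
        "  <xsl:template match='h:td'>",
        "    <fo:table-cell><fo:block><xsl:value-of select='.'/></fo:block></fo:table-cell>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Applies the stylesheet to an XHTML string and returns the XSL-FO as a string.
    static String toFo(String xhtml) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xhtml = "<html xmlns='http://www.w3.org/1999/xhtml'>"
            + "<head><title>Demo</title></head>"
            + "<body><h1>Report</h1><p>Hello, FOP.</p>"
            + "<table><tr><td>a</td><td>b</td></tr></table></body></html>";
        // Print the generated XSL-FO; save it to a file to feed it to FOP.
        System.out.println(toFo(xhtml));
    }
}
```

The printed XSL-FO can be saved to a file and rendered with FOP, for example with `fop -fo out.fo -pdf out.pdf`. FOP's command line can also run the transformation itself when given the XHTML and the stylesheet as files, for example `fop -xml in.xhtml -xsl xhtml2fo.xsl -pdf out.pdf` (the file names here are placeholders).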

(!) Add additional content (additional ideas, pitfalls, etc.)!

Tools

HowTo/HtmlToPdf (last edited 2010-06-22 16:31:02 by Tom Browder)