How to convert HTML or XHTML to PDF
Apache FOP is an XSL-FO processor, so it cannot read HTML directly. To convert HTML to PDF, you must first convert the HTML to XSL-FO, which FOP can then render. There are several possible approaches:
- If the original data is available as XML, the best approach is probably to start from that XML and write a separate XSLT stylesheet that converts it to XSL-FO.
- Convert the HTML to XHTML (e.g. using JTidy) and then convert the XHTML to XSL-FO using XSLT (e.g. with the Xalan included in the FOP distribution). You will need an XSLT stylesheet that transforms XHTML to XSL-FO. Antenna House provides such a stylesheet, but it is probably not completely compatible with FOP. Note that FOP currently doesn't support automatic table layout, so column widths have to be specified.
- Convert the HTML to XSL-FO directly using a specialized tool called html2fo (see the Tools section below). This approach is easy but offers little or no control over the design of the PDF output.
- A better long-term solution for extensive documentation is to convert the HTML to DocBook XML and then use the conventional DocBook-to-HTML and DocBook-to-PDF toolchains. The html2docbook stylesheet is a starting point for ideas.
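The XHTML-to-XSL-FO approach can be sketched roughly as follows. This is a minimal illustration, not a complete stylesheet: the class name `XhtmlToFo` and the embedded `STYLESHEET` are hypothetical, it only handles `h1`, `p` and simple tables whose `tr` elements sit directly under `table`, and it uses the XSLT processor bundled with the JDK instead of the Xalan shipped with FOP. Because column widths have to be specified, the table template emits one equal-width `fo:table-column` per cell of the first row, which is an assumption you would normally replace with real widths.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XhtmlToFo {

    // Illustrative stylesheet only: covers html/body/h1/p/table/tr/td.
    // Text inside any other element is passed through by XSLT's built-in templates.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
        "    xmlns:fo='http://www.w3.org/1999/XSL/Format'",
        "    xmlns:h='http://www.w3.org/1999/xhtml'",
        "    exclude-result-prefixes='h'>",
        "  <xsl:output method='xml' indent='yes'/>",
        "  <xsl:template match='/h:html'>",
        "    <fo:root>",
        "      <fo:layout-master-set>",
        "        <fo:simple-page-master master-name='page'",
        "            page-height='29.7cm' page-width='21cm' margin='2cm'>",
        "          <fo:region-body/>",
        "        </fo:simple-page-master>",
        "      </fo:layout-master-set>",
        "      <fo:page-sequence master-reference='page'>",
        "        <fo:flow flow-name='xsl-region-body'>",
        "          <xsl:apply-templates select='h:body/*'/>",
        "        </fo:flow>",
        "      </fo:page-sequence>",
        "    </fo:root>",
        "  </xsl:template>",
        "  <xsl:template match='h:h1'>",
        "    <fo:block font-size='18pt' font-weight='bold' space-after='6pt'>",
        "      <xsl:apply-templates/>",
        "    </fo:block>",
        "  </xsl:template>",
        "  <xsl:template match='h:p'>",
        "    <fo:block space-after='6pt'><xsl:apply-templates/></fo:block>",
        "  </xsl:template>",
        "  <!-- Fixed layout with explicit columns: one equal-width column",
        "       per cell in the first row (an assumption of this sketch). -->",
        "  <xsl:template match='h:table'>",
        "    <fo:table table-layout='fixed' width='100%'>",
        "      <xsl:for-each select='h:tr[1]/h:td'>",
        "        <fo:table-column column-width='proportional-column-width(1)'/>",
        "      </xsl:for-each>",
        "      <fo:table-body><xsl:apply-templates select='h:tr'/></fo:table-body>",
        "    </fo:table>",
        "  </xsl:template>",
        "  <xsl:template match='h:tr'>",
        "    <fo:table-row><xsl:apply-templates select='h:td'/></fo:table-row>",
        "  </xsl:template>",
        "  <xsl:template match='h:td'>",
        "    <fo:table-cell><fo:block><xsl:apply-templates/></fo:block></fo:table-cell>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    /** Transforms a well-formed XHTML string into an XSL-FO string. */
    public static String transform(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrapped so callers need no checked-exception handling in this sketch.
            throw new IllegalStateException("XHTML to XSL-FO transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
            + "<h1>Report</h1><p>Some text.</p>"
            + "<table><tr><td>a</td><td>b</td></tr></table>"
            + "</body></html>";
        System.out.println(transform(xhtml));
    }
}
```

If the printed XSL-FO is saved to a file such as `out.fo`, it can then be rendered with the FOP command line, for example `fop -fo out.fo -pdf out.pdf`. The input must already be well-formed XHTML in the XHTML namespace, so ordinary HTML still has to go through a tool such as JTidy first.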