How to create PDF files from XML using Apache FOP

XML

Nowadays XML is the native format of many applications. Oracle APEX, for example, partly bases its reporting system on XML. Middleware such as Oracle BPEL, SAP PI and IBM WebSphere, and of course many web services, are also XML based. XML is also used as an exchange format in business-to-business transactions, for example for e-invoicing. In these and many other environments there is often a need to create PDF files from XML.

XSL-FO

In the past, PDF files were typically created with proprietary reporting tools that used a database as their data source. In addition, these tools were tied to a specific programming language. Later, the XSL-FO standard made a better approach possible in many cases.

What is new about XSL-FO?
- it is programming language neutral.
- it is a standard so you are not tied to a vendor.
- it uses XML as data source.

Apache FOP

One implementation of the XSL-FO standard is Apache FOP. An important aspect of this implementation is that the Apache license allows its use in commercial environments.

Many free and open source products have a license model that puts strong restrictions on how you may use the software. Not surprisingly, some software companies do not allow any open source software in their code other than software from the Apache Foundation. So if you use Apache software, you know you are on the safe side.

Procedure

Once you have installed Apache FOP, let's see how to create a very simple PDF file. Initially we will use the file hello.fo for this. The output will contain the lines “Hello John” and “Hello Mary”. It is not the aim of this article to present the XSL-FO language in detail, but you can see that the high-level structure consists of the document layout (fo:layout-master-set) and the page content (fo:page-sequence).

File hello.fo:

<?xml version="1.0" encoding="UTF-8"?> 
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"> 

<fo:layout-master-set>
<fo:simple-page-master margin-right="2cm" margin-left="2cm" margin-bottom="1cm" margin-top="0.5cm" font-family="sans-serif" page-width="21cm" page-height="29.7cm" master-name="main">

<fo:region-body margin-bottom="1cm" margin-top="4.0cm"/>
<fo:region-before extent="1.5cm"/>
</fo:simple-page-master>
</fo:layout-master-set>

<fo:page-sequence master-reference="main">
<fo:flow flow-name="xsl-region-body">
<fo:block>Hello John</fo:block>
<fo:block>Hello Mary</fo:block>
</fo:flow>
</fo:page-sequence>

</fo:root>

Using the Apache FOP command-line utility you can convert this file to PDF. For instance:

fop hello.fo hello.pdf
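
The layout in hello.fo declares an fo:region-before but puts nothing in it. As a minimal sketch of how the layout part and the content part relate, the variant below fills that region with a page header through fo:static-content. The file name hello-header.fo and the header text are made up for illustration; everything else is copied from hello.fo.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- hello-header.fo: hypothetical variant of hello.fo with a page header -->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

<fo:layout-master-set>
<fo:simple-page-master master-name="main" page-height="29.7cm" page-width="21cm" font-family="sans-serif" margin-top="0.5cm" margin-bottom="1cm" margin-left="2cm" margin-right="2cm">
<fo:region-body margin-top="4.0cm" margin-bottom="1cm"/>
<fo:region-before extent="1.5cm"/>
</fo:simple-page-master>
</fo:layout-master-set>

<fo:page-sequence master-reference="main">
<!-- static-content has to come before fo:flow; it is repeated on every page -->
<fo:static-content flow-name="xsl-region-before">
<fo:block text-align="center">Greetings (example header)</fo:block>
</fo:static-content>
<!-- the flow holds the actual page content, as in hello.fo -->
<fo:flow flow-name="xsl-region-body">
<fo:block>Hello John</fo:block>
<fo:block>Hello Mary</fo:block>
</fo:flow>
</fo:page-sequence>

</fo:root>
```

This file should convert in the same way as hello.fo, for example with fop hello-header.fo hello-header.pdf.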

You could generate the file hello.fo automatically by different means. However, the recommended way to create this file is by putting the data (the persons’ names) in an XML file such as this one:

File persons.xml:

<?xml version="1.0"?>

<persons>

<person>
<name>John</name>
</person>

<person>
<name>Mary</name>
</person>

</persons>

...and then creating an XSLT stylesheet (XSL transformation) that takes the XML data as input and composes the hello.fo file.

File hello.xsl:

<?xml version="1.0" encoding="iso-8859-1"?>

<xsl:stylesheet version="1.0" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Page layout information -->

<xsl:template match="/">
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

<fo:layout-master-set>
<fo:simple-page-master master-name="main" page-height="29.7cm" page-width="21cm" font-family="sans-serif" margin-top="0.5cm" margin-bottom="1cm" margin-left="2cm" margin-right="2cm">
<fo:region-body margin-top="4.0cm" margin-bottom="1cm" />
<fo:region-before extent="1.5cm" />
</fo:simple-page-master>
</fo:layout-master-set>

<fo:page-sequence master-reference="main">
<fo:flow flow-name="xsl-region-body" >
<xsl:apply-templates select="persons/person" />
</fo:flow>
</fo:page-sequence>

</fo:root>

</xsl:template> 

<xsl:template match="persons/person">

<fo:block>
Hello <xsl:value-of select="name" />
</fo:block>

</xsl:template>

</xsl:stylesheet>
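
In the person template of hello.xsl, the line break before "Hello" is part of the literal text and is copied into the generated FO. With the default XSL-FO whitespace handling this normally collapses in the rendered PDF, so the stylesheet works as it is. If you prefer a tidier intermediate FO file, one option is to wrap the literal text in xsl:text. The sketch below shows that variant; the file name hello-compact.xsl is hypothetical, and only the second template differs from hello.xsl.

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<!-- hello-compact.xsl: hypothetical variant of hello.xsl using xsl:text -->
<xsl:stylesheet version="1.0" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Page layout information (unchanged from hello.xsl) -->
<xsl:template match="/">
<fo:root>
<fo:layout-master-set>
<fo:simple-page-master master-name="main" page-height="29.7cm" page-width="21cm" font-family="sans-serif" margin-top="0.5cm" margin-bottom="1cm" margin-left="2cm" margin-right="2cm">
<fo:region-body margin-top="4.0cm" margin-bottom="1cm"/>
<fo:region-before extent="1.5cm"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="main">
<fo:flow flow-name="xsl-region-body">
<xsl:apply-templates select="persons/person"/>
</fo:flow>
</fo:page-sequence>
</fo:root>
</xsl:template>

<!-- One block per person; xsl:text emits exactly "Hello " with no line break -->
<xsl:template match="persons/person">
<fo:block>
<xsl:text>Hello </xsl:text>
<xsl:value-of select="name"/>
</fo:block>
</xsl:template>

</xsl:stylesheet>
```

The whitespace-only text between the elements inside fo:block is stripped by the XSLT processor, so each generated block contains just the greeting and the name.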

With these two files, the Apache FOP command to create the PDF file would be:

fop -xml persons.xml -xsl hello.xsl -pdf hello.pdf

This command performs two steps in one call:

  • Processing the XSLT to create the FO file
  • Creating the PDF from the FO file

As we have seen, if the native format of your data is XML, you can create user-friendly PDF files with no programming. This approach is vendor independent and is used today in many situations and products.

With this procedure, however, you have to deal with the XSLT and XSL-FO languages and files, which you may prefer not to learn and master. If that is your case, you may be interested in my related post The easy way to convert XML into PDF.