How to create PDF files from XML using Apache FOP

XML

Nowadays XML is the native format of many applications. Oracle APEX, for example, partly bases its reporting system on XML. Middleware such as Oracle BPEL, SAP PI and IBM WebSphere, and of course many web services, are also XML-based. XML is also used as an exchange format in business-to-business transactions, for example for e-invoicing. In these and many other environments there is often a need to create PDF files from XML.

XSL-FO

In the past, PDF files would be created using proprietary reporting tools that used a database as their data source. These tools were also tied to a specific programming language. Later on, the XSL-FO standard offered a better way to do it in many cases.

What is new about XSL-FO?
- it is programming language neutral.
- it is a standard so you are not tied to a vendor.
- it uses XML as data source.

Apache FOP

One implementation of the XSL-FO standard is Apache FOP. An important aspect of this implementation is that its license, the Apache License 2.0, allows it to be used in commercial environments.

Many free/open source products have a license model that puts strong restrictions on the way you may use the software. Not surprisingly, there are software companies that do not allow any open source software in their code other than that of the Apache Foundation. So if you use Apache software you are generally on the safe side.

Procedure

Once you have installed Apache FOP, let's see how to create a very simple PDF file. Initially we will use the file hello.fo for this. The output will contain the lines “Hello John” and “Hello Mary”. It is not the aim of this article to present the XSL-FO language in detail (there are good tutorials for that), but you can see that the high-level structure is made of the document layout (the fo:layout-master-set element) and the page content (the fo:page-sequence element).

File hello.fo:

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Document layout: an A4 page master named "main" -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="main" page-height="29.7cm" page-width="21cm" font-family="sans-serif" margin-top="0.5cm" margin-bottom="1cm" margin-left="2cm" margin-right="2cm">
      <fo:region-body margin-top="4.0cm" margin-bottom="1cm"/>
      <fo:region-before extent="1.5cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- Page content: one block per line of text -->
  <fo:page-sequence master-reference="main">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hello John</fo:block>
      <fo:block>Hello Mary</fo:block>
    </fo:flow>
  </fo:page-sequence>

</fo:root>

Using the Apache FOP command-line utility you can convert this file to PDF. For instance:

fop hello.fo hello.pdf
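Because hello.fo is plain XML, any program can produce it. As a rough illustration only, here is a minimal Python sketch that builds the same tree with the standard library. The helper names fo and build_hello_fo are made up for this example and are not part of Apache FOP.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the usual fo: prefix


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def build_hello_fo(names):
    """Build a tree like hello.fo with one fo:block per name (hypothetical helper)."""
    root = ET.Element(fo("root"))

    # Document layout: mirrors the page master used in hello.fo.
    layout = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(layout, fo("simple-page-master"), {
        "master-name": "main",
        "page-height": "29.7cm", "page-width": "21cm",
        "font-family": "sans-serif",
        "margin-top": "0.5cm", "margin-bottom": "1cm",
        "margin-left": "2cm", "margin-right": "2cm",
    })
    ET.SubElement(page, fo("region-body"),
                  {"margin-top": "4.0cm", "margin-bottom": "1cm"})
    ET.SubElement(page, fo("region-before"), {"extent": "1.5cm"})

    # Page content: one block per person.
    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "main"})
    flow = ET.SubElement(sequence, fo("flow"),
                         {"flow-name": "xsl-region-body"})
    for name in names:
        ET.SubElement(flow, fo("block")).text = f"Hello {name}"
    return ET.ElementTree(root)


if __name__ == "__main__":
    tree = build_hello_fo(["John", "Mary"])
    tree.write("hello.fo", encoding="UTF-8", xml_declaration=True)
```

Running the script writes a hello.fo that can be passed to the fop command above. Note how the layout and the data end up mixed inside program code, which is the main drawback of this approach.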

You could create the file hello.fo automatically by different means; however, the recommended way is to put the data (the persons' names) in an XML file such as this one:

File persons.xml:

<?xml version="1.0"?>

<persons>

<person>
<name>John</name>
</person>

<person>
<name>Mary</name>
</person>

</persons>

...and then creating an XSLT stylesheet (XSL transformation) which takes the XML data as input and composes the hello.fo file.

File hello.xsl:

<?xml version="1.0" encoding="iso-8859-1"?>

<xsl:stylesheet version="1.0" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Page layout information -->
  <xsl:template match="/">
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

      <fo:layout-master-set>
        <fo:simple-page-master master-name="main" page-height="29.7cm" page-width="21cm" font-family="sans-serif" margin-top="0.5cm" margin-bottom="1cm" margin-left="2cm" margin-right="2cm">
          <fo:region-body margin-top="4.0cm" margin-bottom="1cm"/>
          <fo:region-before extent="1.5cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <fo:page-sequence master-reference="main">
        <fo:flow flow-name="xsl-region-body">
          <!-- Emit one block per person element -->
          <xsl:apply-templates select="persons/person"/>
        </fo:flow>
      </fo:page-sequence>

    </fo:root>
  </xsl:template>

  <!-- Page content: a "Hello <name>" block for each person -->
  <xsl:template match="persons/person">
    <fo:block>Hello <xsl:value-of select="name"/></fo:block>
  </xsl:template>

</xsl:stylesheet>

With these two files, the Apache FOP command to create the PDF file would be:

fop -xml persons.xml -xsl hello.xsl -pdf hello.pdf

This command performs two steps in one call:

  • Processing the XSLT to create the FO file
  • Creating the PDF from the FO file
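The two steps can also be run separately, which helps when you want to inspect the intermediate FO file. The following Python sketch builds and runs the two fop calls. It assumes the fop launcher is on the PATH and relies on FOP's -foout option (transform only, save the FO file) and -fo option (take an FO file as input). The function names xslt_step, render_step and run_pipeline are invented for this illustration.

```python
import shutil
import subprocess


def xslt_step(xml, xsl, fo, launcher="fop"):
    """Step 1: apply the stylesheet only and save the intermediate FO file."""
    return [launcher, "-xml", xml, "-xsl", xsl, "-foout", fo]


def render_step(fo, pdf, launcher="fop"):
    """Step 2: render an existing FO file as PDF."""
    return [launcher, "-fo", fo, "-pdf", pdf]


def run_pipeline(xml="persons.xml", xsl="hello.xsl",
                 fo="hello.fo", pdf="hello.pdf"):
    """Run both steps in order, stopping at the first failure."""
    launcher = shutil.which("fop")  # locate the launcher script on PATH
    if launcher is None:
        raise FileNotFoundError("fop launcher not found on PATH")
    commands = (
        xslt_step(xml, xsl, fo, launcher),
        render_step(fo, pdf, launcher),
    )
    for command in commands:
        # check=True raises CalledProcessError if fop exits with an error
        subprocess.run(command, check=True)
```

Calling run_pipeline() in the directory that contains persons.xml and hello.xsl should leave both hello.fo and hello.pdf behind. Keeping hello.fo around is mostly useful for debugging the stylesheet.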

As we have seen, if the native format of your data is XML, you can create user-friendly PDF files with no programming. This approach is vendor independent and is used today in many situations and products.

With this procedure you do, however, have to deal with the XSLT and XSL-FO languages and files, which you may prefer not to learn and master. If that is your case, you may be interested in my related post, The easy way to convert XML into PDF.

11 responses
"Not surprisingly there are software companies that do not allow the use in their code any open source software other than Apache Foundation one" Well, copyleft and copyright are mostly incompatible: a copyrighted work may not be used freely in any other project. Copyleft code may be used in any project, but of course you may not appropriate its source, which is why proprietary software may not include code under licenses like the GPL. Copyright is like: "this code is mine and you may only use it as I want", so you may not include another developer's source and apply this rule to it; that's obvious. If I provide my work freely, it's not to see how a big company earns a lot of money with it. The restriction is not strong: let what is free stay free, that is the only restriction. The problem is not on the copyleft side, but on the copyright side.
Hello there, i would like to get the different files (XML,XSD,XSLT) and the program you used to generate the pdf file. Thank you in advance Sharveen Chelliah Bsc(Hons) Computer Science Level 1 University of Mauritius
Sharveen, did you read the post? The content of the files is included, just copy and paste it. As for the conversion program, it's Apache-FOP, there's a link to it. Just follow the instructions step-by-step.
Thank you for the post. Helped a lot.
Thanks for the post, helped me out. Is there a way to trigger the PDF file to open immediately once it's created?
Hi Chris, I guess the simplest way is to put all relevant fop commands plus the open-PDF command in a shell script or batch file. Of course you will first have to find out which PDF viewer command is available on your system.
This is exactly what I was looking for. Thank you so much! Can you please show us how to install and configure FOP? Regards.
Simple and complete explanation, congratulations!