Sunday, October 17, 2010

How to convert HTML to PDF

Recently I was faced with a situation where, two days before the project was supposed to go into the system testing phase, the client had a change of mind (sound familiar?  Sometimes I feel this happens quite often).

In our project, we were showing an invoice as HTML, which the users could print.  Everything was fine and the client was happy with the HTML invoice until system testing was about to start.  Then one fine day, the client decided they needed the invoice as a PDF and not as HTML.

Well, well, we had to deliver this in about a day.  The invoice was not really straightforward.  Its layout changed depending on the situation.

The project was built using Java technologies.

We had three choices:

  • Scrap all the effort (development and testing) that we had put in to get the HTML invoice in the format desired by the client.   Use some ugly APIs to generate a PDF that would look similar to the HTML invoice we already had in place - Painful approach which would require a lot of effort.
  • Design a Jasper report (jrxml) which would look exactly like the HTML invoice, then use the Jasper APIs to render the designed report as PDF - Relatively sensible approach.  Jasper is extremely powerful (no doubts about that!), but people who have used Jasper in the past would agree that designing a report in Jasper is not the easiest of things.  It would take some time.  Moreover, the look and feel of the generated PDF might not be exactly like the HTML invoice.  In this approach, too, we might have to scrap all the effort we had put into generating the HTML invoice.
  • Find a way to convert the HTML invoice directly into PDF - An awesome approach!  The effort we had put in to get the HTML invoice in the right format would not be wasted and everyone would live happily ever after!

Being a lazy developer, I always choose the easiest approach.  Hence, I decided to give approach #3 a shot.  I tried out some frameworks that could convert HTML directly into PDF.

Some were good, some were bad and some were ugly.

But when I was trying these frameworks, I had a WOW moment!

I found a framework that would take the HTML and convert it into an identical-looking PDF!  It used the styles that were included in the HTML document, and the generated PDF looked exactly like the HTML invoice!  I was thinking, man, this is awesome!

The framework is called Flying Saucer.

Internally, Flying Saucer uses iText to generate the PDF.  iText is a really powerful piece of software; it can generate, read, edit and append to PDF documents.

Let's look at the code to convert an HTML document directly into a PDF.

How do they do it:
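Here is a minimal sketch of the conversion, assuming the Flying Saucer PDF module and iText jars are on the classpath.  The class name HtmlToPdf, the create helper and the URL and file name in main are illustrative placeholders; ITextRenderer is Flying Saucer's PDF renderer class.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.xhtmlrenderer.pdf.ITextRenderer;

public class HtmlToPdf {

    // Illustrative helper: renders the XHTML document found at `url`
    // into a PDF file written to `pdfPath`.
    public static void create(String url, String pdfPath) throws Exception {
        try (OutputStream out = new FileOutputStream(pdfPath)) {
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocument(url);  // load and parse the XHTML and its CSS
            renderer.layout();          // lay the content out into pages
            renderer.createPDF(out);    // write the finished PDF to the stream
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL and output file; substitute your own invoice page.
        create("http://localhost:8080/invoice.html", "invoice.pdf");
    }
}
```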

That's it!  Pass the URL of the HTML invoice (we want to convert into PDF) to the create method.  Have a look at the generated PDF invoice file.

I was amazed when I looked at the PDF file: it looked exactly like the HTML invoice.

To add to the joy, PDF is a paged medium, which means it can have page numbers, headers and footers.

Flying Saucer supports the CSS 2.1 standard for paged media and recognizes the @page rule in the CSS.  Hence, if you want page numbers at the bottom right corner of your generated PDF, simply put the page-number styles inside an @page rule in the HTML to be converted into PDF.

Simple, isn't it?
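As a rough sketch of what such styles can look like, the Java snippet below keeps the CSS in a string and wraps an invoice body with it.  The page size, margin, class name and wrap helper are my assumptions for illustration; the @bottom-right margin box and the page/pages counters come from the CSS paged media specification, which Flying Saucer implements in part.

```java
final class PagedInvoiceHtml {

    // Paged-media styles: page size, margins, and a "Page x of y" label
    // placed in the bottom-right margin box of every page.
    static final String PAGE_CSS =
            "@page {\n"
          + "  size: A4;\n"
          + "  margin: 20mm;\n"
          + "  @bottom-right {\n"
          + "    content: \"Page \" counter(page) \" of \" counter(pages);\n"
          + "  }\n"
          + "}\n";

    // Wraps an XHTML body fragment in a document that carries the styles above.
    static String wrap(String bodyXhtml) {
        return "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
             + "<head>\n<title>Invoice</title>\n"
             + "<style type=\"text/css\">\n" + PAGE_CSS + "</style>\n"
             + "</head>\n<body>\n" + bodyXhtml + "\n</body>\n</html>\n";
    }

    public static void main(String[] args) {
        // Prints a complete document that can be saved and fed to the converter.
        System.out.println(wrap("<p>Invoice #1</p>"));
    }
}
```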
Note:  To convert HTML to PDF using Flying Saucer, your HTML must be well-formed XML (that is, XHTML): every tag closed and properly nested.
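One quick way to catch markup problems before handing a page to the converter is to run it through the JDK's own XML parser.  This is a minimal sketch using only standard library classes; the class and method names are mine, and it checks well-formedness only, not whether the page is valid XHTML.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XhtmlCheck {

    // Returns true if `markup` parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // malformed markup, e.g. an unclosed <br>
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("XML parser unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<html><body><p>Total<br/></p></body></html>"));
        System.out.println(isWellFormed("<html><body><p>Total<br></p></body></html>"));
    }
}
```

The first document closes its br tag and passes; the second leaves it open, which browsers tolerate but an XML parser rejects.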

Update: I have uploaded the sample code to convert an HTML document into PDF here.
Have some fun!