Sunday, October 17, 2010

How to convert HTML to PDF

Recently I was faced with a situation where, two days before the project was supposed to go into the system testing phase, the client had a change of mind (sound familiar?  Sometimes I feel this happens quite often).

In our project, we were showing an invoice as HTML, which the users could print.  Everything was fine, and the client was happy with the HTML invoice until system testing was about to start.  Then one fine day, the client decided they needed the invoice as a PDF and not as HTML.

Well, well, we had to deliver this in about a day.  The invoice was not really straightforward.  It changed based on different situations.

The project was built using Java technologies.

We had three choices:

  • Scrap all the effort (development and testing) that we had put in to get the HTML invoice into the format desired by the client, and use some ugly APIs to generate a PDF that would look similar to the HTML invoice we already had in place - a painful approach that would require a lot of effort.
  • Design a Jasper report (jrxml) that would look exactly like the HTML invoice, then use the Jasper APIs to render the designed report as PDF - a relatively sensible approach.  Jasper is extremely powerful (no doubts about that!), but people who have used Jasper in the past would agree that designing a report in Jasper is not the easiest of things.  It would take some time.  Moreover, the look and feel of the generated PDF might not be exactly like the HTML invoice.  With this approach, too, we might have to scrap all the effort we had put into generating the HTML invoice.
  • Find a way to convert the HTML invoice directly into PDF - an awesome approach!  The effort we had put in to get the HTML invoice into the right format would not be wasted and everyone would live happily ever after!

Being a lazy developer, I always choose the easiest approach.  Hence, I decided to give approach #3 a shot and tried out some frameworks that could convert HTML directly into PDF.

Some were good, some were bad and some were ugly.


But when I was trying these frameworks, I had a WOW moment!

I found a framework that would take the HTML and convert it into a matching PDF!  It used the styles that were included in the HTML document, and the generated PDF looked exactly like the HTML invoice!  I was thinking, man, this is awesome!

The framework is called Flying Saucer.

Internally, Flying Saucer uses iText to generate the PDF.  iText is a really powerful piece of software: it can generate, read, edit and append to PDF documents.

Let's look at the code to convert an HTML document directly into a PDF.

How do they do it:
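
A minimal sketch of what such a conversion can look like, assuming Flying Saucer's ITextRenderer class (package org.xhtmlrenderer.pdf) and iText are on the classpath.  The class name HtmlToPdf, the output path parameter and the file locations in main are made up for illustration, and the method simply declares "throws Exception" because the checked exceptions differ between library versions:

```java
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.xhtmlrenderer.pdf.ITextRenderer;

public class HtmlToPdf {

    // Renders the (well-formed) XHTML document at the given URL into a PDF file.
    public static void create(String url, String pdfPath) throws Exception {
        try (OutputStream os = new FileOutputStream(pdfPath)) {
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocument(url); // load and parse the XHTML, including its CSS
            renderer.layout();         // lay the document out into pages
            renderer.createPDF(os);    // write the PDF to the stream via iText
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input and output locations.
        create("file:///tmp/invoice.html", "/tmp/invoice.pdf");
    }
}
```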


That's it!  Pass the URL of the HTML invoice (the one we want to convert into PDF) to the create method, then have a look at the generated PDF invoice file.

I was amazed when I looked at the PDF file: it looked exactly like the HTML invoice.

To add to the joy, PDF is a paged medium, which means it can have page numbers, footers and headers.

Flying Saucer supports the CSS 2.1 standard for paged media and recognizes the @page rule in the CSS.  Hence, if you want page numbers at the bottom right corner of your generated PDF, simply include the appropriate @page styles in the HTML to be converted into PDF.
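
A sketch of such a style, kept here as a Java string so it can be dropped into the generated page.  It assumes the renderer honours the @bottom-right margin box and the page/pages counters (these come from the CSS3 Paged Media draft rather than CSS 2.1 itself, and to my knowledge Flying Saucer's PDF output handles them).  The class and method names are hypothetical:

```java
public class PageNumberStyle {

    // @page rule with a bottom-right margin box showing "Page X of Y".
    static final String PAGE_NUMBER_CSS =
            "@page {\n"
          + "  @bottom-right {\n"
          + "    content: \"Page \" counter(page) \" of \" counter(pages);\n"
          + "  }\n"
          + "}\n";

    // Wraps a body fragment in a minimal XHTML document that carries the style.
    static String wrap(String title, String bodyXhtml) {
        return "<html>\n<head>\n<title>" + title + "</title>\n"
             + "<style type=\"text/css\">\n" + PAGE_NUMBER_CSS + "</style>\n"
             + "</head>\n<body>\n" + bodyXhtml + "\n</body>\n</html>\n";
    }

    public static void main(String[] args) {
        // Prints the complete XHTML page, ready to be saved and converted.
        System.out.println(wrap("Invoice", "<p>Invoice body</p>"));
    }
}
```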

Simple, isn't it?
Note:  To convert HTML to PDF using Flying Saucer, your HTML must be well-formed XML (that is, XHTML).
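
One way to catch this early is to run the markup through a plain XML parser before handing it to Flying Saucer.  A small sketch using only the parser that ships with the JDK; the class and method names are made up for illustration:

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class XhtmlCheck {

    // Returns true if the markup parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler keeps the parser from printing to stderr;
            // fatal (well-formedness) errors are still thrown as SAXException.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>Total<br/>42</p>")); // closed <br/>
        System.out.println(isWellFormed("<p>Total<br>42</p>"));  // unclosed <br>
    }
}
```

Typical browser-style HTML such as an unclosed <br> or <img> tag fails this check, so it needs to be closed (<br/>) before conversion.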

Update: I have uploaded the sample code to convert an HTML document to PDF here.

13 comments:

  1. Great Post Buddy.

  2. Sir,can u plz mail me the whole code with supportinf files and libraries.....i shall be highly thankful to you for this act.

    mail me on:
    er_manish89@rediffmail.com

  3. me to please I am in hurry
    robert.ristevski@x3mlabs.com

    I tried this but I get some exception:

    Can't load the XML resource (using TRaX transformer)

  4. iText isn't all that faithful when it comes to HTML->PDF conversion. In particular, the CSS is quite hit-or-miss. It's been getting some attention in the last couple releases, but still not that great.

  5. Actually I would beg to differ on that.

    Recently I tried converting a relatively complex html page with images, background colors and lots of styling (CSS) information. Flying saucer with iText performed really well.

    The converted PDF looks exactly like the HTML page!

  6. thank you for sharing this!

    http://www.convertintopdf.com - convert pdf to JPG, PNG, TIFF

  7. I have no words for this great post such a awe-some information i got gathered. Thanks to Author.
    HTML5 Developer

  8. Mr deep, can you give sample code call from jsp to this class.
    TQ

  9. Thanks! It might save me tons of time!

  10. Thanks for sharing this information - it looks like the sort of thing I've been hunting for.

    I will try out the sample project. One thing I am keen to find out is if it will give me a PDF version of the final state of my (very) dynamic page. What I mean by this is that the page features a lot of drag and drop and/or other dynamic JS stuff, and it's that final page state I want to have as an image (PDF, GIF, JPG - whatever).

    Very interesting!

  11. Hmm, you will have to try it yourself, but my best guess is that it won't give you the final state of the page as PDF. What it essentially does is parse the HTML and convert that to PDF. I don't think it would take into account the JavaScript you have on the page.

    Do try it out and let me know.

  12. This is really helpful. Thanks.

    Sir, how can we done this (ex: 1st table on the 1st page, 2nd table on the 2nd page, etc.) And how to use the page number?

  13. Hi,

    I am using this code and jar file in my android application.But i am getting some error to convert html to pdf.Can you help me how to resolve this error.The error i am facing i.e.

    03-13 19:10:13.897: E/AndroidRuntime(13761): java.lang.VerifyError: org.xhtmlrenderer.pdf.ITextOutputDevice
    03-13 19:10:13.897: E/AndroidRuntime(13761): at org.xhtmlrenderer.pdf.ITextRenderer.<init>(ITextRenderer.java:108)
    03-13 19:10:13.897: E/AndroidRuntime(13761): at org.xhtmlrenderer.pdf.ITextRenderer.<init>(ITextRenderer.java:102)


Have some Fun!