Intro: only pure-Python code and libraries work on both Python and Jython. Python packages that rely on C extensions are not compatible. Jython lets you use Python syntax on top of the Java VM, and one great thing is that you can use Java classes from within Python. Jython 2.5.1 was released last September.
1) manual pdf text modification
I thought it would be simple to modify a PDF template to change some text, but I was wrong. Even if you manage to re-encode the new text and update the lengths, you will hit walls: it is more complex than that (xref table, etc.). Most PDF libraries provide encoding helper functions, but you will have a hard time finding the decoding ones, ASCII85 for example. After some time, I decided to try to make reportlab work with Jython.
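To give an idea of what such a missing decoding helper involves, here is a minimal sketch of an ASCII85 decoder for the Adobe variant used in PDF streams. It is not taken from any PDF library; the function names are mine, and it is written for a current Python 3 (I have not run it on Jython 2.5).

```python
import struct


def _decode_group(values, nbytes):
    # Five base-85 digits make one 32-bit big-endian word.
    acc = 0
    for v in values:
        acc = acc * 85 + v
    if acc > 0xFFFFFFFF:
        raise ValueError("ASCII85 group out of range")
    return struct.pack(">I", acc)[:nbytes]


def ascii85_decode(data):
    """Decode ASCII85 data (optional '<~' prefix, '~>' end marker)."""
    text = data.decode("ascii") if isinstance(data, bytes) else data
    if text.startswith("<~"):
        text = text[2:]
    end = text.find("~>")
    if end != -1:
        text = text[:end]

    out = bytearray()
    group = []
    for ch in text:
        if ch.isspace():
            continue  # whitespace is ignored by the filter
        if ch == "z" and not group:
            out.extend(b"\x00\x00\x00\x00")  # 'z' is shorthand for four zero bytes
            continue
        value = ord(ch) - 33
        if not 0 <= value < 85:
            raise ValueError("invalid ASCII85 character: %r" % ch)
        group.append(value)
        if len(group) == 5:
            out.extend(_decode_group(group, 4))
            group = []

    if group:
        # A final partial group of k characters encodes k - 1 bytes.
        if len(group) == 1:
            raise ValueError("truncated ASCII85 data")
        nbytes = len(group) - 1
        group += [84] * (5 - len(group))  # pad with the highest digit ('u')
        out.extend(_decode_group(group, nbytes))
    return bytes(out)
```

Decoding the stream is only the first step: after changing the text you still have to re-encode it and fix the stream length and the xref offsets, which is where the manual approach gets painful.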
2) reportlab import error with jython
I tried to use reportlab, a powerful library for creating PDFs, but importing reportlab.pdfgen raised this error: java.lang.ClassFormatError: java.lang.ClassFormatError: Invalid method Code length 66566 in class file reportlab/pdfbase/_fontdata$py. According to this thread on MarkMail, there was a simple solution, but the patch did not work. You can find the working patch that I created here; I have proposed it to the reportlab team.
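For background, the JVM refuses any single method whose bytecode exceeds 64 KB, and a module with very large literal tables can apparently go over that limit once Jython compiles it. The sketch below shows one general way around this kind of error: moving the big literals into several small functions so that each one compiles to its own method. This is an illustration of the idea, not the content of my patch; the function names and the tiny width table are made up.

```python
# Hypothetical, heavily shortened stand-in for a large font-metrics module.
# Each function body is compiled separately, so no single method has to
# hold the whole table.

def _widths_part_1():
    return {"Courier": 600, "Courier-Bold": 600}


def _widths_part_2():
    return {"Helvetica": 556, "Times-Roman": 500}


def _build_widths():
    # Merge the parts into the single table the rest of the code expects.
    table = {}
    for part in (_widths_part_1, _widths_part_2):
        table.update(part())
    return table


WIDTHS = _build_widths()
```

Code that imports the module still sees a single WIDTHS dictionary, so callers do not need to change.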
3) Saving pdf to memory instead of files
To do in-memory PDF manipulation, I used pyPdf, the pure-Python library from Mathieu Fenniak. I tried to save a canvas in memory and couldn't figure out why it wasn't working. I was doing outputStream.writelines(c._
4) Simple comparison python/jython
I was wondering how much slower Jython is compared to Python. In my test, it is slower, and the gap widens as some parameters grow (e.g. the number of pages). In this example, it also takes 4 to 6 times more memory.
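If you want to run this kind of comparison yourself, here is a minimal timing sketch that should run unchanged under both interpreters. The build_pages workload is a hypothetical placeholder, not my PDF code; replace it with whatever you actually want to measure. It keeps the best of several runs, which also gives the JVM a chance to warm up.

```python
import time


def time_scaling(workload, sizes, repeats=3):
    """Return {n: best elapsed seconds} for workload(n), for each n in sizes."""
    results = {}
    for n in sizes:
        best = None
        for _ in range(repeats):
            start = time.time()
            workload(n)
            elapsed = time.time() - start
            if best is None or elapsed < best:
                best = elapsed
        results[n] = best
    return results


def build_pages(n):
    # Hypothetical stand-in for "generate a document with n pages".
    return ["page %d" % i for i in range(n)]


timings = time_scaling(build_pages, [10, 100, 1000])
for n in sorted(timings):
    print("%5d pages: %.6f s" % (n, timings[n]))
```

Run the same script once with python and once with jython and compare how the numbers grow with n.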
5) Jython out of memory
If you get:
OutOfMemoryError: java.lang.OutOfMemoryError: GC overhead limit exceeded
use the -J-Xmx1024m jython option to give the Java VM a larger maximum heap size.
6) Threading optimization
Jython doesn't suffer from the GIL problem. Watch the video "Mindblowing Python GIL" for more information about it. Basically, Jython can do real multi-threading. In my case, part of my code was easy to parallelize, so I tried it with the thread pool from Christopher Arndt. Unfortunately, I still haven't been able to make it faster. pyPdf wasn't designed for a real threading environment (a PdfFileReader can't be shared between threads), which introduces limitations.
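One way to live with a reader that can't be shared is to give every worker thread its own instance. The sketch below shows that pattern with threading.local. PageSource is a hypothetical stand-in for the reader, not the pyPdf API, and the pool comes from concurrent.futures in a current Python 3 rather than the thread pool I used; on Jython 2.5 the same per-thread idea would have to be combined with another pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PageSource(object):
    """Hypothetical stand-in for a reader that must stay on one thread."""

    def __init__(self, path):
        self.path = path
        self.owner = threading.current_thread()

    def render(self, page_number):
        # Fail loudly if the object leaks to another thread.
        assert threading.current_thread() is self.owner
        return "%s:page-%d" % (self.path, page_number)


_local = threading.local()


def _reader_for_this_thread(path):
    # Lazily create one reader per worker thread and reuse it afterwards.
    reader = getattr(_local, "reader", None)
    if reader is None or reader.path != path:
        reader = _local.reader = PageSource(path)
    return reader


def render_pages(path, page_numbers, workers=4):
    def task(n):
        return _reader_for_this_thread(path).render(n)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of page_numbers.
        return list(pool.map(task, page_numbers))
```

The price is that each thread pays the cost of opening and parsing the document again, which may well eat up whatever the extra threads gain.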
7) pyPDF profiling
It is amazing to see the tremendous effort people are putting into making Python syntax available on each platform (Java -> Jython, .NET -> IronPython, etc.). It is a sign of how good Python's syntax is.