Some thoughts of a Machine Learning Practitioner on Software Development, Management, Team Building, Startups, Python, Agile Development, Data visualization... topics that will distract you from your end goals by making you less efficient, but that are critical to manage in order to succeed. Don't forget that long-term adaptation to inefficient approaches can become your enemy. Let's try to empower others by sharing knowledge & personal experiences.
Wednesday, October 13, 2010
simple multivariate classifier example using python & numpy
Sunday, October 10, 2010
Dimensionality reduction; a simple PCA example using python
Dimensionality reduction is a powerful approach to reduce input size, reduce training time and visualize data.
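import mdp
import pylab

# ds.data: the dataset's feature matrix (one row per example), loaded beforehand
pca = mdp.pca(ds.data)
pylab.title("PCA")
pylab.plot(pca[:,0], pca[:,1], '.')
pylab.show()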
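To make clearer what a PCA projection computes, here is a minimal dependency-free sketch. It is not MDP's implementation: the function names are hypothetical, and power iteration with deflation is swapped in for a proper eigen-solver, which is only reasonable for small, well-separated examples.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Dominant eigenvector and eigenvalue of a symmetric matrix (power iteration)."""
    d = len(matrix)
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break  # nothing left to find (matrix is numerically zero)
        vec = [x / norm for x in new]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    # Rayleigh quotient gives the eigenvalue for the unit vector found.
    value = sum(vec[i] * matrix[i][j] * vec[j]
                for i in range(d) for j in range(d))
    return vec, value


def pca(rows, n_components=2):
    """Project rows onto their first n_components principal axes."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    axes = []
    for _ in range(n_components):
        vec, value = top_eigenvector(cov)
        axes.append(vec)
        # Deflation: remove the component just found before the next search.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [[sum(x * a for x, a in zip(row, axis)) for axis in axes]
            for row in centered]
```

The returned list has one row per example and one column per component, so the first two columns can be plotted the same way as `pca[:,0]` and `pca[:,1]` in the snippet above.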
Saturday, October 9, 2010
PDF watermarking service using pdfrw on google appengine
Tuesday, September 21, 2010
Summary of machine learning libs available in python
- pybrain: PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. In fact, we came up with the name first and later reverse-engineered this quite descriptive "Backronym". See features. Key feature: recurrent networks (RNN), including Long Short-Term Memory (LSTM) architectures
- mlpy: Machine Learning PYthon (mlpy) is a high-performance Python library for predictive modeling. mlpy makes extensive use of NumPy to provide fast N-dimensional array manipulation and easy integration of C code. The GNU Scientific Library (GSL) is also required. It provides high level procedures that support, with few lines of code, the design of rich Data Analysis Protocols (DAPs) for preprocessing, clustering, predictive classification, regression and feature selection. Methods are available for feature weighting and ranking, data resampling, error evaluation and experiment landscaping. Key feature: feature selection
- scikits.learn: scikits.learn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib). Key distinct features: lasso, nearest neighbor, isomap, various metrics, mean shift, cross validation, LDA, HMMs
- opencv (machine learning): Normal Bayes Classifier, K Nearest Neighbors, SVM, Decision Trees, Boosting, Random Trees, Expectation-Maximization, Neural Networks
- Shogun: A Large Scale Machine Learning Toolbox. Comprehensive machine learning toolbox with bindings to various programming languages. PyMVPA can optionally use implementations of Support Vector Machines from Shogun. Large scale kernel learning (mostly SVMs). It wraps other libraries such as libsvm (well-established) and others that get state-of-the-art performance or are good for extremely large datasets.
- PyMVPA (Multivariate Pattern Analysis in Python): PyMVPA is a Python module intended to ease pattern classification analyses of large datasets. In the neuroimaging context, such analysis techniques are also known as decoding or MVPA analysis.
- pylearn (built on top of Theano), v2 under construction. New version of PLearn (C++).
- Theano: (deep learning) Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
- jml: Jeremy's Machine Learning library (C++), includes a Python interface: basic classifiers (perceptrons, decision trees, etc.) plus ensemble methods (boosting, bagging). Very highly optimized to work with thousands of features and millions of examples. GPGPU support under development. Code derived from this library is extensively used in a commercial computational linguistics application, so it has been put through its paces.
- 3dsvm: AFNI plugin to apply support vector machine classifiers to fMRI data.
- Elefant: Efficient Learning, Large-scale Inference, and Optimization Toolkit. Multi-purpose open source library for machine learning.
- MDP: Python data processing framework. MDP provides various algorithms. PyMVPA makes use of MDP’s PCA and ICA implementations. Interesting features: ICA, LLE
- MVPA Toolbox: Matlab-based toolbox to facilitate multi-voxel pattern analysis of fMRI neuroimaging data.
- NiPy: Project with growing functionality to analyze brain imaging data. NiPy is heavily connected to SciPy and lots of functionality developed within NiPy becomes part of SciPy.
- OpenMEEG: Software package for low-frequency bio-electromagnetism including the EEG/MEG forward and inverse problems. OpenMEEG includes Python bindings.
- Orange: Powerful general-purpose data mining software. Orange also has Python bindings.
- PyMGH/PyFSIO: Python IO library for FreeSurfer’s .mgh data format.
- PyML: PyML is an interactive object oriented framework for machine learning written in Python. PyML focuses on SVMs and other kernel methods.
- PyNIfTI: Read and write NIfTI images from within Python. PyMVPA uses PyNIfTI to access MRI datasets.
- milk: SVMs with arbitrary Python types for kernel arguments. Pythonic interface to libSVM. Stepwise Discriminant Analysis for feature selection. K-means clustering. Models can be pickled and unpickled.
- mlboost: Machine Learning Boost Library (Python; includes flayers wrapper); minimal version of the sourceforge mlboost project. Specialized in feature extraction and visualization.
- Torch5 (http://torch5.sourceforge.net/): Torch5 is the official successor of Torch3, and is now developed at NEC Laboratories America and Google Labs. No Python wrapper yet.
- pylearn v2: new version of pylearn from the LISA lab of UdeM.
Wednesday, August 25, 2010
How to add activities to redmine timesheet plugin?
If you want to add timesheet activities, you can't do it from the UI. You need to add them directly into the MySQL db like this:
insert into enumerations values (10,'Experimental Development', 3, 0, "TimeEntryActivity", 1, NULL, NULL);
insert into enumerations values (11,'Scientific Research', 4, 0, "TimeEntryActivity", 1, NULL, NULL);
Don't forget to set the id and the position properly.
Wednesday, July 21, 2010
patching class function in python
Today we had to patch a class function in production. Monkey patching can become tricky if references are kept in several places, like pointers in C and C++.
Here is a simple example of how to make sure all references will use the new definition.
class Foo:
    def f(self):
        print "default f"

def newf(self):
    print "newf"

# Python 2: swap the code object in place, so every existing reference to Foo.f runs newf
Foo.f.im_func.func_code = newf.func_code
This is another reason why interpreted languages like Python are so powerful.
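For readers on Python 3, where `im_func` and `func_code` no longer exist, the same trick can be sketched with the `__code__` attribute. This is a minimal illustration, not the production patch; `kept_reference` is a hypothetical name standing in for any reference taken before the patch, and the methods return strings instead of printing so the effect is easy to check.

```python
class Foo:
    def f(self):
        return "default f"


def newf(self):
    return "newf"


foo = Foo()
kept_reference = foo.f  # bound method captured before the patch

# Rebinding the attribute (Foo.f = newf) would only affect later lookups;
# kept_reference would still run the old function.
# Swapping the code object changes the function itself, so every
# existing reference picks up the new behaviour.
Foo.f.__code__ = newf.__code__

print(kept_reference())
print(Foo().f())
```

Both calls go through the original function object, which now carries `newf`'s code. The swap only works when both functions have the same number of free (closure) variables, which is the case for plain methods like these.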
Tuesday, June 1, 2010
Before filing Software Patents, wait for Bilski case resolution
The 2008 ruling of the Court of Appeals for the Federal Circuit (CAFC) was broad enough to reject Bilski's patent and a certain category of software patents.
The Supreme Court agreed to review the CAFC's ruling (as Bilski v. Kappos), and the judges raised the issue of software during the hearing.
The Supreme Court's ruling could greatly change the patentability of software patents, business method patents, and the middle ground of e-commerce patents.
Tuesday, May 18, 2010
Easy Presentation Slides with Latex-Beamer
Here is an example in 6 simple steps:
- mkdir trial; cd trial
- sudo aptitude install latex-beamer
- wget http://mlboost.svn.sourceforge.net/viewvc/mlboost/doc/ml-python-mtl-april2009.tgz
- tar -zxvf ml-python-mtl-april2009.tgz
- pdflatex ml-python-mtl-april2009.tex
- xpdf ml-python-mtl-april2009.pdf
When you start using latex, you never want to go back. Content and visualization should be well isolated/separated.
Saturday, May 15, 2010
Future of Quebec Software Engineer
- For a long time, the Canadian dollar was low, which made Quebec and Canadian software engineers pretty cheap. Unfortunately, this is changing and is already no longer true: they are starting to be expensive. Many software jobs in Québec belong to US companies with an office in Canada, and many others rely on US clients.
- Easy tax credits for experimental development and scientific research will be harder to get, since the Canadian government is back in deficit and Quebec is facing an aging population and a shrinking number of taxpayers. For more info, check this post.
- Outsourcing is a cheaper and more accessible alternative. As an example, as more Indian engineers become able to speak comprehensible English, they will become real competition. Many companies are starting to use cheap remote resources or outsourcing services like rentacoder to lower their costs.
- Due to globalization and the nearby US market, many companies will prefer an average engineer with perfectly fluent English over a great engineer who lacks English skills.
- Due to globalization and the nearby US market, many companies will promote an average engineer with perfectly fluent English over a great engineer who lacks English skills.
What will future Francophone engineers need to succeed and be more competitive in this changing landscape?
- They need to be fully bilingual (btw, many companies don't consider Francophone universities because they know that most of their software engineers aren't bilingual)
- They need management skills. The ability to manage technical people is rare and critical to the success of projects; companies are more aware of it and are desperately looking for it. Increasing the success rate of projects is critical.
- They need to be leaders and decision makers, not only a workforce.
- They need writing and communication skills. Technical people who neglect this will pay the price in the long run. Without those skills, it is hard to move into the upper levels of a company.
- They need conflict-resolution and negotiation skills. Engineers forget that their job is on average 20% technical and 80% HR related. Resolving human-related problems is an important part of a software engineer's work.
Outsourcing is at the door, and software engineers have to better understand the idea of comparative advantage, because their cost advantage is slipping away.
Jobs held by overly expensive, not perfectly bilingual engineers who lack management and communication skills will face increased outsourcing pressure.
Friday, May 14, 2010
Disruptive business model: Lew Cirne, serial entrepreneur on the future of Enterprise Software etc.
- Disruptive business model: Lew Cirne, serial entrepreneur on the future of Enterprise Software
- What you are paying for at $8,000 per CPU
- Disrupting the Enterprise Software Business Model: A Conversation with Alan Armstrong
- The $400-An-Hour Band-Aid
- My Name is Lew, and I am a Recovering Enterprise Software Developer
- Old Dogs, New Tricks and SaaS
Software transition: recipe for a disaster
- Don't plan transition
- Don't communicate transition plan if you have one
- Promote non-skilled people to decision positions
- Kill initiative/ignore message (ex: kill messenger and think the problem is gone)
- Avoid long term planning (ex: use argument like business moving too fast)
- Don't appoint a technical team lead
- Don't follow up on team requests
- Don't involve HR to smooth the transition
- Don't hire proactively
- Don't prepare a plan B
- Don't involve and/or update your team on the transition plan
- Don't give overall project responsibility to anyone
- Stay in a reactive mode
- Don't communicate info in daily meeting
- Don't ensure you have a good pulse of your team
- Give too much power to people who don't understand software process
- Like Greenspan, dream that the magic hand will fix everything (people might not compensate for bad decisions forever)
- Expect people to stay
- Don't differentiate between maintenance & development costs
- Let non-technical people make technical decisions
- Ignore problems
- Don't recognize people's work during a crisis
- Allow managers who are unable to evaluate technical people
- Assume that outsourcing to India can solve everything
- Think people are easily replaceable
- Don't talk about career evolution with your crew
Monday, May 10, 2010
Tax credits (RS&DE): the new reality = good news for startups
Saturday, May 8, 2010
VC funding & StartupCamp 6
- Start with problem not solutions
- Look for high reaction signal (good and bad)
- Stop adding features
- Focus on customer reactions, real-time as possible
- Volume->Cost->Conversion | acquisition/activation/retention/referral/revenue...
- etc.
But what's the point of getting VC funding for software startups? They don't need much more than computers & time. With VC funding you could get 6-12 months of runway, most of which will be reimbursed with R&D tax credits. Basically, they give you a cash advance in exchange for an important share of your company, at a ridiculously low risk. VCs' main arguments are:
- It will allow you to be first in the market. That's bullshit; everyone knows it is the timing that is the most important.
- 10% of 15 million is better than 100% of 1 million... but you will still do most of the work, with more stress from investors, and you might end up with 10% of 2 million. Most entrepreneurs start their company to take control, not to get back into slavery mode.
Andy Nulman's keynote presentation was interesting from the perspective of the importance of a partner and the need to adapt, but the mercantile conclusion was a pathetic Anglo-Saxon point of view: you could make more cash by doing the chicken dance than by doing something interesting.
Phil Telio's announcement about Notman House, the new house dedicated to startups, was great for Montréal. I will definitely apply many of the things I learned there. As an example, if you are a founder and will move to the CEO position, you had better start by delegating what you are best at, because it will allow you to improve others' skills and will make supervising others very efficient.
Tuesday, March 16, 2010
Ubuntu remix & Asus Eee PC 1005PE 10''/250G 14hr review
- Wireless (see link to fix flaky + interruption)
- Bluetooth
- Sound
- flash (ex: on youtube)
- hibernate
- second monitor shortcut key
- lower screen intensity shortcut key
- ubuntu one (storage)
- video
Buy US made by Google.com: search protectionism?
- Everywhere except US: google.com = search around the world
- US only: google.com = search in the US
In many fields, it is great that Google ranks the closest businesses higher (ex: the closest pizza place), but it should not filter out non-US competitors when the service doesn't need to be close by (ex: consulting services).
Knowing that the United States is the biggest promoter of the free market, open markets, etc., I am totally amazed to learn of such a rule. Do as I say, not as I do. As an example, if you don't have an address in the States and/or host your web site there, SEO (search engine optimization) is useless for getting new customers from the States (an insignificant market ;)): they won't find you if they are using the biggest search engine... Google.
Does the Obama administration really need a Buy American plan?
The first rule of free trade is unrestricted access to information (i.e., the economics of information: Joseph E. Stiglitz).
You can try it:
- What everyone outside the US sees with google.com: http://www.google.com/ig?hl=all
- What US people see with google.com: http://www.google.com/ig?gl=us
Monday, March 15, 2010
Opportunities ahead to reduce the impact of the doctor shortage and money constraints: Machine Learning
As an example, radiologists earn on average around 700K/year and ophthalmologists 600K/year. Yes, our general practitioners could earn more; their average is around 150K/year. As a heavy taxpayer, I suggest following the optician approach used in BC: replace highly paid optometrists with machines to do the exams. Basically, automate what needs to be automated and use doctors efficiently, which could reduce or eliminate the shortage.
When people go too far with their salary expectations, it is time to bring them back down to earth. No one accepts fee increases for poorer service.
The public system and doctors' studies are funded by our taxes, and health system spending represents close to 50% of Québec's spending; money doesn't grow on trees. Yes, doctors earn less than they would in the States, but the US doesn't have a public system, only wealthy people have access to care there, and that can't happen in Canada because the system is publicly funded.
I have tried to help happy clinic by contacting doctors to offer a smart waiting-time system so that people wait less, but most of them didn't care much: "we are busy, people have to wait, it's a natural filter". They can make us wait for hours even with appointments and treat us like shit because the service supply is low. While most of us are waiting, we aren't earning the money that pays them. Shame on you. Everyone is losing at this game.
Doctors are getting greedy, are starting to see themselves as untouchable, and are forgetting who pays their salary.
If well packaged, Machine Learning can be used by anyone to do high-level screening and provide valuable information. In 2010, doctors remain one of the only professions that doesn't use machines much to become more efficient, and they are fighting to stay the bottleneck. With this crisis ahead, it might be a good opportunity for the machine learning community to get into this shielded area, to the benefit of everyone, doctors included.
Friday, March 12, 2010
Why you Shouldn't Accept VC money earlier than you thought: Story about Venture Capital and an Outdated Decision Maker revised
It might be advantageous to accept VC money earlier than you thought.
I self-censored one of my posts about VC, but it needs to be republished to make sure entrepreneurs understand the possibly deep consequences of such a decision. The post Anatomy of a failed software project triggered my decision to un-censor it: we don't talk enough about bad experiences, so as not to offend people or be seen as too negative, but that is reality and we should have realistic expectations. With Amazon EC2, Google App Engine, Rackspace hosting and so on, it is becoming far less costly to bring new technologies to market. Most software companies don't need big funding anymore and have all the more reason to avoid VCs.
I strongly recommend bootstrapping your company with consulting and/or R&D credits, INRS programs etc.
So here is the revised version of the post I published on Apr 29, 2009 12:32 AM and then removed.
Once my boss, who was one of the founders of the company I was working for, told me: avoid VC as much as you can; it should remain your last resort. I didn't realize how important this was until some American shark VCs took control of our company and enforced their ultra-capitalist short-term vision. GM is a good example of short-term vision, but that's another interesting story.
Venture capital represents high-risk investment, and their only goal is to get the highest return on investment in the shortest time period. That's fair, you simply have to know the rules.
Entrepreneurs build and invest; VCs plunder everything they can. At that time, they replaced the top management with their Californian superstars or ... leftover turnips (I'm exaggerating a bit). One of their last superstars while I was there, whom I will refer to as "Le Chasseur", came to take the highest position on the technical side of the business. At that time, I was working on new technology, and this old cowboy grandpa came to tell me that we weren't doing real research. According to him, real research was what he was doing 25 years ago with his wires and transistors. I simply told him that our group of 3 (i.e., including the manager) wasn't pretending to do pure research but simply applied research, and definitely not the pure research his thousands of coworkers were doing in the old monopolistic US company he was referring to. This guy had made a lot of money in the good years of the Internet and was annoying me with his stories; I should have told him to go tell them to his grandchildren, simply retire and let us work. His salary alone could have let us build a much better team to continue innovating in technology, which was our core business. Without any support from him, and even with him working against us, we still managed to release a complete new technology (i.e., prototype + knowledge transfer to prod + prod support) that is bringing quite a lot of income to the company. Charlie, the most important point is to bring technology to the market, nothing else matters... I succeeded even though he desperately tried to make us fail to justify moving the advanced stuff to California.
Those clowns (I know, I am over-generalizing) were just sucking up all the financial resources of our company. Being still a shareholder, I was relieved to hear that "Le Chasseur" was finally fired in a reorganization. If you need a reorganization to do cleanup in a private company, your level of political games must be quite high... and politics leads to corruption and mediocrity in such an organization. We could have become a great company but it might remain an average one. I don't think I deserve that after such investments, but money leads. According to old colleagues, the latest reorganization was effective.
I think that the concentration of smart people is higher in CA than in Montreal, but we definitely ran into some bad apples.
Key decisions can't stay in incompetent hands; sooner or later the guillotine will come. If you made money and reached your level of incompetence, please let others take the lead. If you haven't made money, just reorient your career. We don't care about your ego, the fact that you are the initial founder, etc. What is important is the company's success, and everyone will benefit from it.
You might have better stories and/or points of view about VC, but what I saw wasn't great:
- They stop/restrain investment in innovation
- They drop in some incompetent clowns (over-generalized to make the point)
- They suck up all the money to pay their clowns
- They make your shares less valuable
- They sink massive amounts of money into building products that have no real value and into non-core technology
- They create a real wall between decision makers and the workforce
- They invest a lot in fake partnerships
- They love yes-men
- They feel way smarter (especially CA vs Mtl)
I hope that I will never in my life work again for a company that gets taken over by VCs. If you need VC, you might already be badly organized to need money from them so desperately. I would definitely like to hear good stories about VCs where they achieve their goal... at least.
Yes, I had a painful indirect experience with VC and an outdated decision maker, but I knew that startup environments aren't easy. I learned the rules and the game better by playing it. I hope they will find a way to make money, because success brings success and Montreal deserves more of it; there is so much talent here. I have put the emphasis on one bad decision maker, while the CEO and the CFO were interesting characters... but it only takes one bad apple dropped in by your VC to fuck up your company, and they will get most of the benefits.
As mentioned on Samuel Bouchard's blog, we need many more private funding alternatives, but we aren't there yet.
Sunday, January 31, 2010
python try catch cost vs hasattr (overhead = only 2X)
In some cases you can use hasattr to avoid throwing an exception, but your code will be less elegant and/or readable.
In order to make a rational decision, let's compare the time cost. I was expecting a major cost for throwing an exception, but it ends up being only 2 times more costly than doing a hasattr.
You can see the simple code I used to compare them here. From now on, I will be less scared about performance when I am using try/except.
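Since the linked benchmark is not reproduced here, the following is a minimal sketch of how such a comparison could be set up with the standard `timeit` module. The class and function names are hypothetical, and the ratio you measure will depend on your interpreter and version, so no particular figure is implied.

```python
import timeit


class Thing(object):
    present = 1


def with_hasattr(obj):
    # Look before you leap: test for the attribute first.
    if hasattr(obj, "missing"):
        return obj.missing
    return None


def with_try(obj):
    # Ask forgiveness: let the lookup fail and catch the exception.
    try:
        return obj.missing
    except AttributeError:
        return None


def compare(number=200000):
    """Return (seconds for hasattr version, seconds for try/except version)."""
    obj = Thing()
    t_hasattr = timeit.timeit(lambda: with_hasattr(obj), number=number)
    t_try = timeit.timeit(lambda: with_try(obj), number=number)
    return t_hasattr, t_try


if __name__ == "__main__":
    t_hasattr, t_try = compare()
    print("hasattr: %.3fs  try/except: %.3fs  ratio: %.1fx"
          % (t_hasattr, t_try, t_try / t_hasattr))
```

This measures the case where the attribute is missing, which is the expensive path for try/except. When the attribute is usually present, the try version does a single lookup and is typically the cheaper of the two.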
Friday, January 22, 2010
The power of python within Tomcat for powerful webapps (jython2.5.1)
Jython is coming to the rescue. Jython 2.5.1 was released on September 26th, 2009, and can be used to create servlets that run inside the Apache Tomcat application server. Jython lets you write Python that runs on the Java VM (100%), use a lot of pure Python libraries, and use all Java packages with Python syntax.
If you try it, you might have issues like:
- How can I create a simple jython2.5.1 servlet? (not deprecated jython2.2.1 that doesn't allow you to use most pure python libs)
- Why do I get ImportError when I use standard python package? How can I fix it?
- Where should I put jython2.5.1.jar?
- Where should I put my python code?
- Where can I get a basic example that is working?
- Where should I put my pure python libraries?
Here are the steps you should follow:
- get tomcat http://tomcat.apache.org/ (I used 5.5.28)
- get jython2.5.1 http://sourceforge.net/projects/jython/files/jython/jython_installer-2.5.1.jar
- cp -r ~/jython2.5.1/Lib tomcat5.5.28/share/lib (required to use std python libs)
- cp ~/jython2.5.1/jython.jar tomcat5.5.28/share/
- download example: wget http://mlboost.svn.sourceforge.net/viewvc/mlboost/jython/HelloWorld/HelloWorld.war
- cp HelloWorld.war tomcat/webapps
- download java jdk
- export JAVA_HOME=sun-jdk-1.6.0_02
- tomcat5.5.28/bin/startup.sh
- try it: http://localhost:8080/HelloWorld/HelloWorld.py
PS: a war file is a zipped file, you can unzip it in tomcat/webapps for testing so you don't need to rezip it and restart the server. When you are done, simply do a jar cvf HelloWorld.war * in the tomcat/webapps folder and ship that single file to the client tomcat server (make sure jython is installed). If you want to add pure python libraries, you can simply add them into your war file, it will work.
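Because a war file is just a zip archive, the packaging step can also be sketched in Python with the standard `zipfile` module. This is an illustration under assumptions: `build_war` is a hypothetical helper, and unlike `jar cvf` it does not add a `META-INF/MANIFEST.MF` entry, so check that your Tomcat version accepts the result before relying on it.

```python
import os
import zipfile


def build_war(source_dir, war_path):
    """Zip everything under source_dir into war_path.

    Entry names are stored relative to source_dir with forward slashes.
    Write war_path outside source_dir so the archive does not include itself.
    """
    with zipfile.ZipFile(war_path, "w", zipfile.ZIP_DEFLATED) as war:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, source_dir).replace(os.sep, "/")
                war.write(full, arcname)
    return war_path
```

For example, `build_war("tomcat/webapps/HelloWorld", "HelloWorld.war")` would pack the unzipped test folder back into a single file, assuming that folder is where the war was expanded.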
Here is the time comparison of the same service:
- python: wsgi httpserver
- jython: wsgi httpserver
- tomcat: java servlet jython2.5.1
Friday, January 8, 2010
matplotlib & python for powerful data visualization
What is the gain and loss effect of allocating by percentage of seats, from the point of view of proportional representation? Percentage of seats is what usually decides legislative assemblies. It is the process used in Canadian and Québec elections.
Powerful visualization allows you to see the effect easily. Python & matplotlib is an amazing combination to do so. It took me 20 minutes to be able to visualize the effect in the federal and Quebec elections of 2008.
Upper graph (seats vs votes) shows the loss of proportional vote % if you use a seats approach. As an example, the Liberals gain ~11% and the ADQ loses ~11%.
Lower graph (lost seats vs votes). The real impact on a party is the ratio of this loss to its real vote proportion. In this example, it is a gain of ~25% for each Liberal vote (11/(66/125)) and a loss of 66% for the ADQ and ~88% for QS.
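The two quantities behind the graphs can be sketched in a few lines. This is not the `seats_vs_prop.py` script, just a hypothetical helper that follows the prose definition (the gap in percentage points, then that gap divided by the party's vote share). The 41.8% vote share used in the example is an assumption, back-derived from the 66 seats out of 125 and the ~11 point gain quoted above, not an official figure.

```python
def seat_vote_gap(seats, total_seats, vote_share_pct):
    """Return (seat share %, gap in points, gap relative to vote share %)."""
    seat_share_pct = 100.0 * seats / total_seats
    gap = seat_share_pct - vote_share_pct       # > 0 means over-represented
    relative = 100.0 * gap / vote_share_pct     # gain or loss per vote received
    return seat_share_pct, gap, relative


if __name__ == "__main__":
    # Assumed vote share (see lead-in); seats and total come from the post.
    seat_share, gap, relative = seat_vote_gap(66, 125, 41.8)
    print("seat share %.1f%%, gap %+.1f points, relative %+.1f%%"
          % (seat_share, gap, relative))
```

With these inputs the relative gain comes out in the mid-20s percent, the same order as the figure quoted for the Liberals.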
Basically:
- In the Canadian election, PC & BQ gain power, but the BQ way more in proportion, and the Greens lose everything
- In the Quebec election, QS & ADQ lose a lot of power and PQ and LIB gain it: it might explain why they aren't talking about changing the election formula
- matplotlib and Python is an amazing combination to automate data visualization
svn co https://mlboost.svn.sourceforge.net/svnroot/mlboost/elections
python elections/seats_vs_prop.py
Gerrymandering Explained (youtube: Gerrymandering - another reason why rep democracy is fundamentally corrupt, http://bit.ly/qO4mpH)
Sunday, January 3, 2010
gmail, a powerful target marketing tool
I thought at one point that their primary end goal was to launch a corporate email portal, so that companies wouldn't need to hire highly paid sysadmins to provide mail server support, and at the same time to give employees worldwide something way better than the Outlook/Exchange server that pollutes our lives. They are doing it already, but I think their real goal was to do target marketing, though not the traditional kind.
What better than a user's emails to understand their profile and do target marketing? They get the highest quality info from your emails, yes, your emails.
From my experience in more traditional target marketing for Bell and at Microcell-Lab, when people do traditional target marketing, they have little info about users and derive new information from it, from which they try to generate better predictions. As an example, they use your postal code to estimate your family income, etc., and use that information to better predict whether you will buy X or Y.
Traditional target marketing practitioners use a lift approach to get the top N most probable buyers for a given product or service, and try to approach those people with promotions, emails, etc. With gmail, it is way simpler: you use user profile info like email words (btw, they are parsing your emails; take a close look, you will see), and use a prediction engine to advertise what you are most likely to like or buy, and show it to you directly because you are using their mail service.
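To make the lift approach concrete, here is a minimal sketch: rank customers by a model's predicted score, keep the top N, and compare their purchase rate with the overall rate. The function name and the toy data are hypothetical and only illustrate the calculation.

```python
def lift_at_n(scored_customers, n):
    """Lift of the n highest-scored customers.

    scored_customers: list of (predicted_score, bought) pairs, bought is 0 or 1.
    Lift = purchase rate among the top n / purchase rate over everyone.
    """
    if not 0 < n <= len(scored_customers):
        raise ValueError("n must be between 1 and the number of customers")
    ranked = sorted(scored_customers, key=lambda pair: pair[0], reverse=True)
    overall_rate = sum(bought for _, bought in ranked) / float(len(ranked))
    if overall_rate == 0:
        raise ValueError("no buyers at all, lift is undefined")
    top_rate = sum(bought for _, bought in ranked[:n]) / float(n)
    return top_rate / overall_rate


if __name__ == "__main__":
    # Toy data: (score from some prediction engine, did the customer buy?)
    customers = [(0.9, 1), (0.8, 1), (0.7, 0), (0.4, 1),
                 (0.3, 0), (0.2, 0), (0.1, 0), (0.05, 0)]
    print(lift_at_n(customers, 4))
```

A lift of 2 at N=4 means that contacting only the four best-scored customers reaches buyers twice as often as contacting people at random.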
Gmail is an amazing target marketing tool because it gets profile info directly at the user's fingertips, can do way better prediction than traditional target marketing techniques, has direct access to the customer, and scales well to get more users. Your predictions are only as good as your data. What's the point of improving algorithms if you can get better, higher quality data, or, as Google does, both? Larry Page and Sergey Brin are just visionary target marketers!
Friday, January 1, 2010
Jython, pyPdf, reportlab experimentation & patches proposals
Intro: only pure Python code and libraries work on both Python and Jython. All C-related Python packages are incompatible. Jython allows Python syntax on top of the Java VM. One great thing is that you can use Java classes within Python. Jython 2.5.1 was released last September.
1) manual pdf text modification
I thought it was simple to modify a PDF template to change some text, but I was wrong. Even if you are able to re-encode the new text and change the length, you will hit walls. It is more complex than that (xref etc.). Most PDF libs provide encoding helper functions, but you will have a hard time finding decoding ones, ascii85 for example. After some time, I decided to try to make reportlab work with Jython.
2) reportlab import error with jython
I tried to use reportlab, a powerful lib to create PDFs, but it generated this error when I imported reportlab.pdfgen: java.lang.ClassFormatError: java.lang.ClassFormatError: Invalid method Code length 66566 in class file reportlab/pdfbase/_fontdata$py. According to this thread on markmail, there was a simple solution, but the patch wasn't working. You can find the working patch that I created and proposed to the reportlab team here.
3) Saving pdf to memory instead of files
In order to do in-memory PDF manipulations, I used the pure Python pyPdf lib from Mathieu Fenniak. Basically, I tried to save a canvas in memory and couldn't figure out why it wasn't working. Basically, I was doing outputStream.writelines(c._
4) Simple comparison python/jython
I was wondering how much slower Jython was compared to Python. As you can see, it is slower, and it degrades with some parameter sizes (ex: n pages). In this example, it also takes 4 to 6 times more memory.
5) Jython out of memory
If you get:
OutOfMemoryError: java.lang.OutOfMemoryError: GC overhead limit exceeded
use the -J-Xmx1024m jython option to allow a larger heap size for the Java VM.
6) Threading optimization
Jython doesn't suffer from the GIL problem. Look at the video "Mindblowing Python GIL" to get more information about it. Basically, Jython can do real multi-threading. In my context, I could easily parallelize part of my code, so I tried it using the threadpool of Christopher Arndt. Unfortunately, I still haven't been able to make it faster. pyPdf hasn't been designed to be used in a real threading environment (PdfFileReader can't be shared between threads), which introduces limitations.
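One common way around a reader object that can't be shared is to give each worker thread its own instance. The sketch below shows that pattern under several assumptions: `FakeReader` is a hypothetical stand-in for a non-thread-safe reader such as PdfFileReader, and the standard `concurrent.futures` pool is swapped in for Christopher Arndt's threadpool (it is available in CPython 3, not in Jython 2.5.1).

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeReader(object):
    """Hypothetical stand-in for a reader that must not be shared across threads."""

    def __init__(self, path):
        self.path = path

    def page_text(self, number):
        return "%s:page-%d" % (self.path, number)


_local = threading.local()  # holds one reader per worker thread


def reader_for_this_thread(path):
    """Return this thread's private reader, creating it on first use."""
    reader = getattr(_local, "reader", None)
    if reader is None or reader.path != path:
        reader = FakeReader(path)
        _local.reader = reader
    return reader


def process_page(job):
    path, number = job
    return reader_for_this_thread(path).page_text(number)


def process_pages(path, n_pages, workers=4):
    """Process every page in a thread pool; results come back in page order."""
    jobs = [(path, number) for number in range(n_pages)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_page, jobs))
```

Each thread pays the cost of opening its own reader once, which can eat the gain on small documents. On CPython the GIL still serializes pure-Python work, so any speedup from this pattern would be expected on Jython or with I/O-bound steps rather than here.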
7) pyPDF profiling
It is amazing to see the tremendous effort people are putting into making Python syntax available on each platform (Java -> Jython; .NET -> IronPython, etc.). It is a sign of Python's great syntax.