Tuesday, February 24, 2009

Leaky assumption and Gradient Descent

Theoretical models are built on strong assumptions, much like software layers (i.e. leaky abstractions). Layers are created to simplify complex systems and to allow work specialization, as described in Marx's analysis of capital... and to accommodate our limited human brain capacity.
Experienced practitioners know that their value resides in a deep understanding of those weak assumptions, because otherwise the market will simply hire fresh graduates.

The standard backpropagation gradient descent algorithm assumes that the parameters are independent, so that we can optimize them independently of each other. This assumption, or as I see it this leaky abstraction, lets you optimize all parameters at the same time, which simplifies the life of software engineers and researchers because the parameters are theoretically uncorrelated. In mathematical terms, we are assuming that the Hessian matrix only has values on its diagonal.
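To make the point concrete, here is a small illustrative sketch (mine, not from the thesis): on a toy quadratic loss whose Hessian has strong off-diagonal terms, plain gradient descent, which effectively treats the parameters as uncorrelated, crawls toward the minimum, while a full Newton step that uses the off-diagonal terms gets there in one update. All names and numbers are made up for the demo.

import numpy as np

# Toy quadratic loss 0.5 * w^T H w whose Hessian has strong off-diagonal terms,
# i.e. the two parameters are heavily correlated (numbers are arbitrary).
H = np.array([[1.0, 0.95],
              [0.95, 1.0]])
w0 = np.array([1.0, -1.0])
lr = 0.1

def grad(w):
    return np.dot(H, w)

# Plain gradient descent: each parameter is moved as if the Hessian were diagonal,
# so the correlation between the two parameters is ignored.
w = w0.copy()
for _ in range(100):
    w = w - lr * grad(w)
print("gradient descent, 100 steps:", w)

# Newton step: uses the full Hessian, so the correlation is taken into account;
# on a quadratic loss it reaches the minimum (w = 0) in a single update.
w_newton = w0 - np.linalg.solve(H, grad(w0))
print("Newton, 1 step:             ", w_newton)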

My master's thesis, done under Yoshua Bengio's supervision, focused mostly on understanding the training inefficiency of huge neural networks. At that time, our goal was to train neural network language models. According to my understanding and the experimental evidence I documented, the problem is basically an optimization problem: the uncorrelated-parameters simplification doesn't hold when the number of parameters explodes.
Unfortunately, I failed to find a solution to this problem, but the new trend following Hinton's breakthrough in 2006 is already reviving research on this topic.

In my literature review, I found that several researchers identified some of the reasons that can explain this inefficiency. In my view, they are direct and indirect consequences of the optimization problem introduced by the leaky abstraction of uncorrelated parameters. Those reasons are the moving target problem and the attenuation and dilution of the error signal as it propagates backward through the layers of the network. My master's thesis presents other reasons that can explain this behavior: the opposite gradients problem, the absence of a specialization mechanism, and the symmetry problem.
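Here is a minimal sketch of the attenuation/dilution point (my illustration, not taken from the thesis): backpropagating an error signal through several sigmoid layers with small random weights, its norm typically shrinks layer after layer, in part because sigmoid'(x) <= 0.25. The layer sizes and weight scale below are arbitrary.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

np.random.seed(0)
n_layers, width = 8, 50

# Small random weights, roughly like a standard initialization (arbitrary scale).
Ws = [np.random.normal(scale=0.2, size=(width, width)) for _ in range(n_layers)]

# Forward pass, keeping the activations for the backward pass.
h = np.random.normal(size=width)
activations = [h]
for W in Ws:
    h = sigmoid(np.dot(W, h))
    activations.append(h)

# Backward pass: start from an arbitrary error signal at the output and
# propagate it toward the input, printing its norm at each layer.
delta = np.ones(width)
for i in reversed(range(n_layers)):
    a = activations[i + 1]
    delta = np.dot(Ws[i].T, delta * a * (1.0 - a))  # sigmoid'(z) = a * (1 - a)
    print("layer %d: |error signal| = %.2e" % (i, np.linalg.norm(delta)))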

I will treat those concepts in a future post. The inspiration for this post came from a brainless Hollywood movie that freed up some valuable brain cycles; there is always a good side to the story.

Newton's laws don't hold in Einstein's theory, just as uncorrelated parameters don't hold in huge neural networks. Always remember your leaky assumptions/abstractions.

Monday, February 9, 2009

cygwin or mingw to compile C++ swig projects on windows?

The answer is definitely mingw. 4 simple steps:
  1. download swig
  2. download mingw
  3. add python, swig, mingw to environment variable PATH
  4. do: python setup.py build_ext --inplace --compiler=mingw32
I was also able to compile it with Cygwin, but I had to download a huge pile of stuff and I never figured out how to use Python Windows packages and the Cygwin Python packages together, which was a show stopper for me.
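For reference, here is a hypothetical minimal setup.py for step 4 above. The module name flayers comes from my project, but the source file names and options below are assumptions, not my exact build file.

# Hypothetical minimal setup.py for a SWIG-wrapped C++ extension.
from distutils.core import setup, Extension

flayers_ext = Extension(
    "_flayers",                             # SWIG convention: the C++ module gets a leading underscore
    sources=["flayers.i", "flayers.cpp"],   # SWIG interface file + C++ sources
    swig_opts=["-c++"],                     # make SWIG generate C++ wrappers
)

setup(name="flayers",
      ext_modules=[flayers_ext],
      py_modules=["flayers"])               # the pure-Python wrapper generated by SWIG

# Build in place with MinGW (step 4 above):
#   python setup.py build_ext --inplace --compiler=mingw32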

Why was I looking to compile my C++ project on windows?

I want to use my C++ machine learning lib on real-time video, and the only package I could get working to grab images in Python was VideoCapture, which is only supported on Windows. So now I have to compile my machine learning lib on Windows...

If you want to compile a SWIG C++ project, you need a compiler. If you don't have Visual C++, you are stuck with this error:
error: Python was built with Visual Studio 2003;
extensions must be built with a compiler than can generate compatible binaries.
Visual Studio 2003 was not found on this system. If you have Cygwin installed, you can try compiling with MingW32, by passing "-c mingw32" to setup.py.

At this point, you can try to download a free Visual C++ compiler or try mingw32 with Cygwin. I tried a free Visual Studio 2003 package but I couldn't get it working. Then I tried Cygwin with gcc: it compiled, but I couldn't use the compiled package outside of Cygwin. So I tried mingw32. The -c option doesn't work, and "python setup.py build --compiler=mingw32" doesn't let you use your package inside Python (i.e. "import _flayers" fails in my context). Finally I tried "python setup.py build_ext --inplace --compiler=mingw32" and it worked.

After a chat with Simon and Tristan, who are doing video work on Linux at the SAT, I discovered that they were grabbing video on Linux. They referred me to OpenCV, which works perfectly. In my experiments with OpenCV, I realized that Pygame is a million times faster than matplotlib for displaying video.
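For the curious, here is a minimal sketch of the OpenCV-grab / Pygame-display combination. It uses the newer cv2 Python bindings rather than the OpenCV interface I actually used, so take it as an illustration of the idea only; the camera index is an assumption.

import cv2
import pygame

cap = cv2.VideoCapture(0)                   # camera index 0 is an assumption
ok, frame = cap.read()
if not ok:
    raise RuntimeError("could not grab a frame")

pygame.init()
screen = pygame.display.set_mode((frame.shape[1], frame.shape[0]))

running = True
while running:
    for event in pygame.event.get():        # allow closing the window
        if event.type == pygame.QUIT:
            running = False

    ok, frame = cap.read()
    if not ok:
        break
    # OpenCV frames are BGR and height-major; Pygame wants RGB and width-major.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    screen.blit(pygame.surfarray.make_surface(rgb.swapaxes(0, 1)), (0, 0))
    pygame.display.flip()

cap.release()
pygame.quit()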

SWIG, MinGW and OpenCV make Python so convenient!

Friday, February 6, 2009

Startups and faster Learning

Everyone who is familiar with gradient descent and machine learning knows that learning is proportional to the error: the bigger the errors, the bigger the opportunity to learn something. In my opinion, startups are by nature the ideal environment for putting yourself in high-gradient learning situations. They force you to try many more things, take more risks and innovate to survive, unless you want to end up with no choice but to look for another job and/or hold penny stock options. Another interesting thing is that people's true nature appears and masks fall rapidly. Friendships are tested to their limits, and this lets you filter out short-term life partners.

Don't forget that a high gradient puts high strain on your body, and frustration has to be released somehow. In my case, cycling has been my stress-balancing vehicle; you have to find yours. Resigning is too often chosen as the last resort. Keep in mind that everyone is replaceable, even in startups. If you really want to get rich, you should use the Californian technique: accumulate stock from multiple startups. Don't forget that most of them (~90-95%) fail badly, so increase your chances by switching before reaching burnout, and build your contact network along the way. Right after graduation is definitely the best time to try this fast-growth experience, if you can take it. On my side, I tried it for 5-6 years; you have to know your limits. Successful startups have workaholic CEOs, which increases the pressure and the gradient. If yours isn't one, find another one; that should be the 19th mistake of startups (cf. the 18 mistakes of startups).

If you get the chance to reach a higher decision-making position, you learn that there are irrational error/cost functions and that you might be optimizing the wrong error criterion. For those who aren't following me, I recommend watching the French movie 99 Francs and paying close attention to the executive meeting scenes.

I applied the same principle with my son this winter: he didn't want to put on his gloves, but he learns/changes his mind faster at -10 degrees Celsius.

Sunday, February 1, 2009

MLboost 0.3 has been released

I am pleased to announce the release of MLboost 0.3. MLboost 0.3 will be used for the next Montreal-Python presentation ("Machine Learning empowered by Python"), which will be announced in the coming weeks. The most important new feature is the integration of pyflayers, a simplified Python SWIG interface to flayers, my C++ neural networks library. I am preparing a live real-time demo of a machine learning application; it should be interesting for the audience and it is good motivation to improve my package. I have also played with beamer, a LaTeX package for creating presentation slides, and I have been particularly pleased with it; I do recommend it. On another note, it is time to prepare the next winter camping trip and to continue eating great food.