Some thoughts of a Machine Learning practitioner on software development, management, team building, startups, Python, agile development, data visualization... topics that can distract you from your end goals by making you less efficient, but that are critical to manage in order to succeed. Don't forget that long-term adaptation to inefficient approaches can become your enemy. Let's try to empower others by sharing knowledge and personal experiences.
Sunday, March 29, 2009
Highlights Pycon Chicago 2009
Python's simple syntax has led everyone to converge on it: PyPy, Jython, CPython and IronPython. Alex Martelli's talk on abstraction as leverage was great: abstraction is inevitable; try to understand at least two lower layers and create hooks instead of hacks! (I am already applying it for the demo of python-montreal.)
The frameworks for web development are getting quite amazing (Django, Whirlwind, Pylons, etc.).
The talk "A Whirlwind Excursion through Writing a C Extension" by Ned Batchelder was great for getting a quick start on writing your own C extensions; yes, Python is slow, and optimization is sometimes inevitable.
The panel on Object Relational Mappers and the talk "Drop ACID and think about data" were quite interesting. The keynote presentation by the reddit founders is a good example showing that Python provides amazing tools for spinning out web application companies.
The evening lightning talks and the open discussions on topics of interest showed the dynamism of the Python community. I went to an open discussion on parallel computing, and people are moving to Python. Now that the processing library has been integrated into 2.6 (as multiprocessing), and with Twisted, Thrift, PyMPI, NumPy and SciPy, power computing with Python will expand.
I am glad that Yannick convinced me to attend. I think that I will attend the tutorials next time, and some sprints, to ensure maximum knowledge assimilation.
During this small trip, I tried the Eee HA mini laptop but I returned it; you should buy the HE: the right Shift key is in the right place and the battery lasts longer. PyCon was inspiring!
Monday, March 23, 2009
Research, information management, contrastive divergence, no free lunch theorem, non-parametric...all related
In practice, great researchers are aware of this consequence of the no free lunch theorem and try to keep a good balance between reading papers and research exploration. By simply applying the contrastive divergence concept to your approach, you can gauge your distance from the trend and get an estimate of the impact of a possible discovery.
The Machine Learning research community, like most other communities, tends to recruit top-grade students who are used to following exactly the line of thought of their teachers. This long training process is, in my opinion, extremely damaging to the development of a researcher's capacity (i.e., a suboptimal cost function). This explains why most master's students serve as cheap research labor: they can only experiment with others' ideas and make minor contributions.
Top researchers allow their formal students to follow their own line of thought or, if they have no specific ideas, suggest ideas. I wouldn't have done a research master's without this freedom; thanks, Yoshua!
So, if you want to have the most impact on your community, limit the number of papers you read, come up with your own ideas and play with your concepts to train your own intuitions about the unknown guiding rules you are looking for.
You might say, "What are your contributions? I haven't heard about them." My contribution is that I have built experimental proof of fundamental problems in back-propagation optimization and built the skeleton of top-level explanations. Usually, we don't publish this type of result until we find a solution to the problem, which, unfortunately, I haven't reached yet. But it is coming slowly; it is a long process, and I have learned to be patient.
So, if you want to have the most impact on your community, limit the number of papers you read to ensure you don't constrain yourself to others' models. Why use a parametric model that limits your solution space?
You can trust the collective research discovery process that ensures the evolution of humankind, because someone will eventually find it; or you can use it to increase the likelihood that you will make an important discovery (i.e., use it as a contrastive divergence cost function). If everyone applied this strategy or cost function, I am pretty sure we would evolve faster. To move to this step, we will need to encourage the publication of failed strategies so that others don't waste time reproducing the same ideas, but this could be elaborated in another post on the evolution of society.
Traders know this simple strategy: buy low, sell high, don't follow the trend, take risks.
Sunday, March 15, 2009
Pycon 2009 Chicago
In 3 days, I expect to learn more than I have learned in the last 6 months and to meet passionate people. Here are some of the talks I will attend:
Designing Applications with Non-Relational Databases (#16)
How Python is Developed (#116)
Twisted, AMQP and Thrift: Bridging messaging and RPC for building scalable distributed applications (#40)
Introduction to Multiprocessing in Python (#6)
The State of the Python Community: Leading the Python tribe (#118)
Google App Engine: How to survive in Google's Ecosystem (#53)
A Whirlwind Excursion through Writing a C Extension (#68)
Abstraction as Leverage (#110)
A winning combination: Plone as a CMS, your favorite Python web framework as a frontend (#100)
Greedy agile, waterfall and local minima
Wednesday, March 4, 2009
Montreal Python 6: 2009-04-14; Machine Learning empowered by Python
Our main presenter will be Francis Piéraut on Machine Learning empowered by Python as announced during the flash introduction in Montreal-Python 5.
Machine Learning is a subfield of AI that considers learning patterns from existing data. Related applications are increasing in many fields where adaptive systems are needed, like fraud detection, face recognition, recommendation systems, disambiguation systems, insurance risk estimation, web traffic filtering, voice recognition, and many others.
The first part of this presentation will cover the basics of machine learning; in the second part, we will dive into a real example and see the complete process of using machine learning to create a real-time digit recognition system using MLboost, a Python library. The practical approach should allow the audience to assimilate the most important concepts of machine learning and the critical need for data preprocessing.
After a Software Engineering degree, Francis Piéraut completed a research master's in Machine Learning at LISA. During his research work, he developed flayers, a powerful C++ neural network library. At the beginning of his career, he spent several years in Montreal startup companies applying Machine Learning and statistical AI related solutions. In 2005, he released the first version of MLboost, a Python library that allows him to speed up his Machine Learning projects by simplifying data preprocessing, feature selection and data visualization.
Essay on Adaptation, leaky cost function and online Learning...a society analogy
To bridge the first 3 concepts, I will use an analogy with Quebec society.
In order to learn, we need adaptive systems such as neural networks. In online learning, the adaptation capacity should stay constant over time. Local minima can trip you up, but let's ignore them for the time being.
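As a minimal sketch of the constant-adaptation idea (not taken from any library mentioned here), the Python below runs online gradient descent with a fixed learning rate on a stream whose target changes halfway through. The names `online_sgd` and `drifting_stream`, the linear model and the learning rate of 0.05 are all assumptions chosen for illustration.

```python
import random


def online_sgd(stream, learning_rate=0.05):
    """Fit y ~ w * x + b one example at a time with a constant step size."""
    w, b = 0.0, 0.0
    for x, y in stream:
        # Gradient of the squared error 0.5 * (prediction - y) ** 2.
        error = (w * x + b) - y
        w -= learning_rate * error * x
        b -= learning_rate * error
    return w, b


def drifting_stream(n, seed=0):
    """Yield (x, y) pairs; the true slope switches from 2.0 to -1.0 halfway."""
    rng = random.Random(seed)
    for t in range(n):
        slope = 2.0 if t < n // 2 else -1.0
        x = rng.uniform(-1.0, 1.0)
        yield x, slope * x


# After the drift, the learner should have moved to the new slope.
w, b = online_sgd(drifting_stream(4000))
```

Because the step size never decays, the model keeps tracking the stream after the target changes; a learning rate that shrinks toward zero would adapt more and more slowly.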
Quebec society analogy intro
To understand my analogy with the society I live in, I want to share some of my reflections on the puzzle of understanding Quebec society. I was born in France and I migrated to Quebec at 9 years old. Over the last 5 years, I have tried to resolve my profound incomprehension of the deep ambitions of French Quebec society, if it has any ;). In my view, a leaky cost function assumption seems to be leading it to stagnation, coming assimilation and slow extinction.
To understand my point of view, we need to elaborate on the key concepts, which are adaptation, equality versus inequality, and access to education.
Adaptation pros and cons, local minima and ambition
Adaptation is one of the greatest abilities of humankind but also one of the worst. Adaptation to mediocrity can be a survival strategy to get through hard times, but getting used to it reflects truly low ambitions or an incapacity to do online learning. The Quebec nation seems to have this disease.
- Quebec people have accepted staying in a destructive mode since 1982: a constitutional status quo that leads to political instability, economic stagnation and reduced political power, having excluded themselves from Canadian power with the Bloc Québécois for too long.
- Quebec people accept the status of a sub-nation (a nation inside Canada).
- Quebec people accept mediocre governments, mediocre public transport systems, a far too expensive and inefficient health system, the highest taxes in North America, etc.
Cost function assumption: Equality versus inequality
From an anthropological point of view, the French nuclear family leads to a conception of a world of equality (see Emmanuel Todd). To simplify, everyone should have the same chances, the same access to education, the same health services and so on.
The Anglo-Saxon culture leads to a conception of a world of inequality. This conception leads people to work harder, knowing there is no lower bound and that they can sink deeper if they are too lazy.
Knowing we are born unequal, the Anglo-Saxon assumption seems better adapted to the reality of humankind. On the other hand, equality leads to a rise in society's education level independently of the economy, which has a lot of pros and cons.
Why is equality a weak assumption? Equality can stand in rich societies because they can afford it. Unfortunately, Quebec society is getting poorer and its population is disadvantaged by its illusion of equality, which leaks everywhere (e.g., the health system, education, public kindergarten, etc.).
Missing link: Education and production of wealth
Quebec has the most affordable access to education in North America, and few take advantage of it. Everyone knows that the more educated a society is, the more productive and healthy it will be, and the more attainable the utopia of a world of equality becomes. By using education to become more productive, a society creates wealth and can afford a utopia such as the equality concept. French society seems to miss this key point.
French Quebec society is dying and the Anglo-Saxon supremacy should take over
The leaky equality concept and the low ambition of Quebec society seem to lead this society to an incapacity for online learning, that is, an incapacity to adapt further. This incapacity leads to its extermination through growing assimilation into the Anglo-Saxon supremacy and its global cost function model. An inequality-based cost function seems to be better adapted for a society that wants to stay alive.
Wrap up (it is time to conclude)
A leaky cost function assumption can lead online learning to an incapacity to adapt, like being stuck in a local minimum: a slow death, as in the current folklorization of the French Quebec nation. It is simply evolution, a Darwinian consequence: those who can't adapt simply die. Facing reality makes life easier.
The most important thing is that the cost function should reflect your goals. If you have a supervisor, try to get a good estimate of his or her cost function, because it will simplify your ascent everywhere.
Quebec French culture is a huge reason for me to stay in Montreal, but I wish Montreal had a better drive for machine learning and startups, as you can find in California. Montreal is simply under-exploited. Don't take my words for granted; this is an essay. Make your own judgment from your own eyes and exploration.