Saturday, October 19, 2013

Deep Learning history and most important pointers

Prior to 2006, neural networks weren't cool at all and were largely ignored by the machine learning community (notably in paper selection). After the domination of Bayesian networks, SVMs, introduced by Vapnik, and the kernel trick dominated the literature (lots of boring kernel engineering papers). SVMs had a major advantage: training is a convex problem, so convergence to a global optimum could be proven, which made them more interesting for theoretical papers than neural networks trained by gradient descent, a greedy optimization method that can get stuck in local minima.
Despite their amazing theoretical capacity (being able to approximate any function), no one was able to train big neural networks, and it was even worse for deep ones, which can represent similar functions with fewer parameters. Basically, moving beyond shallow architectures implies moving back to the nightmare of non-convex optimization... back down the painful path of optimization research.
Against the community's momentum, LeCun, Hinton and Bengio kept doing research on the subject (thinking outside the box). The deep learning revolution seems to have started with LeCun's discovery, through his applications of convolutional networks to face recognition, that the structure of the topology could drastically affect the training capability. How could we learn this embedding? Then Hinton came up with his revolutionary idea of deep belief networks (a generative model) and autoencoders to better initialize parameters and force the lower layers to learn low-level features, an embedding that serves as the initialization state. Basically, this adds data-driven constraints that follow the natural manifold inherent to the problem.

Deep learning is not the answer to AI, but it is a major breakthrough.
Why? Deep learning not only combines unsupervised and supervised learning but also partially solves the training of deep architectures, which is critical to learning layers of intelligence (beyond shallow architectures). Furthermore, with neural networks, we don't need to make any assumptions about distributions, and they are scalable and not that hard to implement. Basically, we use unlabeled data to initialize a deep architecture and then train it with our labeled data.
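To make that two-stage recipe a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the unsupervised stage is a tiny single-hidden-layer autoencoder (a simplification, since deep belief networks stack RBMs layer by layer), the toy data is made up, and the names `pretrain`, `fine_tune` and `predict` are hypothetical helpers, not part of any library.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(W, b, x):
    """Hidden features h_j = sigmoid(W[j] . x + b[j])."""
    return [sigmoid(sum(w * xi for w, xi in zip(Wj, x)) + bj)
            for Wj, bj in zip(W, b)]


def pretrain(X, n_hidden, epochs=500, lr=0.1, seed=0):
    """Unsupervised stage: fit an autoencoder to reconstruct X (no labels)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    c = [0.0] * n_in
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in X:
            h = encode(W, b, x)
            recon = [sum(V[i][j] * h[j] for j in range(n_hidden)) + c[i]
                     for i in range(n_in)]
            err = [recon[i] - x[i] for i in range(n_in)]
            total += 0.5 * sum(e * e for e in err)
            # Gradient w.r.t. hidden pre-activations (computed before V changes).
            dh = [sum(err[i] * V[i][j] for i in range(n_in)) * h[j] * (1.0 - h[j])
                  for j in range(n_hidden)]
            for i in range(n_in):
                c[i] -= lr * err[i]
                for j in range(n_hidden):
                    V[i][j] -= lr * err[i] * h[j]
            for j in range(n_hidden):
                b[j] -= lr * dh[j]
                for i in range(n_in):
                    W[j][i] -= lr * dh[j] * x[i]
        losses.append(total / len(X))
    return W, b, losses  # the decoder (V, c) is discarded after pretraining


def fine_tune(W, b, X, y, epochs=500, lr=0.5):
    """Supervised stage: add a logistic output unit and train all layers."""
    W = [row[:] for row in W]  # start from the pretrained encoder weights
    b = b[:]
    u = [0.0] * len(W)
    d = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = encode(W, b, x)
            p = sigmoid(sum(uj * hj for uj, hj in zip(u, h)) + d)
            g = p - t  # gradient of the cross-entropy w.r.t. the output logit
            dh = [g * u[j] * h[j] * (1.0 - h[j]) for j in range(len(W))]
            d -= lr * g
            for j in range(len(W)):
                u[j] -= lr * g * h[j]
                b[j] -= lr * dh[j]
                for i in range(len(x)):
                    W[j][i] -= lr * dh[j] * x[i]
    return W, b, u, d


def predict(model, x):
    """Probability of class 1 for input x."""
    W, b, u, d = model
    h = encode(W, b, x)
    return sigmoid(sum(uj * hj for uj, hj in zip(u, h)) + d)


# Toy data (made up): many unlabeled points, only two labeled ones.
unlabeled = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
             [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
labeled_X = [[1, 1, 0, 0], [0, 0, 1, 1]]
labeled_y = [0, 1]

W0, b0, losses = pretrain(unlabeled, n_hidden=2)
model = fine_tune(W0, b0, labeled_X, labeled_y)
```

Calling `predict(model, [0, 0, 1, 1])` then gives the probability of class 1 for that input. The point of the sketch is the ordering: the encoder weights are learned from unlabeled data first, and the labeled data only adjusts them afterwards.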
Concerning deep learning's ability to learn higher-level reasoning, that is the next big step. Learning to detect objects in images seems to me the right step towards that goal. An object is a higher-level representation of an image, just as an idea is a higher-level representation of a sentence, and so on. Concerning "Thinking, Fast and Slow", we can see the fast part as the evaluation of the model and the slow part as its retraining during sleep (dreams are part of the retraining process). Deep learning is a revolution in the field of machine learning/AI, and I am looking forward to seeing how Ng, Hinton, Bengio, LeCun and others will make it evolve to learn more layers of intelligence.
The next deep learning improvements are fuzzy auto-encoders, maxout and word2vec.
We shouldn't forget that SVMs don't scale well and, with the explosion of available data, scalable approaches had to emerge despite the machine learning community's obstinate focus on unscalable solutions because their convergence is easier to prove.

Here is a good summary on Learning Deep Architectures for AI. To learn more about deep learning and the available implementations, see the material maintained by Bengio's LISA lab.
