Wednesday, June 17, 2009

ICML highlights summary

Let me try to summarize in a few sentences what I have learned:
  • Language acquisition: Children lose their capacity to distinguish some phonemes, which reduces the space of choices and helps them learn their environment's language. Acquiring phoneme categories through a bottom-up approach (signal processing + unsupervised clustering) isn't sufficient; a lexical minimal pair (e.g., an n-gram) seems to be required to ensure learning.
  • Trying to learn the best kernel while restricting the optimization to a convex problem seems to be a dead end. It might be time to change paradigms or move to the non-convex dark side.
  • Boosting is too sensitive to noise, but a robust framework was presented by Yoav Freund.
  • Deep architectures seem to be the next big thing. Regularization, auto-encoders and RBMs can be used to pre-train networks from unlabeled data. Temporal coherence (similarity of consecutive frames in video) can be used as an unsupervised regularization technique in the embedding space. Unsupervised training is a regularization technique that enforces better clustering. The more unlabeled examples are used, the better the generalization.
  • Training from IID samples isn't optimal; curriculum learning (i.e., increasing example complexity over time) seems to smooth the cost function, leading to faster training and better generalization.
  • GPUs are the way to go to make ML algorithms scalable.
  • Feature hashing is an efficient strategy for dimensionality reduction and can be used to train classifiers.
  • Sparse transformations simplify the optimization process (the same idea used in the kernel trick in SVMs). PCA does the opposite.
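The feature-hashing bullet above can be sketched in a few lines (a minimal illustration of the hashing trick, my own toy version rather than the paper's exact formulation): each feature name is hashed to a bucket in a fixed-size vector, and a second bit of the hash picks a sign, which keeps inner products unbiased in expectation.

```python
import hashlib

def hashed_features(tokens, n_buckets=16):
    """Map a bag of tokens into a fixed-size vector via the hashing trick.

    Each token is hashed to a bucket index; another bit of the hash
    chooses the sign. Vocabulary size no longer matters: any input
    maps into the same n_buckets dimensions.
    """
    vec = [0.0] * n_buckets
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        idx = h % n_buckets            # which bucket this feature lands in
        sign = 1.0 if (h >> 64) % 2 == 0 else -1.0  # sign bit for unbiasedness
        vec[idx] += sign
    return vec

# Two documents with different vocabularies map to the same dimension,
# so a single linear classifier can be trained on top of them.
v1 = hashed_features("the quick brown fox".split())
v2 = hashed_features("the lazy dog".split())
```

Collisions do occur, but with enough buckets their effect averages out, which is what makes this practical for large-scale multitask learning.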
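The curriculum-learning bullet can also be illustrated with a toy sketch (my own illustration under an assumed `difficulty` scoring function, not the exact protocol from the paper): train first on the easiest examples, then gradually widen the pool until all examples are included.

```python
def curriculum_batches(examples, difficulty, n_stages=3):
    """Yield training pools of gradually increasing difficulty.

    `difficulty` is an assumed scoring function (e.g. sentence length);
    stage k exposes roughly the easiest k/n_stages fraction of the data.
    """
    ranked = sorted(examples, key=difficulty)
    for stage in range(1, n_stages + 1):
        cutoff = max(1, round(len(ranked) * stage / n_stages))
        yield ranked[:cutoff]  # train on this pool before moving on

# Example: shorter "sentences" are treated as easier than longer ones.
data = ["a b", "a", "a b c d", "a b c"]
stages = list(curriculum_batches(data, difficulty=len))
```

The final stage always covers the whole dataset, so the curriculum only changes the order of exposure, not what is ultimately learned from.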
Research on neural networks stalled because their non-convex cost function made it impossible to bound the error; there was no longer a theoretical framework. SVMs came to the rescue with three major benefits: a convex cost function, better generalization (margin maximization) and less parameter tuning.
Unfortunately, SVMs don't scale well, the kernel is hard or impossible to choose well enough to reach an optimal solution, and they don't allow deep architectures. For the same capacity, a shallow architecture needs more neurons than a deep architecture, and large shallow architectures are much more prone to numerical issues.
Deep architectures came back with convolutional deep neural networks applied to object recognition, and then Hinton proposed a breakthrough: a generative approach to initialize the parameters.
Unsupervised learning leads neural networks to a much better initialization state, and its regularization provides better generalization. But even with better initialization, we still aren't able to explore the function space any better, which, in my view, leaves the question open: is this an optimization problem? The local minima we observe might be an illusion created by opposite gradients cancelling each other out, an optimization problem induced by the leaky assumption of uncorrelated features, which leads people to optimize all parameters at the same time. ICML was inspiring.

Wednesday, June 10, 2009

International Conference on Machine Learning (ICML2009)

The 26th International Conference on Machine Learning (ICML 2009) will take place in Montreal next week (June 14-18, 2009).

The 3 invited speakers are quite interesting. I look forward to hearing them:
  • Emmanuel Dupoux, from École Normale Supérieure, on: How do infants bootstrap into spoken language?
  • Yoav Freund, from University of California, on: Drifting games, boosting and online learning
  • Corinna Cortes, from Google, on: Can learning kernels help performance?

I have to decide which tutorials I will attend this Sunday:

  • T6 Machine Learning in IR: Recent Successes and New Opportunities [tutorial webpage], by Paul Bennett, Misha Bilenko, and Kevyn Collins-Thompson
  • T8 Large Social and Information Networks: Opportunities for ML [tutorial webpage], by Jure Leskovec
  • T9 Structured Prediction for Natural Language Processing [tutorial webpage], by Noah Smith

Here are some interesting papers:

  • Curriculum Learning [Full paper]
  • Deep Learning from Temporal Coherence in Video [Full paper]
  • Good Learners for Evil Teachers [Full paper]
  • Using Fast Weights to Improve Persistent Contrastive Divergence [Full paper]
  • Online Dictionary Learning for Sparse Coding [Full paper]
  • A Novel Lexicalized HMM-based Learning Framework for Web Opinion Mining [Full paper]
  • A Scalable Framework for Discovering Coherent Co-clusters in Noisy Data [Full paper]
  • Bayesian Clustering for Email Campaign Detection [Full paper]
  • Feature Hashing for Large Scale Multitask Learning [Full paper]
  • Grammatical Inference as a Principal Component Analysis Problem [Full paper]
  • Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations [Full paper]

I have to decide between these workshops on Thursday:

I look forward to meeting old colleagues, friends and new researchers. Next week will be awesome.
