Tuesday, September 21, 2010

Summary of machine learning libs available in Python

Here is a summary of the machine learning libraries I know of that are written in Python or have Python bindings (inspired by the "Similar or Related Projects" page of PyMVPA, the lisa mailing list and personal notes).
  • pybrain: PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. In fact, we came up with the name first and later reverse-engineered this quite descriptive "Backronym". See the project's feature list. Key feature: recurrent networks (RNNs), including Long Short-Term Memory (LSTM) architectures.
  • mlpy: Machine Learning PYthon (mlpy) is a high-performance Python library for predictive modeling. mlpy makes extensive use of NumPy to provide fast N-dimensional array manipulation and easy integration of C code. The GNU Scientific Library (GSL) is also required. It provides high-level procedures that support, with few lines of code, the design of rich Data Analysis Protocols (DAPs) for preprocessing, clustering, predictive classification, regression and feature selection. Methods are available for feature weighting and ranking, data resampling, error evaluation and experiment landscaping. Key feature: feature selection.
  • scikits.learn: scikits.learn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib). Key distinct features: lasso, nearest neighbor, isomap, various metrics, mean shift, cross validation, LDA, HMMs.
  • opencv (machine learning): Normal Bayes Classifier, K Nearest Neighbors, SVM, Decision Trees, Boosting, Random Trees, Expectation-Maximization, Neural Networks
  • Shogun: A Large Scale Machine Learning Toolbox. Comprehensive machine learning toolbox with bindings to various programming languages. PyMVPA can optionally use implementations of Support Vector Machines from Shogun. Its focus is large-scale kernel learning (mostly SVMs). It wraps other libraries, such as the well-established libsvm, as well as others that reach state-of-the-art performance or are suited to extremely large datasets.
  • PyMVPA (Multivariate Pattern Analysis in Python): PyMVPA is a Python module intended to ease pattern classification analyses of large datasets. In neuroimaging contexts, such analysis techniques are also known as decoding or MVPA analysis.
  • pylearn: built on top of Theano; version 2 is under construction. It is the new version of PLearn (C++).
  • Theano: (deep learning) Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
  • jml: Jeremy's Machine Learning library (C++), which includes a Python interface. Basic classifiers (perceptrons, decision trees, etc.) plus ensemble methods (boosting, bagging). Very highly optimized to work with thousands of features and millions of examples. GPGPU support is under development. Code derived from this library is extensively used in a commercial computational linguistics application, so it has been put through its paces.
  • 3dsvm: AFNI plugin to apply support vector machine classifiers to fMRI data.
  • Elefant: Efficient Learning, Large-scale Inference, and Optimization Toolkit. Multi-purpose open source library for machine learning.
  • MDP: Python data processing framework. MDP provides various algorithms. PyMVPA makes use of MDP’s PCA and ICA implementations. Interesting features: ICA, LLE.
  • MVPA Toolbox: Matlab-based toolbox to facilitate multi-voxel pattern analysis of fMRI neuroimaging data.
  • NiPy: Project with growing functionality to analyze brain imaging data. NiPy is heavily connected to SciPy and lots of functionality developed within NiPy becomes part of SciPy.
  • OpenMEEG: Software package for low-frequency bio-electromagnetism including the EEG/MEG forward and inverse problems. OpenMEEG includes Python bindings.
  • Orange: Powerful general-purpose data mining software. Orange also has Python bindings.
  • PyMGH/PyFSIO: Python IO library for FreeSurfer’s .mgh data format.
  • PyML: PyML is an interactive object oriented framework for machine learning written in Python. PyML focuses on SVMs and other kernel methods.
  • PyNIfTI: Read and write NIfTI images from within Python. PyMVPA uses PyNIfTI to access MRI datasets.
  • milk: SVMs with arbitrary Python types for kernel arguments. Pythonic interface to libSVM. Stepwise Discriminant Analysis for feature selection. K-means clustering. Models can be pickled and unpickled.
  • mlboost: Machine Learning Boost Library (Python; includes a flayers wrapper); a minimal version of the SourceForge mlboost project. Specializes in feature extraction and visualization.
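
Several entries above (milk in particular) mention that trained models can be pickled and unpickled. As a rough illustration of what that buys you, here is a minimal sketch using only the standard library pickle module. The nearest-centroid "model" and the function names fit_centroids and predict are made up for this example; they are not the API of milk or of any other library in the list.

```python
import pickle


def fit_centroids(rows, labels):
    """Toy 'training': compute the mean feature vector of each class.

    Hypothetical helper for illustration only, not a library function.
    """
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    # The learned model is a plain dict: label -> centroid.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


model = fit_centroids(
    [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]],
    ["a", "a", "b", "b"],
)

# Serialize the trained model to bytes, then load it back.
# In practice the bytes would be written to a file and read in another session.
blob = pickle.dumps(model)
restored = pickle.loads(blob)
```

The restored object behaves like the original, so training and prediction can live in separate scripts. Only unpickle data you trust, since pickle can execute arbitrary code while loading.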
to watch: