Wednesday, October 13, 2010

simple multivariate classifier example using python & numpy

I was wondering how long it would take to write a multivariate classifier in python.
With python and numpy, not long. We only need to compute the covariance matrix, its determinant and its inverse. Even if the covariance matrix is singular, which means it cannot be inverted, you can easily compute the Moore-Penrose pseudo-inverse instead (i.e. numpy.linalg.pinv).
As expected, assuming too much about the data leads to poor classification.
You can find a simple python program of 75 lines here.
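To give an idea of what such a classifier looks like, here is a minimal sketch, not the 75-line program itself. It assumes one Gaussian per class and picks the class with the highest log-posterior. The function names fit_gaussians and predict are made up for this illustration, and dropping the log-determinant term when the covariance is singular is a simplification of mine.

```python
import numpy as np

def fit_gaussians(X, y):
    """Estimate mean, (pseudo-)inverse covariance, log-determinant and prior per class."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        mean = Xc.mean(axis=0)
        cov = np.atleast_2d(np.cov(Xc, rowvar=False))
        # pinv equals the ordinary inverse when cov is invertible,
        # and still returns something usable when cov is singular.
        cov_inv = np.linalg.pinv(cov)
        sign, logdet = np.linalg.slogdet(cov)
        if sign <= 0:
            # Assumption: determinant is not positive (singular covariance),
            # so the log-determinant term is simply dropped.
            logdet = 0.0
        prior = len(Xc) / float(len(X))
        params[label] = (mean, cov_inv, logdet, prior)
    return params

def predict(params, X):
    """Return, for each row of X, the class with the highest Gaussian log-posterior."""
    labels = list(params)
    scores = []
    for label in labels:
        mean, cov_inv, logdet, prior = params[label]
        d = X - mean
        # Squared Mahalanobis distance of every row to the class mean.
        maha = np.einsum('ij,jk,ik->i', d, cov_inv, d)
        scores.append(np.log(prior) - 0.5 * (logdet + maha))
    return np.array(labels)[np.argmax(scores, axis=0)]
```

Typical use would be params = fit_gaussians(X_train, y_train) followed by predict(params, X_test), where X_train is a 2-D array with one sample per row and y_train holds the class labels.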

Sunday, October 10, 2010

Dimensionality reduction; a simple PCA example using python




Dimensionality reduction is a powerful approach to reduce the size of the inputs, shorten training time and visualize data.
As an example, you can use PCA (Principal Component Analysis), ICA (Independent Component Analysis) or LLE (Locally Linear Embedding) to see class grouping. You can try it on your data easily with python in a couple of lines.
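import mdp
import pylab
# ds.data is assumed to be a 2-D array with one sample per row (ds is loaded elsewhere)
pca = mdp.pca(ds.data)
pylab.title("PCA")
# plot the first two principal components against each other
pylab.plot(pca[:,0], pca[:,1], '.')
pylab.show()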
The figure shows PCA dimensionality reduction applied to a digit dataset. You can find the source code here to see how to do a PCA, ICA or LLE using python. Unfortunately, ICA doesn't work on our dataset because it doesn't converge.
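If you are curious about what the PCA projection does under the hood, here is a rough numpy-only sketch. It is not how mdp implements it; it just centers the data and projects it on the leading right singular vectors. The function name pca_project and the n_components parameter are made up for this illustration.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project the rows of data on their first n_components principal axes."""
    # Center each feature so the axes describe variance around the mean.
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing variance.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return np.dot(centered, vt[:n_components].T)
```

The result can be plotted the same way as the mdp output, with column 0 against column 1.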

Saturday, October 9, 2010

PDF watermarking service using pdfrw on google appengine

If you are looking to watermark a pdf, you can use this simple appengine service:
This service uses pdfrw (a PDF file manipulation library written by Paul Gauvin) and reportlab. pdfrw is much faster than pypdf for watermarking.