Showing posts with label dimensionality reduction. Show all posts

Wednesday, September 25, 2013

A simple way to identify outliers and focus on clusters

A simple way to identify outliers is to apply a 2D dimensionality reduction, plot the result, and click on an outlier to retrieve the original record.
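If you want to play with the idea outside mlboost, here is a minimal sketch using NumPy only. It projects the records to 2D, ranks the points by distance from the center of the projection, and uses the row index to get back to the original record. The names `project_2d` and `farthest_points`, the synthetic data, and the distance-from-median rule are all assumptions for illustration; they stand in for the click and are not how mlboost does it.

```python
import numpy as np


def project_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, sorted by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


def farthest_points(points_2d, k=1):
    """Return the indices of the k points farthest from the median point."""
    center = np.median(points_2d, axis=0)
    dist = np.linalg.norm(points_2d - center, axis=1)
    return np.argsort(dist)[::-1][:k]


# Synthetic data (assumption): 50 ordinary records plus one obvious outlier.
rng = np.random.default_rng(0)
records = rng.normal(size=(50, 5))
records = np.vstack([records, np.full((1, 5), 25.0)])  # outlier is row 50

points = project_2d(records)
outlier_idx = farthest_points(points, k=1)

# A row index in the 2D projection is also the row index of the original record.
print(outlier_idx[0], records[outlier_idx[0]])
```

The key point is that the projection keeps the row order, so whatever you select in 2D maps straight back to the full record.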

This feature is now integrated by default in mlboost.
Can you also zoom in on specific clusters? Yes: if your data is tagged, you can exclude classes (-e) or limit the plot to specific classes (-o).
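To make the exclude/only idea concrete, here is a small sketch of the filtering step on labeled data. The helper `select_classes` and the toy arrays are hypothetical; I am only illustrating what excluding or limiting classes means, not how mlboost implements its -e and -o options.

```python
import numpy as np


def select_classes(data, target, only=None, exclude=None):
    """Keep samples whose label is in `only` (if given) and not in `exclude`."""
    target = np.asarray(target)
    keep = np.ones(len(target), dtype=bool)
    if only is not None:
        keep &= np.isin(target, list(only))
    if exclude is not None:
        keep &= ~np.isin(target, list(exclude))
    return np.asarray(data)[keep], target[keep]


# Toy data (assumption): 6 samples, 2 features, 3 classes.
data = np.arange(12).reshape(6, 2)
target = np.array([0, 1, 2, 0, 1, 2])

# Roughly what excluding class 2 would mean.
d_excl, t_excl = select_classes(data, target, exclude=[2])

# Roughly what limiting the plot to class 1 would mean.
d_only, t_only = select_classes(data, target, only=[1])
```

Filtering before the dimension reduction (rather than after) changes the projection itself, which is usually what you want when focusing on a few clusters.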


Don't have too much fun, and don't forget that you can specify which dimension reduction transformation is used.

Tuesday, April 30, 2013

Simplifying clustering visualization with mlboost


Are you looking for a simple way to visualize your supervised or semi-supervised data clusters with different dimension reduction algorithms such as PCA, LDA, Isomap, LLE, MDS, random trees, spectral embedding, etc.?
Here is an example output on a dataset of 4 newsgroups.

If you follow the sklearn loading standard, mlboost lets you do it by changing 2 lines of code (lines 5 and 6 of the script below: the import of your loading function and its registration) or by modifying this example. (python yourvisu.py -m y)

import sys
from mlboost.clustering import visu

# add your data loading function that returns data_train and data_test
from X import LOAD_DATASET_Y
visu.add_loading_dataset_fct('y', LOAD_DATASET_Y)
visu.main(sys.argv[1:])
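In case the "sklearn loading standard" part is unclear, here is a rough sketch of what such a loading function could look like. It returns two objects with `data`, `target` and `target_names` attributes, like the Bunch objects returned by sklearn's dataset loaders. The function name `load_dataset_y`, the random data, and the 80/20 split are made up for illustration, and which attributes mlboost actually reads is an assumption on my part.

```python
from types import SimpleNamespace

import numpy as np


def load_dataset_y(seed=0):
    """Hypothetical loader returning (data_train, data_test)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(100, 4))     # 100 samples, 4 features (made up)
    y = rng.integers(0, 3, size=100)  # 3 classes (made up)
    names = ["class 0", "class 1", "class 2"]

    def make(rows):
        # Mimic the data/target/target_names attributes of an sklearn Bunch.
        return SimpleNamespace(data=X[rows], target=y[rows], target_names=names)

    # First 80 samples for training, last 20 for testing.
    return make(slice(0, 80)), make(slice(80, 100))


data_train, data_test = load_dataset_y()
```

You would then register a function like this one under a name of your choice, the same way LOAD_DATASET_Y is registered under 'y' in the script.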
Btw, if you click on a class in the legend, that class is removed from the plot (for example, the green class 2). For semi-supervised data, simply set the class of the unlabeled samples to "?" (dataset.target[i]).
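Here is a tiny sketch of the "?" trick. One thing to watch: if your labels are stored in an integer NumPy array, assigning "?" raises a ValueError, so the array has to be converted first. The `dataset` object and the choice of unlabeled indices are stand-ins I made up, and I am assuming an object array of mixed labels is acceptable downstream.

```python
from types import SimpleNamespace

import numpy as np

# Stand-in dataset (assumption): integer class labels for six samples.
dataset = SimpleNamespace(target=np.array([0, 1, 2, 0, 1, 2]))

# Convert to an object array so a string label can sit next to the integers.
dataset.target = dataset.target.astype(object)

unlabeled = [1, 4]  # indices of samples without a trusted label (made up)
for i in unlabeled:
    dataset.target[i] = "?"

n_unlabeled = int((dataset.target == "?").sum())
```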
 

Without scikit-learn and matplotlib, it wouldn't be that easy to experiment with visualization.

Sunday, October 10, 2010

Dimensionality reduction; a simple PCA example using python




Dimensionality reduction is a powerful approach to reduce input size, reduce training time, and visualize data.
For example, you can use PCA (Principal Component Analysis), ICA (Independent Component Analysis), or LLE (Locally Linear Embedding) to see class grouping. You can try it on your data easily with Python in a couple of lines.
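import mdp
import pylab
# ds is assumed to be an already loaded dataset; ds.data holds one sample per row
pca = mdp.pca(ds.data)
pylab.title("PCA")
# plot the first principal component against the second
pylab.plot(pca[:,0], pca[:,1], '.')
pylab.show()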
The figure presents the PCA dimensionality reduction applied to a digit dataset. You can find the source code here to see how to do a PCA, ICA, or LLE using Python. Unfortunately, ICA doesn't work on our dataset because it doesn't converge.
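On the "reduce input size" side, a common question is how many components to keep. Here is a minimal NumPy sketch that computes the fraction of variance carried by each principal component and picks the smallest number of components reaching a threshold. The function names, the 95% threshold, and the synthetic data are assumptions for illustration; this is not the mdp code used above.

```python
import numpy as np


def explained_variance_ratio(X):
    """Fraction of the total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)                  # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2
    return var / var.sum()


def components_needed(X, threshold=0.95):
    """Smallest number of components whose cumulative variance reaches threshold."""
    cumulative = np.cumsum(explained_variance_ratio(X))
    return int(np.searchsorted(cumulative, threshold) + 1)


# Synthetic data (assumption): 10 features driven by only 2 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = factors @ mixing + 0.01 * rng.normal(size=(200, 10))

k = components_needed(X, threshold=0.95)
print(k, "components keep at least 95% of the variance")
```

With data like this, 10 inputs collapse to at most 2 without losing much, which is where the savings in input size and training time come from.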