Finding the optimal K in k-means: an incremental k-means in Python

Saturday, April 6, 2013

I was looking for a good implementation of an incremental k-means that doesn't require choosing K up front. There are interesting papers (x-means, g-means, etc.), but I couldn't find any Python implementation.

I decided to write an incremental version on top of sklearn.
The idea is simple:

1. Start at K=x.
2. Identify the worst cluster based on an unsupervised measure (e.g. silhouette).
3. Split the worst cluster into 2 clusters.
4. Measure the global improvement with the new clusters.
5. If the measure improves, keep the split and continue adding clusters; otherwise stop.
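
The steps above can be sketched roughly as follows. This is not the mlboost code, just a minimal illustration written against scikit-learn's `KMeans`, `silhouette_samples` and `silhouette_score`. The function name `incremental_kmeans` and the parameters `k_start` and `k_max` are made up for this example, and "worst cluster" is assumed to mean the cluster with the lowest mean silhouette.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score


def incremental_kmeans(X, k_start=2, k_max=10, random_state=0):
    """Grow K by splitting the worst cluster while the silhouette improves.

    Hypothetical helper for illustration; returns (labels, k, score).
    """
    X = np.asarray(X, dtype=float)
    # silhouette_score needs at most n_samples - 1 distinct labels
    k_max = min(k_max, len(X) - 1)

    # Step 1: start at K = k_start
    km = KMeans(n_clusters=k_start, n_init=10, random_state=random_state)
    labels = km.fit_predict(X)
    best_score = silhouette_score(X, labels)
    k = k_start

    while k < k_max:
        # Step 2: worst cluster = lowest mean per-sample silhouette (assumption)
        per_sample = silhouette_samples(X, labels)
        cluster_ids = np.unique(labels)
        means = [per_sample[labels == c].mean() for c in cluster_ids]
        worst = cluster_ids[int(np.argmin(means))]
        members = np.flatnonzero(labels == worst)
        if members.size < 2:
            break  # nothing left to split

        # Step 3: split the worst cluster into 2 with a local 2-means
        split = KMeans(n_clusters=2, n_init=10, random_state=random_state)
        halves = split.fit_predict(X[members])
        candidate = labels.copy()
        candidate[members[halves == 1]] = k  # k is the next unused label

        # Step 4: measure the global score with the new clusters
        score = silhouette_score(X, candidate)

        # Step 5: keep the split only if the global score improved
        if score <= best_score:
            break
        labels, best_score, k = candidate, score, k + 1

    return labels, k, best_score
```

Calling `labels, k, score = incremental_kmeans(X, k_start=2)` returns the final assignment, the number of clusters reached and the last silhouette score. Because each step only splits one cluster and never re-runs k-means globally, the result is greedy and can differ from what a fresh `KMeans(n_clusters=k)` fit would give.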

You can find the source code in mlboost/clustering/ikmeans.py.
Special thanks to the scikit-learn library for letting me prototype this version so quickly.

Fraka6 Blog - No Free Lunch
