I was looking for a good implementation of an incremental k-means, one where I don't have to set the optimal K in advance. There are some interesting papers on this (X-means, G-means, etc.), but I couldn't find any Python implementation of them.
So I decided to write an incremental version on top of sklearn.
The idea is simple:
- Start with K = x clusters
- Identify the worst cluster based on an unsupervised measure (e.g., silhouette)
- Split the worst cluster into 2 clusters
- Measure the global improvement with the new clusters
- If you get an improvement, continue adding clusters; otherwise stop
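The steps above can be sketched roughly like this with scikit-learn. This is a minimal illustration under a few assumptions, not the original code from this post: the silhouette is used as the unsupervised measure, the "worst" cluster is the one with the lowest mean per-sample silhouette, the split is done with a local 2-means run, and a `k_max` cap is added as a safety limit. The function name `incremental_kmeans` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score


def incremental_kmeans(X, k_start=2, k_max=20, random_state=0):
    """Hypothetical sketch: split the worst cluster while the global
    silhouette score keeps improving. Returns (labels, score)."""
    # Initial clustering with K = k_start.
    labels = KMeans(n_clusters=k_start, n_init=10,
                    random_state=random_state).fit_predict(X)
    best_score = silhouette_score(X, labels)
    k = k_start

    while k < k_max:
        # Average the per-sample silhouette per cluster to find the worst one.
        sample_scores = silhouette_samples(X, labels)
        cluster_ids = np.unique(labels)
        mean_scores = [sample_scores[labels == c].mean() for c in cluster_ids]
        worst = cluster_ids[int(np.argmin(mean_scores))]

        mask = labels == worst
        if mask.sum() < 2:
            break  # a singleton cluster cannot be split

        # Split the worst cluster into two with a local 2-means run.
        sub_labels = KMeans(n_clusters=2, n_init=10,
                            random_state=random_state).fit_predict(X[mask])
        candidate = labels.copy()
        # One half keeps the old id, the other half gets the new id k.
        candidate[mask] = np.where(sub_labels == 0, worst, k)

        # Measure the global improvement with the new clusters.
        score = silhouette_score(X, candidate)
        if score <= best_score:
            break  # no improvement: stop adding clusters
        labels, best_score, k = candidate, score, k + 1

    return labels, best_score


# Usage on synthetic data with well-separated blobs.
X, _ = make_blobs(n_samples=600, centers=4, cluster_std=0.6, random_state=42)
labels, score = incremental_kmeans(X, k_start=2)
print("clusters found:", len(np.unique(labels)), "silhouette:", round(score, 3))
```

Because a split is only kept when the global silhouette goes up, the returned score is never lower than the score of the initial K = x clustering. Other criteria (for example the BIC test used by X-means) could be plugged into the same loop in place of the silhouette.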
Special thanks to the scikit-learn library for letting me prototype this version so quickly.