I decided to write an incremental version on top of sklearn.
The idea is simple:
- Start with an initial number of clusters, K = x
- Identify the worst cluster based on an unsupervised measure (e.g. silhouette)
- Split the worst cluster into 2 clusters
- Measure the global improvement with the new clusters
- If there is an improvement, continue adding clusters
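
The steps above can be sketched roughly as follows. This is only a minimal illustration, not my actual implementation: the function name `incremental_kmeans` and the parameters `k_start` and `k_max` are made up for the example, the "worst" cluster is assumed to be the one with the lowest mean silhouette, and the loop is assumed to stop at the first split that does not improve the global silhouette.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples


def incremental_kmeans(X, k_start=2, k_max=10, random_state=0):
    """Grow K by repeatedly splitting the worst cluster (hypothetical helper)."""
    # Start at K = k_start (must be >= 2 for the silhouette to be defined).
    km = KMeans(n_clusters=k_start, n_init=10, random_state=random_state)
    labels = km.fit_predict(X)
    sil = silhouette_samples(X, labels)
    best_score = sil.mean()

    while len(np.unique(labels)) < k_max:
        ids = np.unique(labels)
        # Assumption: worst cluster = lowest mean silhouette.
        per_cluster = [sil[labels == c].mean() for c in ids]
        worst = ids[int(np.argmin(per_cluster))]
        idx = np.where(labels == worst)[0]
        if len(idx) < 2:
            break  # a single point cannot be split

        # Split the worst cluster into 2 with a local 2-means.
        sub = KMeans(n_clusters=2, n_init=10, random_state=random_state)
        sub_labels = sub.fit_predict(X[idx])
        candidate = labels.copy()
        candidate[idx[sub_labels == 1]] = labels.max() + 1

        # Measure the global score with the new clusters.
        cand_sil = silhouette_samples(X, candidate)
        cand_score = cand_sil.mean()
        if cand_score <= best_score:
            break  # assumption: stop at the first split that does not help
        labels, sil, best_score = candidate, cand_sil, cand_score

    return labels, best_score
```

Calling `labels, score = incremental_kmeans(X, k_start=2, k_max=8)` on a 2D array `X` returns one label per row and the mean silhouette of the final clustering. Any other unsupervised measure could replace the silhouette in both places where it is used.
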
Special thanks to the scikit-learn library for letting me prototype this version so quickly.