Tags: python, scikit-learn, cluster-analysis, hierarchical-clustering

How can I use scikit-learn's AgglomerativeClustering with a target number of points per cluster?

I use AgglomerativeClustering from the scikit-learn Python library to cluster points automatically and place a new, larger point at the center of each cluster. My data is a set of several thousand points with X and Y coordinates, stored in a DataFrame. When setting the parameters, I can only use n_clusters to fix the resulting number of clusters, or distance_threshold to set the linkage distance at or above which clusters are no longer merged.

What I would like to set instead is the target number of points per cluster, e.g. 200, so that each resulting cluster has about 200 points. Some tolerance would be fine, i.e. clusters could have anywhere from 170 to 230 points. Is there a parameter that would help me? Or should I write a function that merges clusters that are too small into others and splits the ones that are too large (or places two centers in them)? Or should I use a different clustering algorithm?
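To make the setup concrete, here is a minimal sketch of the current approach: it derives n_clusters from the desired size, fits AgglomerativeClustering, and reports which clusters fall outside the 170 to 230 range. The function name `cluster_and_check`, the column names `X` and `Y`, the `ward` linkage, and the random sample data are assumptions for illustration only, not part of my real code. Note that it only measures the cluster sizes; it does not enforce them.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering


def cluster_and_check(df, target_size=200, tolerance=30, linkage="ward"):
    """Cluster the points and flag clusters outside target_size +/- tolerance.

    Hypothetical helper: assumes `df` has numeric columns "X" and "Y".
    """
    # Choose the number of clusters so the *average* size is near the target.
    n_clusters = max(1, round(len(df) / target_size))
    model = AgglomerativeClustering(n_clusters=n_clusters, linkage=linkage)
    labels = model.fit_predict(df[["X", "Y"]].to_numpy())

    # Count the points per cluster and compute each cluster's center (mean).
    sizes = np.bincount(labels, minlength=n_clusters)
    centers = df[["X", "Y"]].groupby(labels).mean()

    summary = centers.assign(
        size=sizes,
        in_range=np.abs(sizes - target_size) <= tolerance,
    )
    return labels, summary


# Synthetic stand-in for the real data (assumption: 2000 random points).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(0, 100, size=(2000, 2)), columns=["X", "Y"])

labels, summary = cluster_and_check(df)
print(summary)
```

The `in_range` column shows the problem: the average size matches the target, but nothing keeps individual clusters within the allowed range.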