I have a high-dimensional dataset that I want to cluster into groups. The data can be confidently separated into 5 distinct groups.
I want to use the result of this clustering to create charts showing the differences between the resulting groups. As part of this task, I want to label each cluster so that instead of showing `cluster 0` I show `Industrial cluster`.
However, since KMeans is a randomized algorithm, `cluster 0` could stop being the `Industrial cluster`. I've already set a random seed in my code to prevent this, but I expect my dataset to grow in the future.
I expect that adding new data could have the same effect as changing the random seed. I want to know how to label my clusters consistently, without worrying that the numerical labels get rearranged because of random changes in the data. So far, my only idea is to use a known data point from each expected cluster to assign the cluster label automatically, but I want a more robust solution.
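To make the anchor-point idea concrete, here is a minimal sketch of what I have in mind. Everything in it is illustrative: the helper name `name_clusters`, the second group name "Residential", and the 2-D coordinates are made up, and the centroids are hard-coded rather than fitted (with scikit-learn they would presumably come from `KMeans.cluster_centers_`).

```python
import math


def name_clusters(centroids, anchors):
    """Map each numeric cluster id to a name via the nearest anchor point.

    centroids: list of coordinate tuples; the index is the cluster id of a fit.
    anchors:   dict of name -> one hand-picked point known to belong to that group.
    (Hypothetical helper, not part of any library.)
    """
    mapping = {}
    for name, point in anchors.items():
        # Find the cluster whose centroid is closest to this anchor point.
        cluster_id = min(
            range(len(centroids)),
            key=lambda i: math.dist(centroids[i], point),
        )
        if cluster_id in mapping:
            # Two anchors landed in the same cluster: the naming is ambiguous.
            raise ValueError(
                f"anchors {mapping[cluster_id]!r} and {name!r} "
                f"both fall in cluster {cluster_id}"
            )
        mapping[cluster_id] = name
    return mapping


# Made-up example: two runs return the same centroids in a different order.
anchors = {"Industrial": (9.5, 0.5), "Residential": (0.2, 8.8)}
run_a = [(10.0, 1.0), (0.0, 9.0)]
run_b = [(0.0, 9.0), (10.0, 1.0)]

labels_a = name_clusters(run_a, anchors)  # {0: 'Industrial', 1: 'Residential'}
labels_b = name_clusters(run_b, anchors)  # {1: 'Industrial', 0: 'Residential'}
```

The weakness is the one I'm worried about: if an anchor ends up near a cluster boundary as the data grows, two anchors can land in the same cluster, and this sketch only detects that case (by raising) rather than resolving it.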