Given a set of N points, I am required to split it into k subsets S1, …, Sk. Each subset Si will have a representative Ri. I want to find these R1, …, Rk to minimize an arbitrary cost function of the subset-members with respect to the cluster representative. Basically,
min sum_{i=1}^k sum_{Pj in Si} cost(Pj, Ri)
where the cluster representative Ri is itself obtained from the cluster members by another arbitrary reduce function
Ri = reduce(Si)
I took inspiration from k-means clustering, and came up with the below algorithm, that I’m calling k-reduce clustering. I want to know if there is any algorithm or family of algorithms that does what I’m trying to do.
# Initialize using k random samples from S, could use better initialization
cluster_repr = random_samples(S, k) # list of k points
clusters = None
while True:
# Step 1: Assignment
old_clusters = clusters
clusters = [[] for i in range(n)]
for Pj in S:
# assign s to the cluster that minimizes its cost wrt to the representative
cluster_idx = argmin(
cluster_repr,
lambda Ri : cost_fn(Pj, Ri)
)
clusters[cluster_idx].append(Pj)
# Step 2: Update step
cluster_repr = [reduce_fn(clusters[i]) for i in range(n)]
total_dist = None
if old_clusters == clusters: break