I want to apply k-means clustering to a (sparse) adjacency graph. For this I need to assign the nodes to a position in an euclidean space. Trivially I can do this by having a space with as many dimensions as there are nodes where each component corresponds to another node.
Now, in order to make the clustering a bit more understandable, or maybe just different, I’m thinking about whether there are ways of “folding” the n-dimensional space of points into a more manageable space, perhaps 3-dim or maybe just m-dim where m < n. While doing this I want to maintain distance relations between nodes – so I want to minimize the deviation between the straight line distance between any two points before the fold compared to after the fold, or in other words, the sum of all deviations.
So I can’t seem to find any to-the-point info on this when looking. Perhaps I’m missing terminology. What good/efficient methods are there for this? Algorithms? It seems like a somewhat archetypical optimization/mean square minimization problem.
A simple base case:
P1 = (1,1,1)
P2 = (1,1,0)
p3 = (1,0,1)
newPoints = spacefold(2,[P1,P2,P3])
// example output:
// newPoints = [(1,1),
// (1,0),
// (0,1)]
For this example they happened to be conveniently placed in a plane – removing the x component gives an optimal solution.
2
Multidimensional scaling is probably what you’re looking for. It takes as input a distance matrix and assigns to each object a vector of the target dimension.