Tags: r, matrix, parallel-processing, hierarchical-clustering, euclidean-distance

How to speed up computing a distance matrix in R for hierarchical clustering on a large input matrix?

I have a large matrix (approximately 35,000 x 35,000) from which I am preparing a distance object in R for hierarchical clustering. The base R function dist() is too slow, so I am using the distances() function from the distances package (https://cran.r-project.org/web/packages/distances/distances.pdf). I have also parallelized the computation, but it still takes around 10 hours to run. Below is the code I am currently using. I pass the final distance_matrix to the hclustgeo() function from the ClustGeo package.