I need to calculate the average cosine similarity for a large number of matrix pairs (approximately 80,000 pairs). Currently, each pair takes about 20 seconds to process, which is too slow for my needs. I would greatly appreciate any advice or solutions to speed up this calculation.
Here is an example of the code I am using now:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
matrix_a = [[...]]  # matrix_a.shape: (5310000, 1602200)
matrix_b = [[matrix_1],[matrix_2],...,[matrix_n]]  # similar shape to matrix_a
similarity = np.mean(cosine_similarity(matrix_b, matrix_a), axis=1)
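For reference, the quantity above can be rewritten so that the full pairwise similarity matrix is never built. The mean over j of cos(b_i, a_j) equals the dot product of the unit-normalized b_i with the mean of the unit-normalized rows of matrix_a. The sketch below is only an illustration of that identity: the function name `mean_cosine_similarity` and the `eps` guard are hypothetical, and it assumes dense NumPy arrays that fit in memory, which may not hold at the shapes quoted above (sparse or chunked input would need a different normalization step).

```python
import numpy as np

def mean_cosine_similarity(matrix_b, matrix_a, eps=1e-12):
    """Mean cosine similarity of each row of matrix_b against all rows of matrix_a.

    Hypothetical helper: gives the same values as
    np.mean(cosine_similarity(matrix_b, matrix_a), axis=1),
    but without materializing the (n_b, n_a) similarity matrix.
    """
    a = np.asarray(matrix_a, dtype=np.float64)
    b = np.asarray(matrix_b, dtype=np.float64)

    # Scale every row to unit length; all-zero rows stay zero.
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)

    # Collapse matrix_a into one vector: the mean of its unit rows.
    a_mean = a_unit.mean(axis=0)

    # One matrix-vector product replaces the full pairwise product.
    return b_unit @ a_mean


# Small usage example with random data (shapes chosen only for illustration).
rng = np.random.default_rng(0)
small_a = rng.normal(size=(50, 8))
small_b = rng.normal(size=(7, 8))
similarity = mean_cosine_similarity(small_b, small_a)
```

With this rewrite the cost per pair is roughly one pass over each matrix instead of a product between them, and the averaged vector for matrix_a can be reused across every pair that shares it.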