I want to reduce the running time of this computation. Is there any way to speed it up?
Cosine Similarity
-
$$ \text{cosine\_similarity}(u, w) = \frac{u \cdot w}{\|u\| \, \|w\|} $$
$$ p_t(w \mid u) = \frac{\cos(u, w)}{\sum_{w' \in V} \cos(u, w')} $$
Here is the Python code I am using:
import numpy as np

def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    return dot_product / (norm_vec1 * norm_vec2)

# Compute translation probability using cosine similarity
def compute_translation_probability(target_word, candidate_word, word_embeddings):
    if target_word in word_embeddings and candidate_word in word_embeddings:
        target_vec = word_embeddings[target_word]
        candidate_vec = word_embeddings[candidate_word]
        # Calculate cosine similarity between target word and candidate word
        cosine_sim = cosine_similarity(target_vec, candidate_vec)
        # Calculate sum of cosine similarities between the target word and every word in the vocabulary
        sum_cosine_similarities = sum(
            cosine_similarity(target_vec, word_embeddings[word])
            for word in word_embeddings.index_to_key
        )
        # Return the normalized cosine similarity
        if sum_cosine_similarities != 0:
            return cosine_sim / sum_cosine_similarities
    return 0.0
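For reference, one common way to cut the running time of a loop like the denominator above is to replace the per-word Python loop with a single NumPy matrix-vector product. The sketch below is illustrative only: it assumes word_embeddings is a gensim KeyedVectors object (which the .index_to_key attribute in the code suggests), and the name compute_translation_probability_vectorized is just a placeholder.

import numpy as np

def compute_translation_probability_vectorized(target_word, candidate_word, word_embeddings):
    # Assumption: word_embeddings is a gensim KeyedVectors object, so it exposes
    # .index_to_key and a (vocab_size, dim) .vectors array; adapt for a plain dict.
    if target_word not in word_embeddings or candidate_word not in word_embeddings:
        return 0.0
    target_vec = word_embeddings[target_word]
    candidate_vec = word_embeddings[candidate_word]

    # All embedding vectors as one (vocab_size, dim) matrix.
    all_vecs = word_embeddings.vectors

    # Cosine similarity of the target against every word at once:
    # dot products via one matrix-vector product, norms via np.linalg.norm.
    dots = all_vecs @ target_vec
    norms = np.linalg.norm(all_vecs, axis=1) * np.linalg.norm(target_vec)
    all_sims = dots / norms

    # Numerator: cosine similarity between target and candidate.
    cosine_sim = np.dot(target_vec, candidate_vec) / (
        np.linalg.norm(target_vec) * np.linalg.norm(candidate_vec)
    )

    sum_cosine_similarities = all_sims.sum()
    if sum_cosine_similarities != 0:
        return cosine_sim / sum_cosine_similarities
    return 0.0

If many calls share the same target_word, the denominator could also be cached, since it does not depend on candidate_word.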