I am trying to build a word finder: when I input a word, it should search my list of reference words and return the most closely associated one. The idea is that any word can be input, and any input that closely relates to an item in the list should produce an output.
My first thought was to embed the words with a sentence transformer and compare the embeddings using cosine similarity.
Here is my current code:
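    import numpy as np
    from numpy.linalg import norm
    from sentence_transformers import SentenceTransformer

    reference_list = ["Fantasy", "Horror", "Romance", "Action"]
    st = SentenceTransformer(model)  # model: name of the pretrained model (not shown here)
    # Encode every reference word once, up front
    reference_dict = {word: st.encode(word) for word in reference_list}

    def find_reference_word(myword, reference_dict):
        myword_vector = st.encode(myword)
        check_dict = {}
        for key, vector in reference_dict.items():
            cosine = np.dot(myword_vector, vector) / (norm(myword_vector) * norm(vector))
            # Keep only reference words above the similarity threshold
            if cosine > 0.6:
                check_dict[key] = cosine
        if check_dict:
            closest = max(check_dict, key=check_dict.get)
            return closest
        return None  # no reference word was similar enough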
While this works and does what I want, I can see that it would become very slow as my reference list grows, since every lookup loops over the whole list and computes one cosine similarity per reference word.
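One option for the lookup cost is to stack the reference embeddings into a single matrix, normalize its rows once, and replace the Python loop with one matrix-vector product. The following is a minimal sketch rather than a tuned solution. The names `build_index` and `find_closest` are hypothetical, and the toy 3-dimensional vectors stand in for real embeddings (in practice the matrix would presumably come from something like `st.encode(reference_list)`). It assumes all reference embeddings fit in memory.

```python
import numpy as np

def build_index(reference_vectors):
    """Stack embeddings into a matrix and L2-normalize each row once."""
    matrix = np.asarray(reference_vectors, dtype=np.float32)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / norms

def find_closest(query_vector, labels, index, threshold=0.6):
    """Return the label with the highest cosine similarity, or None if it is not above threshold."""
    query = np.asarray(query_vector, dtype=np.float32)
    query = query / np.linalg.norm(query)
    # With unit-length rows and a unit-length query, the dot product is the cosine
    scores = index @ query
    best = int(np.argmax(scores))
    return labels[best] if scores[best] > threshold else None

# Toy vectors used only for illustration (assumption: real ones come from the encoder)
labels = ["Fantasy", "Horror", "Romance", "Action"]
toy_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
index = build_index(toy_vectors)

print(find_closest([0.9, 0.1, 0.0], labels, index))     # Fantasy
print(find_closest([-1.0, -1.0, -1.0], labels, index))  # None
```

This keeps the same 0.6 threshold and the same "best match or nothing" behavior as the loop version, but the per-query work becomes a single NumPy operation. For very large lists, an approximate nearest-neighbor index would be the next thing to look at.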
I considered building a prediction model with one of the scikit-learn modules, but after going through the different methods, it seemed like they might not be the best fit for my problem. I also thought about using an LLM or looking into building a specialized version.