So, I currently have a list of vectors in my database, and I’m getting data from an API. I loop over each string the API provides, convert it to a vector, and find the most similar vector in my database. The issue is that the API sometimes provides a different name from the one stored in my database, even though both refer to the same thing.
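The lookup described above is a nearest-neighbour search: score the incoming vector against every stored vector and keep the best one. Here is a minimal sketch, assuming the vectors are plain lists of floats and cosine similarity is the metric. The names `cosine_similarity` and `most_similar` are hypothetical helpers, and the embedding step itself is left out.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], db: dict[str, list[float]]) -> tuple[str, float]:
    """Return the stored name whose vector is closest to `query`, plus its score.

    Each API string is matched independently, so two different API names
    can both end up pointing at the same database record.
    """
    scored = ((name, cosine_similarity(query, vec)) for name, vec in db.items())
    return max(scored, key=lambda pair: pair[1])
```

Because every query is matched on its own, nothing stops two API names from picking the same database entry, which is exactly the collision in the example that follows.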
For example, my database contains two colleges named "SUNY College of Technology at Alfred" and "Alfred University". From the API I’m receiving the college names "Alfred State College" and "Alfred University". Naturally, sentence similarity gives a perfect score for "Alfred University", but instead of "Alfred State College" being matched with "SUNY College of Technology at Alfred", it gets matched with "Alfred University". I understand why they aren’t being matched, yet they are the same college despite the two different names. What can I do to make the system more accurate?
I tried adding the college’s state to the vectors and then matching on both the name and the state, but both colleges are in the same state, so that was a dead end. I was also considering a function that holds off on a record when there are multiple matches and pushes it to an array. It would continue until it finds a match with a similarity of 1, then differentiate the two and give the less accurate match to the name with the lower similarity. Would this work, and what would this be called?
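The "hold off and resolve conflicts" idea resembles one-to-one matching, known as the linear assignment problem when solved optimally. Below is a minimal greedy sketch. It assumes similarity scores are already computed for every (API name, database name) pair in a batch, and that each database record should be matched at most once. The function name and the scores in the usage example are hypothetical.

```python
def assign_one_to_one(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Greedy one-to-one matching: each API name and each DB name is used at most once.

    `scores` maps (api_name, db_name) -> similarity. Pairs are visited from the
    highest score down, so an exact match (score 1.0) claims its DB record first
    and a competing API name falls back to its best remaining candidate.
    """
    matches: dict[str, str] = {}
    used_db: set[str] = set()
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for (api_name, db_name), _score in ordered:
        # Skip pairs where either side has already been claimed.
        if api_name in matches or db_name in used_db:
            continue
        matches[api_name] = db_name
        used_db.add(db_name)
    return matches


# Hypothetical scores for the Alfred example.
example_scores = {
    ("Alfred State College", "Alfred University"): 0.82,
    ("Alfred State College", "SUNY College of Technology at Alfred"): 0.61,
    ("Alfred University", "Alfred University"): 1.0,
    ("Alfred University", "SUNY College of Technology at Alfred"): 0.55,
}
print(assign_one_to_one(example_scores))
```

With these made-up scores, "Alfred University" claims its exact match first, so "Alfred State College" falls back to "SUNY College of Technology at Alfred". Greedy matching is not always globally optimal. The Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` with `maximize=True`, finds the assignment with the highest total similarity. Either way, this only helps when the competing names arrive in the same batch.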
What can I do?