I'm looking for models that can help determine whether two company names refer to the same company. I have a dataset listing the companies that different investment funds hold, but every fund may use its own naming convention, such as "Apple Inc" or "Apple Common Stock". Is there a model that can tell whether such names refer to the same company? This might be beyond what I can do myself, but I know ChatGPT handles it well: I can pass it about 20 company names and it will tell me whether they are all the same company. My dataset has 500,000 rows, though, so I don't know of any way to pass all of that data to ChatGPT, and I have heard it fails on longer inputs.
I looked into Levenshtein distance and fuzzy matching, but they seem to fail when the string contains extra text like "common stock", even though to a human it is clear that "Apple Inc" and "Apple US Common Stock" refer to the same company. Any machine learning algorithms or models that could help me tackle this problem would be greatly appreciated. I'm happy to look into them myself; I just want a starting point, as fuzzy matching and Levenshtein distance have not been very helpful, unless I can find a way to remove common filler words such as "common stock", "Pty", "Ltd", etc. Thanks for any help :)
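To make the filler-word idea concrete, here is a minimal sketch of what I have in mind, using only the Python standard library. The `FILLER_WORDS` list and the function names are my own assumptions for illustration, not a tested solution, and `difflib.SequenceMatcher` is a stand-in for Levenshtein-style similarity rather than true Levenshtein distance.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical filler-word list (an assumption); it would need tuning for real data.
FILLER_WORDS = {
    "inc", "incorporated", "corp", "corporation", "co", "company",
    "ltd", "limited", "pty", "plc", "llc",
    "common", "stock", "shares", "ordinary", "us",
}

def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and remove filler words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    kept = [t for t in tokens if t not in FILLER_WORDS]
    # Fall back to all tokens if every token was a filler word.
    return " ".join(kept if kept else tokens)

def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def group_by_normalized(names):
    """Group raw names that share the same normalized form (one pass, O(n))."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize(name)].append(name)
    return dict(groups)

names = ["Apple Inc", "Apple US Common Stock", "Microsoft Corp"]
print(group_by_normalized(names))
print(similarity("Apple Inc", "Apple US Common Stock"))
```

Grouping on the exact normalized string is cheap enough for 500,000 rows, and fuzzy matching would then only need to run on the much smaller set of unique normalized names. The obvious weakness is that a word like "us" is not always filler, so the list can merge or mangle some names.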