I am building a price comparison web app that gets its data by web scraping. I scraped two grocery store sites and have the raw data: item category, item name and item price. What would be the best way to handle categorization (for example, one site's category for fruits is "Fruits" while the other uses "Fruits and Veggies") and product matching? Ideally I would like to do this in PHP and its libraries, but I'm open to anything at this point. My idea is to display a card for every product, with the prices found in the two stores listed below it.
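To make the categorization problem concrete, here is a minimal sketch of the kind of mapping I have in mind, written in Python only for illustration. The store keys (`store_a`, `store_b`), the shared label and the helper name `canonical_category` are all made up for this example and are not part of my actual data.

```python
# Hypothetical lookup table: (store, raw category label) -> shared category.
# The store keys and the shared label are assumptions for illustration only.
CATEGORY_MAP = {
    ("store_a", "Fruits"): "Fruits & Vegetables",
    ("store_b", "Fruits and Veggies"): "Fruits & Vegetables",
}


def canonical_category(store, raw_category):
    """Return the shared category, or None if the label is not mapped yet."""
    return CATEGORY_MAP.get((store, raw_category.strip()))
```

Labels that come back as `None` would need to be reviewed and mapped by hand, which is the part I am unsure scales well.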
I looked into fuzzy matching and similar_text() in PHP, but I have some doubts about it: "Coca Cola 2L" and "Coca Cola 0.5L" score 85% similar, yet they are two different products.
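One direction I am considering is to pull the package size out of the name and compare it exactly, and only fuzzy-match the rest of the name. The following is a rough sketch under that assumption, in Python with `difflib.SequenceMatcher` standing in for PHP's similar_text(). The unit list, the 0.85 threshold and the helper names `split_name` and `same_product` are my own placeholders, not a tested solution.

```python
import re
from difflib import SequenceMatcher

# Assumed unit list; real product names will need more units and spellings.
SIZE_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(ml|cl|l|kg|g)\b", re.IGNORECASE)
# Convert every unit to a base unit so that "0.5L" and "500 ml" compare equal.
TO_BASE = {"ml": ("ml", 1), "cl": ("ml", 10), "l": ("ml", 1000),
           "g": ("g", 1), "kg": ("g", 1000)}


def split_name(name):
    """Split a product name into (lowercased base name, normalized size or None)."""
    match = SIZE_RE.search(name)
    if match is None:
        return " ".join(name.split()).lower(), None
    amount = float(match.group(1).replace(",", "."))
    unit, factor = TO_BASE[match.group(2).lower()]
    # Drop the size token and collapse the remaining whitespace.
    base = name[:match.start()] + " " + name[match.end():]
    return " ".join(base.split()).lower(), (round(amount * factor), unit)


def same_product(a, b, threshold=0.85):
    """Treat two names as one product only if sizes are equal and names are close."""
    base_a, size_a = split_name(a)
    base_b, size_b = split_name(b)
    if size_a != size_b:
        return False  # different package sizes are different products
    return SequenceMatcher(None, base_a, base_b).ratio() >= threshold
```

With this, `same_product("Coca Cola 2L", "Coca Cola 0.5L")` is rejected because the sizes differ, even though the raw similarity of the two full strings is still about 0.85. I am not sure how well this holds up for multipacks or names without a size, so other approaches are welcome.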