In one of my projects, I have data for hotels, and each hotel can also be booked through other booking sites. For example:
Hotel A – Booking (ID = 4002), Expedia (ID = 123), Priceline (ID = 147)
Each of the three booking engines uses its own ID to refer to Hotel A, so I would need to check manually and map each ID to the right hotel. If I have 100,000 hotels, do I have to check manually 300,000 times (assuming 3 booking sites)?
They might provide an API, so I could cross-check the name, address, or latitude/longitude, but if those differ even slightly I might link an ID to the wrong hotel.
I’m sure there are better ways to do this. Many travel sites compare hotel prices across many booking sites, but how do they make sure they are checking the right hotel on each site?
Does anyone have any experience with this?
If I have 100,000 hotels, do I have to check manually 300,000 times (assuming 3 booking sites)?
No. That would not be the best way to do it.
They might provide an API, so I could cross-check the name, address, or latitude/longitude, but if those differ even slightly I might link an ID to the wrong hotel.
You have the right idea.
Here is what you need to do:
- “Clean” the incoming addresses with an address normalizer service. Here’s a Stack Overflow question to get you started, but you can find much more information through Google.
- Anticipate that the address normalizer service will not be a silver bullet. Emphasis: you may need to link/resolve a very small number of addresses on your own.
- When all addresses have been normalized/linked, you can create a linking table in your database so that you can translate addresses to codes for each booking site.
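The steps above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the record fields (`id`, `name`, `lat`, `lon`), the `hotel_link` table, the helper names, and the 150 m / 0.8 thresholds are all assumptions made for the example. It also uses a simple name-plus-coordinates comparison as a stand-in for a real address normalizer service.

```python
import math
import sqlite3
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and replace punctuation with spaces so trivial differences vanish."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def is_same_hotel(ours, theirs, max_m=150.0, min_name_ratio=0.8):
    """Accept a match only when both the location and the name agree.

    The thresholds are illustrative guesses and would need tuning on real data.
    """
    close = distance_m(ours["lat"], ours["lon"], theirs["lat"], theirs["lon"]) <= max_m
    ratio = SequenceMatcher(None, normalize(ours["name"]), normalize(theirs["name"])).ratio()
    return close and ratio >= min_name_ratio


def build_links(conn, hotels, site, listings):
    """Store unambiguous matches in a linking table; return hotel IDs left for manual review."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hotel_link ("
        "hotel_id INTEGER, site TEXT, site_hotel_id TEXT, "
        "PRIMARY KEY (hotel_id, site))"
    )
    unresolved = []
    for hotel in hotels:
        matches = [item for item in listings if is_same_hotel(hotel, item)]
        if len(matches) == 1:
            conn.execute(
                "INSERT OR REPLACE INTO hotel_link VALUES (?, ?, ?)",
                (hotel["id"], site, matches[0]["id"]),
            )
        else:
            # Zero or several candidates: leave it for a human to resolve.
            unresolved.append(hotel["id"])
    conn.commit()
    return unresolved
```

Only the hotels returned by `build_links` need a manual look, which should be a small fraction of the 100,000. Note that this sketch compares every hotel against every listing; for large data sets you would likely narrow the candidates first, for example by city or postal code.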