I am working with data feeds from affiliate sites. The basic idea is to provide an interface where the user can paste a link to an XML datafeed (these are huge, by the way, around 60 MB) that would then be streamed, parsed into small chunks, and mined for the required data, which would then be stored in the database.
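For reference, here is a minimal sketch of what that streaming step could look like in PHP using XMLReader, so the whole ~60 MB file never has to sit in memory at once. The element name 'product' and the feed URL are placeholders; the actual record element depends on each affiliate's schema.

    <?php
    // Stream the feed and handle one record at a time instead of loading it all.
    // 'product' and the URL are assumptions -- adjust to the feed's real schema.
    $reader = new XMLReader();
    $reader->open('http://example.com/affiliate-datafeed.xml');

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'product') {
            // Pull just this one record out as a small SimpleXML chunk
            $record = new SimpleXMLElement($reader->readOuterXml());
            // ... map $record's children to database columns and insert ...
        }
    }

    $reader->close();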
The problem is that different affiliate sites use different schemas for their XML. It is a little hard to map the elements in an XML file to your database attributes when you don't actually know which element contains what.
My solution: use XPath to traverse the first parent element and its descendants, fetch the element names along with their data, and ask the user to map this data to the attributes in the database by selecting from a set of radio buttons representing those attributes. This is done just once for each new feed; once the system knows what's what, it will automatically upload the data from the XML to the database.
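Here is a rough sketch, under assumptions, of that one-time mapping step: given the first record (already extracted as a small chunk), walk its descendants with XPath and collect element name/sample value pairs that could be shown next to the radio buttons. The file name 'first-chunk.xml' and the XPath expressions are illustrative, not tied to any particular feed.

    <?php
    // Load a small chunk that contains at least the first record (placeholder file).
    $xml = simplexml_load_file('first-chunk.xml');

    // Grab the first child of the root element, i.e. the first record.
    $firstRecord = $xml->xpath('/*/*[1]');

    // Collect element name => sample value for every descendant of that record.
    $fields = array();
    foreach ($firstRecord[0]->xpath('.//*') as $element) {
        $fields[$element->getName()] = trim((string) $element);
    }

    // $fields might look like: array('prodName' => 'Blue Widget', 'price' => '9.99', ...)
    // Show these to the user once, store the chosen column mapping, and reuse it
    // for every later record in the same feed.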
Does this sound viable? Is there a better solution? I realize this leaves an uncomfortable opening for human error.
Thanks.
Does this sound viable?
It is completely viable. The XML data can be fetched with cURL through PHP and then parsed as XML.
For example, YouTube video data can be fetched as XML by:
1) cURL the XML data. YouTube follows the link format of:
http://gdata.youtube.com/feeds/videos/YOUTUBEID
2) Create the parser object with xml_parser_create().
3) Parse the fetched content into a structure with xml_parse_into_struct(). You may need to write some custom functions to traverse the resulting data and extract what you want; I usually write generic helpers such as "getXMLvalueByAttribute", "getXMLvalueByTag", and "getTagStringData" for this (see the sketch after this list).
4) Free the parser with xml_parser_free() once you are done storing the data.
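A hedged sketch of those four steps in PHP is below. The helper getXMLvalueByTag is only an illustration of the kind of custom traversal function mentioned in step 3; it is not a built-in PHP function.

    <?php
    // 1) cURL the XML data for a video
    $ch = curl_init('http://gdata.youtube.com/feeds/videos/YOUTUBEID');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    curl_close($ch);

    // 2) Create the parser object
    $parser = xml_parser_create();

    // 3) Parse the fetched content into a structure you can traverse
    xml_parse_into_struct($parser, $data, $values, $index);

    // 4) Free the parser once you are done with it
    xml_parser_free($parser);

    // Example custom helper: return the character data of the first element
    // with the given tag name (tags are uppercased by the parser by default).
    function getXMLvalueByTag(array $values, $tag)
    {
        foreach ($values as $value) {
            if ($value['tag'] === strtoupper($tag) && isset($value['value'])) {
                return $value['value'];
            }
        }
        return null;
    }

    echo getXMLvalueByTag($values, 'title');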
You did not specify exactly what kind of XML data, so I gave you the YouTube example above.
Is there a better solution?
I would definitely check whether the service/website has an API available. If one exists, it will normally save you time and energy.