I have a web scraper that collects news from different sources on WP7. My current approach is:
- Load newspaper information from an XML file.
- Go to the specified sections and fetch the URLs of the news items.
- Go to each URL and fetch the headline, image, and publisher.
- Display the results using the MVVM architecture on Windows Phone.
The whole thing takes place asynchronously: as soon as a URL from a section of a newspaper is fetched, it is added to the queue, and the second stage (fetching the headline, image, etc.) starts. As soon as this data is fetched for even one article, it is displayed. As more articles are fetched, they are added to the list.
For the fetching I am using SmartThreadPool (http://www.codeproject.com/Articles/7933/Smart-Thread-Pool) for Windows Phone.
My problem is that even fetching around 80 items (in total) from 9 publications takes more than a minute. How can I speed this up?
Note: I use a two-stage approach because the images are often not available alongside the headlines and are only found in the article itself.
There is one foolproof technique for optimizing your code: measure.
Without knowing where the bottleneck is, any advice you get will have a good chance of being useless. Measure. Time your code. Make sure you know exactly what is taking so long.
Your code is broken down into segments, right? Measure them separately, then measure as a whole. Make sure you know how long it takes to load the XML data, how long it takes to fetch the list of articles from a source, how much time to fetch an item’s content, how much time it takes to load its images.
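To make that concrete, here is a minimal sketch of per-stage timing. It is written in Python only for brevity; the same pattern works with whatever timer your platform provides. The stage functions (`load_sources`, `fetch_article_urls`, `fetch_article`) are hypothetical stand-ins that just sleep, not your actual scraper code, and the stage names are my own assumptions.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds and call counts per stage name.
totals = defaultdict(float)
counts = defaultdict(int)

@contextmanager
def timed(stage):
    """Record how long the enclosed block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[stage] += time.perf_counter() - start
        counts[stage] += 1

def report():
    """Return (stage, total seconds, calls) rows, slowest stage first."""
    rows = [(s, totals[s], counts[s]) for s in totals]
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Hypothetical stand-ins for the real pipeline stages (assumptions).
def load_sources():
    time.sleep(0.01)
    return ["paper-a", "paper-b"]

def fetch_article_urls(source):
    time.sleep(0.02)
    return [f"{source}/article-{i}" for i in range(3)]

def fetch_article(url):
    time.sleep(0.005)
    return {"url": url, "headline": "..."}

def run_pipeline():
    # Time each segment separately, and the whole run as well.
    with timed("total"):
        with timed("load_xml"):
            sources = load_sources()
        for source in sources:
            with timed("fetch_urls"):
                urls = fetch_article_urls(source)
            for url in urls:
                with timed("fetch_article"):
                    fetch_article(url)

if __name__ == "__main__":
    run_pipeline()
    for stage, seconds, calls in report():
        print(f"{stage:15s} {seconds:8.3f}s over {calls} calls")
```

This sketch runs the stages one after another. In a pipeline where the stages overlap on a thread pool, the per-stage sums can exceed the total, so look at the time per call and the call counts too: a stage that runs more often than you expected is as much a finding as a slow one.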
Unless you measure, your optimizations will be premature. It's entirely possible you'll spend a week optimizing your network fetching code, only to discover the problem was in your View Models, which accidentally fetched data multiple times.
Measure.