I have a crawling project and I’m wondering whether I can rely on sitemaps alone to collect URLs. I want to gather all the articles on the blog pages of a number of websites. If I can stick to the sitemaps, that makes my work dramatically easier.
Is it reasonable to expect that every blog post appears in a site’s sitemap and that the sitemap is kept up to date? I don’t know how often sitemaps are typically refreshed, or whether updating them is usually automatic (e.g. handled by the CMS) rather than something the site owner has to do by hand.
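For context, the per-site check I have in mind is roughly: fetch the sitemap, read each `<loc>` and its optional `<lastmod>`, and compare `lastmod` against my last crawl. A minimal parsing sketch, assuming the standard sitemaps.org XML format (the sample sitemap and URLs below are made up):

```python
import xml.etree.ElementTree as ET

# Standard namespace from the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text):
    """Return a list of (url, lastmod) tuples; lastmod is None if absent."""
    root = ET.fromstring(xml_text)
    entries = []
    for url_el in root.findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", namespaces=NS)
        lastmod = url_el.findtext("sm:lastmod", namespaces=NS)  # optional tag
        entries.append((loc, lastmod))
    return entries

# Hypothetical sitemap content; in practice this would come from
# an HTTP GET of e.g. https://example.com/sitemap.xml
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/post-1</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/blog/post-2</loc></url>
</urlset>"""

print(parse_sitemap(sample))
```

Note that `lastmod` is optional in the protocol, so even a well-maintained sitemap may not tell me when a post was last changed.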