My question is: when I try to filter the cluster.idx file in any of the crawls for twitter.com or instagram.com, I get no results. For reddit.com, however, it works fine. There are also APIs that provide scraped data from Instagram and Twitter, so I don’t understand why Common Crawl does not include these websites.
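
For reference, here is a minimal sketch of the kind of filter described above. It assumes the keys in cluster.idx are written in SURT form (reversed host, e.g. `com,twitter)/...`) rather than as plain hostnames, so a search for the literal string `twitter.com` would not match anything. The function names and the sample lines are hypothetical and only illustrate the matching logic. As far as I understand, cluster.idx also lists only one key per index block, so a domain with few captures may be missing from it even if it appears in the full index.

```python
def surt_prefix(domain: str) -> str:
    """Turn 'twitter.com' into the SURT-style host prefix 'com,twitter'."""
    return ",".join(reversed(domain.lower().strip(".").split(".")))


def matches_domain(line: str, domain: str) -> bool:
    """True if the line's key is for the domain or one of its subdomains."""
    key = line.split(" ", 1)[0]      # first field: SURT key (assumed layout)
    host = key.split(")", 1)[0]      # part before ')' is the reversed host
    prefix = surt_prefix(domain)
    return host == prefix or host.startswith(prefix + ",")


def filter_index(lines, domain):
    """Keep only the index lines whose key belongs to the given domain."""
    return [line for line in lines if matches_domain(line, domain)]


# Hypothetical sample lines, made up for illustration only.
sample = [
    "com,reddit)/r/python 20240101000000\tcdx-00100.gz\t0\t1000\t1",
    "com,reddit,old)/r/rust 20240101000000\tcdx-00100.gz\t1000\t1000\t2",
    "com,redditstatic)/favicon.ico 20240101000000\tcdx-00100.gz\t2000\t1000\t3",
    "org,example)/ 20240101000000\tcdx-00200.gz\t0\t1000\t4",
]

reddit_hits = filter_index(sample, "reddit.com")
twitter_hits = filter_index(sample, "twitter.com")
```

Matching on the reversed host, and requiring either an exact match or a following comma, keeps `redditstatic.com` from being counted as `reddit.com` while still including subdomains such as `old.reddit.com`.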