I’ve been making an image scraper for a site and I noticed all of their media is available in the following format:
example.com/images/[20-character lowercase alphanumeric string, e.g. zxqwvl7jl745hv08yz9j].jpg
There are a lot of URLs for old media that is no longer used on the site but is still public (just not in the sitemap). Any URL that is not valid simply returns an S3 "Access Denied" XML response.
Is my only option to brute-force random URLs until I hit something?
It seems like iterating through 36^20 URLs (26 lowercase letters plus 10 digits, in 20 positions) would take some time.
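
For a rough sense of scale, here is a back-of-the-envelope sketch in Python. It assumes the names use all 26 lowercase letters plus 10 digits (36 symbols) and that every position is independent. The request rate of 1,000 per second is a made-up figure for illustration, not something I measured, and the function names are my own.

```python
import string

# Assumption: names draw from lowercase letters and digits (36 symbols).
ALPHABET = string.ascii_lowercase + string.digits
LENGTH = 20

# Hypothetical request rate, chosen only for illustration.
REQUESTS_PER_SECOND = 1_000


def keyspace_size(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length


def years_to_enumerate(total: int, rate: float) -> float:
    """Years needed to try every candidate at a constant rate (per second)."""
    seconds_per_year = 365 * 24 * 60 * 60
    return total / rate / seconds_per_year


total = keyspace_size(len(ALPHABET), LENGTH)
print(f"{total:.3e} possible names")
print(f"{years_to_enumerate(total, REQUESTS_PER_SECOND):.3e} years "
      f"at {REQUESTS_PER_SECOND} requests/second")
```

That comes out to roughly 1.3e31 candidates, so even at a much higher rate than I assumed, an exhaustive sweep is not realistic.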