I want to scrape data from a website covering the last 10 years. The data is a PDF that changes every day, and I want to download it for each day. When I open the website normally in a browser, the PDF downloads without any problem, but when I try to do the same with Selenium in Python, it gives me an error. The script itself runs without raising any exceptions, but the PDFs don't download. The website's robots.txt disallows scraping for a certain area of the site (for example, Market Data). The URL I open with the driver doesn't contain "market data", but that tab is already selected when the page loads.
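One possible cause, offered as an assumption rather than a diagnosis: Chrome driven by Selenium may open the PDF in its built-in viewer (or ask where to save it) instead of writing it to disk. The sketch below sets Chrome preferences that send PDFs straight to a download folder, then polls that folder until a finished `.pdf` appears (Chrome writes in-progress downloads as `*.crdownload`). The helper names `pdf_download_prefs`, `make_driver` and `wait_for_pdf`, the folder, and the timeouts are hypothetical choices for illustration.

```python
import os
import time
from pathlib import Path


def pdf_download_prefs(download_dir: str) -> dict:
    """Chrome preferences that save PDFs to disk instead of opening the built-in viewer."""
    return {
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,       # no "Save as" dialog
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True,  # skip Chrome's PDF viewer
    }


def make_driver(download_dir: str):
    """Start Chrome with the preferences above (hypothetical helper; needs selenium)."""
    # Imported here so the other helpers can be used without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", pdf_download_prefs(download_dir))
    return webdriver.Chrome(options=options)


def wait_for_pdf(download_dir: str, timeout: float = 30.0, poll: float = 0.5) -> Path:
    """Block until a finished .pdf exists in download_dir; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    folder = Path(download_dir)
    while time.monotonic() < deadline:
        # Chrome keeps a *.crdownload file while a download is still in progress.
        in_progress = list(folder.glob("*.crdownload"))
        pdfs = sorted(folder.glob("*.pdf"), key=lambda p: p.stat().st_mtime)
        if pdfs and not in_progress:
            return pdfs[-1]  # most recently modified PDF
        time.sleep(poll)
    raise TimeoutError(f"no PDF appeared in {download_dir} within {timeout} s")
```

Usage (hypothetical): `driver = make_driver("pdfs")`, then `driver.get(page_url)` with the URL you already open, click whatever element triggers the download, and call `wait_for_pdf("pdfs")` to get the saved file's path.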
I tried setting a user agent, but it didn't work, and I can't find a solution online. Is it that I can't scrape this section of the website, even though that section isn't in the URL I'm opening, because it is already selected when I open that URL (which suggests the page is reached by navigating through the Market Data tab)? Also, if the issue is that the website doesn't allow scraping for that tab, is there any way to work around it and scrape the data anyway?
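robots.txt rules are matched against the path of each URL that is requested, not against which tab is visually selected on the page. That means the question is not only about the page URL but also about the URL the PDF itself is fetched from (visible in the browser's DevTools Network tab), which may sit under the disallowed path. The sketch below uses Python's standard `urllib.robotparser` to check both; the rules and URLs are hypothetical placeholders, so substitute the site's real robots.txt lines and URLs. Note that this parser does plain prefix matching and does not implement the `*`/`$` wildcard extensions some sites use.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse lines directly instead of fetching robots.txt
    return parser.can_fetch(user_agent, url)


# Hypothetical rules and URLs, for illustration only.
rules = ["User-agent: *", "Disallow: /market-data/"]
page_url = "https://example.com/reports/daily?tab=market-data"
pdf_url = "https://example.com/market-data/files/2024-01-02.pdf"

for url in (page_url, pdf_url):
    print(url, "allowed" if allowed_by_robots(rules, url) else "disallowed")
```

With rules like these, the page URL passes (a query string such as `tab=market-data` does not match the `/market-data/` prefix), while a PDF served from under `/market-data/` would still be disallowed.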