I’m somewhat new to programming. The basic idea is that I’m working on a project and need to send a request to a website and scrape some data from it. However, the website I’m trying to retrieve data from requires a subscription (in the form of a username and password) to access it.
Sending a request without the login details does not work. Here is what I have so far:
import pandas as pd
import requests

url = 'https://learn.mystrategicforecast.com/courses/take/insidethenumbers/multimedia/13520859-insidethenumbers-today'

# Placeholders here -- in my actual script these are my real login details
auth = {
    'j_username': 'my_username_or_email',
    'j_password': 'my_password'
}

# Ask pandas to fetch and parse the page, passing the credentials via storage_options
df = pd.read_html(url, storage_options=auth)
df
I first tried calling read_html() directly, without any additional parameters, and got a 403 error:
HTTPError: HTTP Error 403: Forbidden
So I think this happens because I need to pass in a username/email and password somehow, which I hadn't done at that point.
I figured out how to do this with a plain request (I used requests.get), and the request went through successfully (the status code was 200).
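Roughly, that attempt looked like this (just a sketch: the placeholders stand in for my real details, and I'm not sure passing the dict as data= is even the right way to send the credentials):

import requests

url = 'https://learn.mystrategicforecast.com/courses/take/insidethenumbers/multimedia/13520859-insidethenumbers-today'

auth = {
    'j_username': 'my_username_or_email',   # placeholder
    'j_password': 'my_password'             # placeholder
}

# Send the credentials dict along with the GET request
r = requests.get(url, data=auth)   # maybe params= or auth= is what this should be
print(r.status_code)   # this prints 200
print(r.content)       # this prints one very long blob of page source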
However, there are still two problems I’m facing:
- The first is that when I print r.content (r is the response object), I get a long blob of what looks like raw HTML/JavaScript source rather than the data I actually see on the page.
- The second is that I need the successful request to happen through read_html() rather than requests.get(), so that I end up with DataFrames.
When I passed a dictionary with the username and password to requests.get(), I did get something back: one very long string of page source on a single line. With read_html(), however, I still get the 403 error.
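To make the second problem concrete: what I ultimately need is for read_html() to parse the page that the successful requests.get() call returned, i.e. something in this spirit (only a sketch of the idea, not code I have working):

from io import StringIO

import pandas as pd
import requests

url = 'https://learn.mystrategicforecast.com/courses/take/insidethenumbers/multimedia/13520859-insidethenumbers-today'

auth = {
    'j_username': 'my_username_or_email',   # placeholder
    'j_password': 'my_password'             # placeholder
}

r = requests.get(url, data=auth)            # the call that comes back with status 200
# read_html() can parse an HTML string / file-like object instead of fetching the URL itself,
# so hand it the text that requests already downloaded
tables = pd.read_html(StringIO(r.text))
print(len(tables))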
Sorry for the long post, but can anyone please help with this?