I am trying to scrape the Bitcoin markets section from CoinGecko using Python, but I keep getting an HTTP 403 error. I have tried using the requests library with custom headers to mimic a real browser, yet the error persists.
Here is the code I am using:
import requests
import pandas as pd
from io import StringIO

# Base URL for Bitcoin markets on CoinGecko
base_url = "https://www.coingecko.com/en/coins/bitcoin"

# Fetch a single page and return the raw HTML, or None on a non-200 response
def fetch_page(url, page):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }
    response = requests.get(f"{url}?page={page}", headers=headers, timeout=30)
    if response.status_code != 200:
        print(f"Failed to fetch page {page}: Status code {response.status_code}")
        return None
    return response.text

# Extract the first HTML table on the page as a DataFrame
def extract_markets(html):
    try:
        # Wrap the string in StringIO; newer pandas versions warn when passing literal HTML
        dfs = pd.read_html(StringIO(html))
    except ValueError:
        # read_html raises ValueError when no tables are found
        return pd.DataFrame()
    return dfs[0] if dfs else pd.DataFrame()

# Scrape pages until one fails or comes back empty, then combine the results
def scrape_all_pages(base_url, max_pages=10):
    all_markets = []
    for page in range(1, max_pages + 1):
        print(f"Scraping page {page}...")
        html = fetch_page(base_url, page)
        if html is None:
            break
        df = extract_markets(html)
        if df.empty:
            break
        all_markets.append(df)
    return pd.concat(all_markets, ignore_index=True) if all_markets else pd.DataFrame()

# Scrape the data and store it in a single DataFrame
max_pages = 10  # Adjust this to scrape more pages if needed
df = scrape_all_pages(base_url, max_pages)

# Display the DataFrame
print(df)
Here is the output I get:
Scraping page 1...
Failed to fetch page 1: Status code 403
Empty DataFrame
Columns: []
Index: []
I also tried a solution suggested on Stack Overflow, but it did not resolve the issue.
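If scraping the HTML page turns out to be a dead end, I am wondering whether the official CoinGecko API is the better route. Below is a rough, untested sketch of what I have in mind; it assumes the public /api/v3/coins/{id}/tickers endpoint is still reachable without an API key and that the JSON response contains a top-level "tickers" list:

import requests
import pandas as pd

# Sketch only: assumes the public CoinGecko API endpoint
# /api/v3/coins/{id}/tickers works without an API key (untested)
API_URL = "https://api.coingecko.com/api/v3/coins/bitcoin/tickers"

def fetch_tickers(page):
    # As far as I know, this endpoint is paginated via a "page" query parameter
    response = requests.get(API_URL, params={"page": page}, timeout=30)
    response.raise_for_status()
    return response.json().get("tickers", [])

all_tickers = []
for page in range(1, 4):  # adjust the page range as needed
    tickers = fetch_tickers(page)
    if not tickers:
        break
    all_tickers.extend(tickers)

# Flatten the nested JSON (e.g. exchange details) into columns
df = pd.json_normalize(all_tickers)
print(df.head())

My hope is that an official endpoint would avoid whatever bot protection is returning the 403, but I have not verified this.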
Could someone suggest a workaround or a more effective way to scrape this data? Any help would be greatly appreciated. Thank you in advance.