This website (https://oig.hhs.gov/reports-and-publications/all-reports-and-publications/) posts new government reports. I’m trying to write Python/BeautifulSoup code to scrape the title of each new report (e.g. “Washington Medicaid Fraud Control Unit: 2023 Inspection”) and save them all to a CSV.
I can’t get the Python code right, though: it keeps producing an empty CSV. Any suggestions?
import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL of the website to scrape
url = 'https://oig.hhs.gov/reports-and-publications/all-reports-and-publications/'

# Send a GET request to the website
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content of the page with BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

    # Print the structure for debugging purposes
    print(soup.prettify())

    # Find all the report titles
    titles = []
    for title in soup.find_all('a', class_='item-title'):
        titles.append(title.get_text(strip=True))

    # If no titles are found, print a message
    if not titles:
        print("No titles found. The structure of the page might have changed.")

    # Create a DataFrame from the list of titles
    df = pd.DataFrame(titles, columns=['Report Title'])

    # Save the DataFrame to a CSV file
    df.to_csv('report_titles.csv', index=False)
    print("Report titles have been saved to report_titles.csv")
else:
    print(f"Failed to retrieve the webpage. Status code: {response.status_code}")
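A likely cause of the empty CSV is that `find_all('a', class_='item-title')` matches nothing in the HTML that `requests` actually receives, so `titles` stays empty and pandas writes only the header row. The following is a minimal debugging sketch, not a definitive fix: the helper names `class_counts`, `extract_titles` and `save_titles`, the `SAMPLE_HTML` markup, the `h2.report-title` selector and the second title are all hypothetical and are not taken from the real oig.hhs.gov page.

```python
import csv
from collections import Counter

from bs4 import BeautifulSoup

# HYPOTHETICAL markup for illustration only; the real oig.hhs.gov page
# very likely uses different tags and class names.
SAMPLE_HTML = """
<nav><a class="nav-link" href="/">Home</a></nav>
<ul>
  <li><h2 class="report-title"><a href="/r1">Washington Medicaid Fraud Control Unit: 2023 Inspection</a></h2></li>
  <li><h2 class="report-title"><a href="/r2">Example Report B</a></h2></li>
</ul>
"""


def class_counts(html):
    """Count 'tag.class' pairs so you can see which selectors the HTML offers."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for tag in soup.find_all(True):  # True matches every tag
        for cls in tag.get("class", []):
            counts[f"{tag.name}.{cls}"] += 1
    return counts


def extract_titles(html, selector):
    """Return the non-empty text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    texts = (el.get_text(strip=True) for el in soup.select(selector))
    return [t for t in texts if t]


def save_titles(titles, path):
    """Write titles to a one-column CSV, refusing to write an empty file."""
    if not titles:
        raise ValueError("No titles matched the selector; nothing written.")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Report Title"])
        writer.writerows([t] for t in titles)


# The original selector finds nothing in this (hypothetical) markup...
print(extract_titles(SAMPLE_HTML, "a.item-title"))
# ...while counting classes points to a selector that does match.
print(class_counts(SAMPLE_HTML).most_common())
print(extract_titles(SAMPLE_HTML, "h2.report-title a"))
```

To try this on the live page, pass `response.text` from the `requests.get(url)` call into `class_counts(...)` and look for a class that wraps the report titles. If a known title such as “Washington Medicaid Fraud Control Unit: 2023 Inspection” does not appear anywhere in `response.text`, the list is probably loaded by JavaScript, or the server returned a different page to the script, and changing the BeautifulSoup selector will not help on its own. Raising an error in `save_titles` instead of writing a header-only file makes that failure visible rather than silent.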