I tried web scraping this Wikipedia page to get the list of awards. But when I checked my CSV file, I found several problems: the award ceremony is not written to the CSV file; the references (Ref. column), which I don't want, are included; some awards are missing the nominee; some awards are missing the year; and some award names have many quotation marks before and after them.
This is my code:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Step 1: Send a request to the website
url = 'https://en.wikipedia.org/wiki/List_of_awards_and_nominations_received_by_Exo'
response = requests.get(url)

# Step 2: Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Step 3: Find the awards tables
awards_tables = soup.find_all('table', {'class': 'wikitable'})

# Step 4: Extract data
awards = []
for table in awards_tables:
    for row in table.find_all('tr')[1:]:  # skip the header row
        cells = row.find_all('td')
        if len(cells) >= 4:
            event = cells[0].get_text(strip=True)
            award_name = cells[1].get_text(strip=True)
            year = cells[2].get_text(strip=True)
            group_name = cells[3].get_text(strip=True)
            awards.append([event, award_name, year, group_name])

# Step 5: Create a DataFrame
df = pd.DataFrame(awards, columns=['Award Event', 'Award Name', 'Year', 'Group Name'])

# Step 6: Save to CSV
df.to_csv('exo_awards.csv', index=False)
print("Awards saved to exo_awards.csv")
Example of the expected output:
The first award ceremony is the American Music Awards, where EXO was nominated in two years (2019 and 2020).
Expected output in the CSV file:
American Music award, 2019, EXO, favorite social artist, nominated
American Music award, 2020, EXO, favorite social artist, nominated
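
A note on what might be going on, as a sketch rather than a verified fix. The symptoms are consistent with two things that are common in Wikipedia tables, although that is an assumption about this page's markup: the ceremony name may sit in a `<th>` row header (which `find_all('td')` skips), and cells with a `rowspan` attribute appear only in the first row they cover, so later rows have fewer cells and the columns shift. The extra quotation marks are probably standard CSV escaping: when a cell already contains `"` characters, `to_csv` wraps the field in quotes and doubles the inner ones. The helper names `clean_text` and `expand_rowspans` below are made up for illustration, and the sample rows are hand-written to mirror the expected output, not scraped from the page.

```python
import re

def clean_text(text):
    """Drop bracketed footnote markers such as [12] and surrounding quotes."""
    text = re.sub(r"\[[^\]]*\]", "", text)
    return text.strip().strip('"“”').strip()

def expand_rowspans(rows):
    """Copy cells with rowspan > 1 down into the rows they cover.

    `rows` is a list of rows; each row is a list of (text, rowspan) pairs
    in document order, covering both <th> and <td> cells.
    """
    pending = {}  # column index -> (text, number of further rows to fill)
    grid = []
    for row in rows:
        cells = iter(row)
        out = []
        col = 0
        while True:
            if col in pending:
                # This column is still covered by a cell from an earlier row.
                text, left = pending[col]
                out.append(text)
                if left > 1:
                    pending[col] = (text, left - 1)
                else:
                    del pending[col]
            else:
                cell = next(cells, None)
                if cell is None:
                    break
                text, span = cell
                out.append(text)
                if span > 1:
                    pending[col] = (text, span - 1)
            col += 1
        grid.append(out)
    return grid

# With BeautifulSoup, the (text, rowspan) pairs could be built roughly like
# this (not run against the live page here):
#   rows = [[(c.get_text(strip=True), int(c.get("rowspan", 1)))
#            for c in tr.find_all(["th", "td"])]
#           for tr in table.find_all("tr")[1:]]

# Hand-written sample: the ceremony cell spans two rows.
sample_rows = [
    [("American Music Awards", 2), ("2019", 1),
     ("Favorite Social Artist[1]", 1), ("Exo", 1), ("Nominated", 1)],
    [("2020", 1), ("Favorite Social Artist", 1), ("Exo", 1), ("Nominated", 1)],
]

grid = [[clean_text(c) for c in row] for row in expand_rowspans(sample_rows)]
for row in grid:
    print(", ".join(row))
```

If the assumptions hold, every row ends up with the same number of columns, so the Ref. column can be dropped by position before building the DataFrame.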