I’m trying to scrape the headers of a table from a webpage using a list comprehension. The problem is that the headers I get look very different from the ones pandas produces for the same table. Note that the headers created by pandas account for every column, while mine do not.
from bs4 import BeautifulSoup
import pandas as pd
import requests
link = 'https://www03.cmhc-schl.gc.ca/hmip-pimh/en/TableMapChart/TableMatchingCriteria?GeographyType=SurveyZone&GeographyId=011002&CategoryLevel1=Primary%20Rental%20Market&CategoryLevel2=Vacancy%20Rate%20%28%25%29&ColumnField=2&RowField=TIMESERIES'
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
'X-Requested-With': 'XMLHttpRequest',
}
res = requests.get(link,headers=headers)
soup = BeautifulSoup(res.text,"html.parser")
col_headers = [item.text.replace("\xa0","Unnamed: 0") for i in soup.select("table.CawdDataTable thead tr") for item in i.select("th, td")]
print(col_headers)
selector = soup.select_one("table.CawdDataTable")
df = pd.read_html(str(selector))[0]
print(list(df.columns))
Output (the first list comes from the list comprehension, the second from pandas):
['Unnamed: 0', 'Bachelor', '1 Bedroom', '2 Bedroom', '3 Bedroom +', 'Total']
['Unnamed: 0', 'Bachelor', 'Bachelor.1', '1 Bedroom', '1 Bedroom.1', '2 Bedroom', '2 Bedroom.1', '3 Bedroom +', '3 Bedroom +.1', 'Total', 'Total.1']
Can I get the list comprehension to produce the same column headers that pandas does?
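To make the target concrete, here is a minimal sketch of the behaviour I am after. It rests on an assumption I have not verified against the page's markup: that each header cell other than the blank corner cell carries colspan="2", which would explain why pandas shows every label twice. The helper name pandas_style_headers and the (text, colspan) input pairs are hypothetical. With BeautifulSoup, the pairs could presumably be built with something like [(th.get_text(strip=True), int(th.get("colspan", 1))) for th in ...].

```python
from collections import Counter


def pandas_style_headers(cells):
    """Expand (text, colspan) pairs into one de-duplicated name per column.

    This only imitates the simple case of pandas' naming (name, name.1,
    name.2, ...); it is not pandas' actual implementation.
    """
    # Repeat each label once for every column the cell spans.
    expanded = [text for text, span in cells for _ in range(span)]

    # Append .1, .2, ... to the second and later occurrences of a label.
    seen = Counter()
    result = []
    for name in expanded:
        result.append(f"{name}.{seen[name]}" if seen[name] else name)
        seen[name] += 1
    return result


# Hypothetical input mirroring the table above: a blank corner cell
# followed by labels that are each assumed to span two columns.
cells = [
    ("Unnamed: 0", 1),
    ("Bachelor", 2),
    ("1 Bedroom", 2),
    ("2 Bedroom", 2),
    ("3 Bedroom +", 2),
    ("Total", 2),
]
print(pandas_style_headers(cells))
```

If the colspan assumption holds, this prints the same eleven names as the pandas output shown above.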