I did some simple web scraping and I want to check that all my steps are correct. Is this considered clean code?
I also feel there must be a better way to handle scraping multiple pages. Can you help me improve the code?
Is my use of functions correct or not?
Finally, what is the next step to improve my web scraping? Any tips or resources?
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd


def main():
    data = []
    headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36'}
    for page_num in range(1, 51):
        url = f'https://books.toscrape.com/catalogue/page-{page_num}.html'
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, "lxml")
        books = soup.find_all('article', class_='product_pod')
        for book in books:
            name = book.find('img').attrs['alt']
            price = book.find('p', class_='price_color').text.strip()
            # the hrefs on listing pages are relative to /catalogue/, not the site root
            link = 'https://books.toscrape.com/catalogue/' + book.find('a').attrs['href']
            stock = book.find('p', class_='instock availability').text.strip()
            data.append([name, price, link, stock])
    df = pd.DataFrame(data, columns=['name', 'price', 'link', 'stock'])
    df.to_csv('data.csv', index=False)


main()
```
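For context, this is roughly the direction I was considering for splitting the work into smaller functions (the `parse_book`, `parse_page`, and `scrape` names are just my own placeholders, and I use the stdlib `html.parser` here so the sketch has no lxml dependency):

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

BASE_URL = 'https://books.toscrape.com/catalogue/'
HEADERS = {'user-agent': 'Mozilla/5.0'}


def parse_book(book):
    """Extract one book's fields from an <article class="product_pod"> tag."""
    return {
        'name': book.find('img').attrs['alt'],
        'price': book.find('p', class_='price_color').text.strip(),
        'link': BASE_URL + book.find('a').attrs['href'],
        'stock': book.find('p', class_='instock availability').text.strip(),
    }


def parse_page(html):
    """Parse every book on one listing page's HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    return [parse_book(b) for b in soup.find_all('article', class_='product_pod')]


def scrape(pages=50):
    """Fetch every listing page with one shared Session and collect the rows."""
    rows = []
    with requests.Session() as session:
        session.headers.update(HEADERS)
        for page_num in range(1, pages + 1):
            response = session.get(f'{BASE_URL}page-{page_num}.html', timeout=10)
            response.raise_for_status()  # fail loudly on 404/500 instead of parsing an error page
            rows.extend(parse_page(response.text))
    return pd.DataFrame(rows)


if __name__ == '__main__':
    scrape().to_csv('data.csv', index=False)
```

I split fetching from parsing mainly so the parsing can be tested against saved HTML without hitting the network, and the shared `Session` reuses one connection across all 50 requests.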
I want to improve the code and make it as clean as a professional's.