I am trying to retrieve data from the “Things to Do” page on TripAdvisor.
For each attraction, I want to get the name, the number of reviews, and the review score.
I have tried the following code, found in a previous question posted 4 years ago, but it no longer works:

import requests
from bs4 import BeautifulSoup

# Define headers to prevent errors
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'
headers = {
    'User-Agent': user_agent,
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1'
}

# URL of the TripAdvisor "things to do" page for Brussels
url = "https://www.tripadvisor.com/Attractions-g188644-Activities-oa0-Brussels.html"

try:
    # Get response from url with timeout
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # Check if the request was successful

    # Soupify response
    soup = BeautifulSoup(response.text, "lxml")

    # Find elements containing the names of the attractions
    attractions = soup.find_all("div", {"class": "attraction_element"})

    # Iterate over attractions and extract information
    things_to_do = []
    for attraction in attractions:
        # Example: Extract the name of the attraction
        name = attraction.find("a", {"class": "attraction_name"}).text.strip()
        things_to_do.append(name)

    # Print the list of attractions
    for i, item in enumerate(things_to_do, start=1):
        print(f"{i}. {item}")

except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

Running it fails with the error: 403 Client Error: Forbidden for url.
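
To narrow down what the 403 means, it may help to log the status and the responding server rather than only the exception text. The sketch below is a small offline helper for that. The name `summarize_status` and the wording of the hint are hypothetical, and the idea that a 401, 403, or 429 here points to bot protection or rate limiting is an assumption, not something the response confirms.

```python
from http import HTTPStatus


def summarize_status(status_code, headers):
    """Return a one-line description of an HTTP response for debugging.

    `headers` is any mapping with a .get() method, such as a plain dict
    or the headers attribute of a requests response.
    """
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Status code not defined in the standard library enum
        phrase = "Unknown status"

    server = headers.get("Server", "unknown server")

    hint = ""
    if status_code in (401, 403, 429):
        # Assumption: these codes usually mean the request was refused
        # on purpose, not that the URL is wrong.
        hint = " (request refused; possibly bot protection or rate limiting)"

    return f"{status_code} {phrase} from {server}{hint}"


if __name__ == "__main__":
    print(summarize_status(403, {"Server": "nginx"}))
```

With the code above, it could be called as `summarize_status(response.status_code, response.headers)` just before `response.raise_for_status()`.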
Can you help?
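
For the two numeric fields, here is a minimal sketch of the conversion step that would follow once the text can be extracted. The input formats ("1,234 reviews", "4.5 of 5 bubbles") are assumptions for illustration only, not taken from the current page markup, and the function names `parse_review_count` and `parse_score` are hypothetical.

```python
import re


def parse_review_count(text):
    """Return the first integer in text such as '1,234 reviews', or None."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    # Drop thousands separators before converting
    return int(match.group().replace(",", ""))


def parse_score(text):
    """Return the first number in text such as '4.5 of 5 bubbles', or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group())


if __name__ == "__main__":
    # Assumed example strings, not real page content
    print(parse_review_count("1,234 reviews"))
    print(parse_score("4.5 of 5 bubbles"))
```

Returning `None` when no number is found keeps an attraction with no reviews from raising an exception in the loop.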