I have to build a database of names and emails from a website, but there are millions of entries. To obtain the data I first need to access the website, then click a button that takes me to a different page, and that page is where the information I need lives.

I tried to write a Python script that first collects the URL behind each button, then fetches the name and email from that URL, using the class name of each container. I also asked Python to create a .txt file with the names and emails, but it didn't work: the file is empty.
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# WEBSITE URL
url = "https://www.bhhs.com/agent-search-results"

# HTTP REQUEST
response = requests.get(url)

# IF REQUEST SUCCESSFUL, THEN:
if response.status_code == 200:
    # Parse the page's HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all elements with the 'cmp-cta' class that contain an href
    enlaces = soup.find_all('a', class_='cmp-cta')
    correos = []

    # Extract the href from each link and make a new request
    for enlace in enlaces:
        href = enlace.get('href')
        if href:
            # Resolve relative hrefs against the base URL before requesting
            sub_response = requests.get(urljoin(url, href))
            if sub_response.status_code == 200:
                sub_soup = BeautifulSoup(sub_response.text, 'html.parser')
                # Find the element with the class 'cmp-agent-details__mail text-lowercase'
                correo_elemento = sub_soup.find('div', class_='cmp-agent-details__mail text-lowercase')
                if correo_elemento:
                    correo = correo_elemento.text.strip()
                    correos.append(correo)
                    print(f"Email found: {correo}")
            else:
                print(f"Error accessing URL {href}: {sub_response.status_code}")

    # Save the emails to a text file
    with open('correos.txt', 'w', encoding='utf-8') as file:
        for correo in correos:
            file.write(f"{correo}\n")

    print("Extraction complete, data saved to 'correos.txt'.")
else:
    print(f"Request error: {response.status_code}")
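Two things are worth checking when the output file comes back empty. First, the agent list on pages like this is often rendered by JavaScript, so the HTML that `requests.get` receives may contain no `cmp-cta` anchors at all (printing `len(enlaces)` will confirm this). Second, even when the anchors are present, their `href` values are usually relative paths, and `requests.get(href)` raises `MissingSchema` on those unless they are resolved against the base URL first. The sketch below illustrates the second point with hypothetical sample markup (the class names follow the script above; the agent paths are made up):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the search-results page: the first href is
# relative, so requests.get(href) on it alone would raise MissingSchema.
html = '''
<a class="cmp-cta" href="/bhhs/agent/jane-doe">Jane Doe</a>
<a class="cmp-cta" href="https://www.bhhs.com/bhhs/agent/john-roe">John Roe</a>
'''

base_url = "https://www.bhhs.com/agent-search-results"
soup = BeautifulSoup(html, 'html.parser')

# urljoin resolves relative hrefs against the base URL and
# leaves absolute URLs untouched
full_urls = [urljoin(base_url, a['href'])
             for a in soup.find_all('a', class_='cmp-cta')]
print(full_urls)
```

If `len(enlaces)` is 0 on the real page, no amount of URL fixing will help, and a tool that executes JavaScript (e.g. Selenium or Playwright) would be needed instead of plain `requests`.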