I have a large dataset (around 40,000 entities) in a .csv file, and I want to record each entity's popularity as the number of search results it returns when searched. I'm trying to do this quickly and reliably. The number doesn't have to be very accurate; it's just the best way I could think of to gauge an entity's public importance.
I've tried using Beautiful Soup, but it doesn't always return a value. If I search for something that isn't very popular, it reports that the value is not present; but if I then search for the same thing in my actual browser and rerun the program, it returns a count. That isn't reliable. I found out that this happens because the page loads some of its content dynamically with JavaScript.
import requests
import urllib3
from bs4 import BeautifulSoup

# TLS verification is disabled below, so silence the resulting warnings.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


def get_search_results_count(query):
    # Pass the query via params so spaces and "&" are URL-encoded.
    search_url = "https://www.bing.com/search"
    response = requests.get(search_url, params={"q": query}, verify=False)
    soup = BeautifulSoup(response.content, "html.parser")
    # Bing shows the result count in a <span class="sb_count"> element.
    count_element = soup.find("span", class_="sb_count")
    if count_element:
        count_text = count_element.text.strip()
        return count_text
    else:
        return "Search results count not found"


search_query = "cheese"
result_count = get_search_results_count(search_query)
print(result_count)
# Keep only the digits of the returned text (empty if no count was found).
numeric_part = ''.join(filter(str.isdigit, result_count))
print(numeric_part)
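The digit filter above returns an empty string when the count is missing, and it would merge several numbers if the text ever contained more than one. A minimal sketch of a stricter parser follows, assuming the count text looks like "About 1,230,000 results"; the function name `parse_result_count` is hypothetical and the exact wording Bing uses may differ.

```python
import re


def parse_result_count(text):
    """Return the first number found in text as an int, or None.

    Assumes thousands are grouped with commas, dots, or spaces,
    as in "About 1,230,000 results".
    """
    match = re.search(r"\d+(?:[,.\s]\d{3})*", text)
    if match is None:
        return None
    # Drop the group separators before converting to int.
    return int(re.sub(r"\D", "", match.group()))


print(parse_result_count("About 1,230,000 results"))
print(parse_result_count("Search results count not found"))
```

Returning `None` instead of an empty string makes a missing count easy to tell apart from a real value when the results are written back to the .csv file.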
So I then tried Selenium, but it takes too much time. It also opens an actual browser window on my computer for every search and then closes it, which seems pretty inefficient.
from urllib.parse import quote_plus

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException


def get_search_results_count(query):
    # A new browser is started (and later closed) for every query.
    driver = webdriver.Edge()
    try:
        # Construct the search URL, URL-encoding the query text
        search_url = f"https://www.bing.com/search?q={quote_plus(query)}"
        driver.get(search_url)
        count_element = driver.find_element(By.CLASS_NAME, "sb_count")
        count_text = count_element.text.strip()
        return count_text
    except NoSuchElementException:
        return "Search results count not found"
    finally:
        driver.quit()


search_query = input("Enter the Company and Product name as one...:\n")
result_count = get_search_results_count(search_query)
print(result_count)
# Keep only the digits of the returned text (empty if no count was found).
numeric_part = ''.join(filter(str.isdigit, result_count))
search_results = numeric_part
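Much of the cost in the Selenium version likely comes from starting and closing a visible browser for every query. As a rough sketch only, one browser can be started once in headless mode and reused for all queries. The helper names `make_headless_edge` and `get_counts` are hypothetical. Selenium is imported lazily, and `"class name"` is the string value of `By.CLASS_NAME`, so the lookup loop has no hard dependency on Selenium. Whether Bing serves the count element to a headless browser is an assumption that would need checking.

```python
from urllib.parse import quote_plus


def make_headless_edge():
    """Start one Edge instance without a visible window (needs Selenium 4)."""
    from selenium import webdriver  # lazy import: only needed here

    options = webdriver.EdgeOptions()
    options.add_argument("--headless")
    return webdriver.Edge(options=options)


def get_counts(driver, queries):
    """Look up each query with the same driver; None means no count found."""
    counts = {}
    for query in queries:
        driver.get(f"https://www.bing.com/search?q={quote_plus(query)}")
        # find_elements returns an empty list instead of raising.
        elements = driver.find_elements("class name", "sb_count")
        counts[query] = elements[0].text.strip() if elements else None
    return counts
```

A possible usage would be `driver = make_headless_edge()`, then `get_counts(driver, names)` inside a `try` block, with `driver.quit()` in `finally` so the browser is closed once at the end.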
I really don't know what to do. I don't need the search results themselves; I just want the number of results returned. I once saw someone suggest the Google search API, but it doesn't seem suited to my problem of only wanting the result count, and it seems too complex for me to figure out. If someone could help me with that, that would be great.
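For reference, the Google Custom Search JSON API does report an estimated result count in the `searchInformation.totalResults` field of its response, so it may fit this use case. Below is a minimal sketch using only the standard library. The function names are hypothetical, and `api_key` and `engine_id` (the `cx` parameter) are placeholders that have to be created in a Google account first. Quota limits apply, so the current limits should be checked before running 40,000 queries.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_request_url(query, api_key, engine_id):
    """Build the request URL; urlencode escapes spaces and symbols."""
    params = {"key": api_key, "cx": engine_id, "q": query}
    return f"{API_ENDPOINT}?{urllib.parse.urlencode(params)}"


def extract_total_results(payload):
    """Read the estimated count (sent as a string); None if absent."""
    total = payload.get("searchInformation", {}).get("totalResults")
    return int(total) if total is not None else None


def fetch_total_results(query, api_key, engine_id, timeout=10):
    """Send one request and return the estimated result count."""
    url = build_request_url(query, api_key, engine_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return extract_total_results(payload)
```

Calling `fetch_total_results("cheese", api_key, engine_id)` returns only the number, so the search results themselves never need to be parsed.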