Hi, I am very new to web scraping and am trying out the basics. I wanted to extract the links from a page on a root website (coventry.gov.uk). However, I could not get the list of links that belong to that website; only two unrelated websites were found.
This was my code:
import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.coventry.gov.uk/a-to-z/A")
soup = BeautifulSoup(response.content, "html.parser")
lists = soup.find_all("ul")
for list_item in lists:
    links = list_item.find_all("a")
    # Extract the URLs from the anchor tags
    for link in links:
        href = link.get("href")
        if href and href.startswith("https"):
            print(href)
I tried to take the links from the lists, since they sit inside the list elements after all, but I still don't get the desired links.
The problem is with href.startswith('https'). The links you want are relative rather than absolute, so none of them begins with https. Try this instead:
import requests
from bs4 import BeautifulSoup

url = "https://www.coventry.gov.uk/a-to-z/A"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
for a in soup.select(".list--record a"):
    print("https://www.coventry.gov.uk" + a["href"])
Prints:
https://www.coventry.gov.uk/a-to-z/service/87/abandoned-vehicles
https://www.coventry.gov.uk/a-to-z/service/349/abuse-adults
https://www.coventry.gov.uk/a-to-z/service/847/abuse-children
https://www.coventry.gov.uk/a-to-z/service/403/accessibility-website
https://www.coventry.gov.uk/a-to-z/service/534/accounts-inspection
https://www.coventry.gov.uk/a-to-z/service/876/address-change
https://www.coventry.gov.uk/a-to-z/service/49/adoption
https://www.coventry.gov.uk/a-to-z/service/54/adult-carers
https://www.coventry.gov.uk/a-to-z/service/211/adult-education
https://www.coventry.gov.uk/a-to-z/service/951/adult-social-care
https://www.coventry.gov.uk/a-to-z/service/89/air-pollution
https://www.coventry.gov.uk/a-to-z/service/89/air-quality
https://www.coventry.gov.uk/a-to-z/service/421/alcohol-advice-and-support
https://www.coventry.gov.uk/a-to-z/service/999/alcohol-misuse-support
https://www.coventry.gov.uk/a-to-z/service/821/allesley-parish-council
https://www.coventry.gov.uk/a-to-z/service/151/allotments
https://www.coventry.gov.uk/a-to-z/service/718/allowances-and-expenses-for-councillors
https://www.coventry.gov.uk/a-to-z/service/1037/alternative-learning-opportunities
https://www.coventry.gov.uk/a-to-z/service/106/animals
https://www.coventry.gov.uk/a-to-z/service/862/annual-accounts
https://www.coventry.gov.uk/a-to-z/service/736/anti-social-behaviour
https://www.coventry.gov.uk/a-to-z/service/958/apprenticeships
https://www.coventry.gov.uk/a-to-z/service/230/archaeology-
https://www.coventry.gov.uk/a-to-z/service/195/archives
https://www.coventry.gov.uk/a-to-z/service/935/armed-forces-support
https://www.coventry.gov.uk/a-to-z/service/1039/arts-and-culture
https://www.coventry.gov.uk/a-to-z/service/887/asbestos
https://www.coventry.gov.uk/a-to-z/service/1050/asylum-seekers
https://www.coventry.gov.uk/a-to-z/service/1078/attendance-and-inclusion
https://www.coventry.gov.uk/a-to-z/service/860/audit-and-procurement-committee-
https://www.coventry.gov.uk/a-to-z/service/1089/avian-influenza
https://www.coventry.gov.uk/a-to-z/service/369/award-of-merit
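If you would rather not hard-code the domain prefix, one option is to resolve each href against the page URL with urljoin from the standard library. The sketch below is only an illustration: absolute_links and its same_site_only flag are hypothetical names that are not part of the code above, and the sample hrefs are made up to resemble the ones on the page.

```python
from urllib.parse import urljoin, urlparse


def absolute_links(base_url, hrefs, same_site_only=True):
    """Resolve hrefs against base_url (hypothetical helper, not from the answer above)."""
    base_host = urlparse(base_url).netloc
    resolved = []
    for href in hrefs:
        if not href:  # skip anchors that have no href attribute
            continue
        # Relative hrefs become absolute; already-absolute ones pass through unchanged
        full = urljoin(base_url, href)
        if same_site_only and urlparse(full).netloc != base_host:
            continue  # drop links that point to another host
        resolved.append(full)
    return resolved


# Made-up hrefs, shaped like the ones on the A-to-Z page
sample = ["/a-to-z/service/87/abandoned-vehicles", "https://example.com/elsewhere", None]
print(absolute_links("https://www.coventry.gov.uk/a-to-z/A", sample))
```

With the soup from the answer, you could call it as absolute_links(url, (a.get("href") for a in soup.select(".list--record a"))). Filtering on the host instead of on a "https" prefix keeps the site's own relative links and drops external ones, which is the opposite of what the startswith check in the question did.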