I'm currently working on a Python script that fetches data from a Wikipedia page: the contact data from the following Wikipedia-based list, https://de.wikipedia.org/wiki/Liste_der_Genossenschaftsbanken_in_Deutschland
I think an appropriate method could be to make use of Beautiful Soup and pandas.
In short: I think the best approach would be to create a Python scraper against the above-mentioned Wikipedia page, using BS4 and pandas, in order to fetch the data from all the derived pages as well:
Step 0: To fetch all the contact data from the Wikipedia page listing Genossenschaftsbanken, I think I can use BeautifulSoup and Python. First I need to identify the table containing the contact information, and then I can extract the data from it.
Here's how I think I should go about it:
First, inspect the webpage: this is a typical Wikipedia list page, so this little task could be a good approach for me to dive into learning Python scraping. So here is my start ( https://de.wikipedia.org/wiki/Liste_der_Genossenschaftsbanken_in_Deutschland ): on this page I first need to inspect the HTML structure to locate the table containing the contact information of the listed banks:
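Before committing to one table, it can help to enumerate every table with class "wikitable" on the page and print its header row, to see which one actually holds the contact data. A minimal sketch of that idea; the inline HTML here is a made-up stand-in so it runs offline, and in practice you would pass `requests.get(url).content` to BeautifulSoup instead:

```python
from bs4 import BeautifulSoup

# Tiny inline sample standing in for the real page (hypothetical headers);
# with the live page you would parse requests.get(url).content instead.
html = """
<table class="wikitable"><tr><th>Verband</th><th>Sitz</th></tr></table>
<table class="wikitable"><tr><th>Bank</th><th>Website</th></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Print an index plus the header cells of every "wikitable" on the page,
# so we can pick the table that contains the contact information.
for i, table in enumerate(soup.find_all("table", {"class": "wikitable"})):
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    print(i, headers)
```

Once the right table's index is known, `soup.find_all(...)[i]` selects it directly instead of relying on `find` returning the first one.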
So here we go:
import requests
from bs4 import BeautifulSoup
import pandas as pd

# URL of the Wikipedia page
url = "https://de.wikipedia.org/wiki/Liste_der_Genossenschaftsbanken_in_Deutschland"

# Send a GET request to the URL
response = requests.get(url)
response.raise_for_status()

# Parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")

# Find the first table with class "wikitable" (assumed to hold the bank data)
table = soup.find("table", {"class": "wikitable"})

# Initialize lists to store data
banks = []
contacts = []
websites = []

# Extract data from the table, skipping the header row
for row in table.find_all("tr")[1:]:
    cols = row.find_all("td")
    if len(cols) < 2:
        continue  # skip rows that don't have the expected two data cells
    # Bank name is in the first column
    banks.append(cols[0].text.strip())
    # Contact information is in the second column
    contacts.append(cols[1].text.strip())
    # If the contact cell contains a link, keep its href
    # (note: these are relative Wikipedia paths, not the banks' own sites)
    link = cols[1].find("a")
    websites.append(link.get("href") if link else "")

# Create a DataFrame using pandas
bank_data = pd.DataFrame({"Bank": banks, "Contact": contacts, "Website": websites})

# Print the DataFrame
print(bank_data)
The output so far:
Bank Contact
0 BWGV Baden-Württembergischer Genossenschaftsverband...
1 GVB Genossenschaftsverband Bayern e. V.
2 GV Genoverband e. V.
3 GVWE Genossenschaftsverband Weser-Ems e. V.
4 GPVMV Genossenschaftlicher Prüfungsverband Mecklenbu...
5 PDG PDG Genossenschaftlicher Prüfungsverband e. V.
6 Verband der Sparda-Banken e. V.
7 Verband der PSD Banken e. V.
Website
0 /wiki/Baden-W%C3%BCrttembergischer_Genossensch...
1 /wiki/Genossenschaftsverband_Bayern
2 /wiki/Genoverband
3 /wiki/Genossenschaftsverband_Weser-Ems
4
5
6 /wiki/Sparda-Bank_(Deutschland)
7 /wiki/PSD_Bank
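One thing the output shows: the Website column holds relative Wikipedia paths like /wiki/Genoverband, not the banks' own sites. To fetch the derived pages in a follow-up step, those paths first need to be turned into absolute URLs. A small sketch using the standard library's `urljoin`; the sample paths are copied from the output above, and the empty string mimics rows without a link:

```python
from urllib.parse import urljoin

BASE = "https://de.wikipedia.org"

# Relative paths as they appear in the Website column above;
# the empty string stands for a row whose contact cell had no link.
relative_links = [
    "/wiki/Genossenschaftsverband_Bayern",
    "/wiki/Genoverband",
    "",
]

# Join each non-empty relative path onto the Wikipedia base URL;
# empty cells stay empty so list positions still line up with the rows.
absolute_links = [urljoin(BASE, path) if path else "" for path in relative_links]
print(absolute_links)
# → ['https://de.wikipedia.org/wiki/Genossenschaftsverband_Bayern',
#    'https://de.wikipedia.org/wiki/Genoverband', '']
```

Each absolute URL can then be passed to `requests.get` and parsed with BeautifulSoup again to pull the detailed contact data from the individual pages.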