This is my code for now:
import requests
from bs4 import BeautifulSoup
def getdata(url):
    r = requests.get(url)
    return r.text
htmldata = getdata("https://www.piggyback.com/online-guide/final-fantasy-x/de/")
soup = BeautifulSoup(htmldata, 'html.parser')
for item in soup.find_all('img', class_="ImgBitmap__image___29vcf"):
    print(item['src'])
I’m guessing that I did something wrong. If you need any details, please ask. While you’re at it, could you also show me how to download those sources, meaning all the images on that website? 😀
The issue is that the page is rendered from an underlying PDF document, so BeautifulSoup doesn’t see any images in the HTML.
You can, however, use requests to download the PDF file (and then convert the PDF to images as a next step, if you want):
import re
import requests
url = "https://www.piggyback.com/online-guide/final-fantasy-x/de/"
html_text = requests.get(url).text
pdf_url = re.search(r'url: "([^"]+)', html_text).group(1)
headers = {"Referer": "https://www.piggyback.com/"}
print(f"Downloading {pdf_url} ...")
with open(pdf_url.split("/")[-1], "wb") as f_out:
    f_out.write(requests.get(pdf_url, headers=headers).content)
print("Done ...")
Prints:
Downloading https://storage-cdn.piggyback.com/storage/media/online-guide/final-fantasy-x/de/Final_Fantasy_X_Das_offizielle_Loesungsbuch.pdf ...
Done ...
and downloads the ~168 MB file Final_Fantasy_X_Das_offizielle_Loesungsbuch.pdf.
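Since the file is fairly large, it may be worth streaming it to disk instead of holding the whole response in memory. The following is only a sketch of that idea, not a tested replacement for the script above. The function names (`extract_pdf_url`, `write_chunks`, `download_pdf`), the 1 MiB chunk size, and the 30-second timeout are my own assumptions. The regular expression is the same one used earlier, with a check added for the case where nothing matches.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Same pattern as in the script above: captures whatever follows `url: "`
# up to the next double quote.
PDF_URL_PATTERN = re.compile(r'url: "([^"]+)')


def extract_pdf_url(html_text: str) -> Optional[str]:
    """Return the first `url: "..."` value in the page source, or None."""
    match = PDF_URL_PATTERN.search(html_text)
    return match.group(1) if match else None


def write_chunks(chunks: Iterable[bytes], target: Path) -> int:
    """Write byte chunks to `target` one at a time; return the bytes written."""
    written = 0
    with open(target, "wb") as f_out:
        for chunk in chunks:
            written += f_out.write(chunk)
    return written


def download_pdf(page_url: str, referer: str = "https://www.piggyback.com/") -> Path:
    """Find the PDF link on `page_url` and stream the file to the current directory."""
    # Imported here (an illustrative choice) so the two helpers above
    # can be used without requests installed.
    import requests

    html_text = requests.get(page_url, timeout=30).text
    pdf_url = extract_pdf_url(html_text)
    if pdf_url is None:
        raise ValueError(f"No PDF URL found in {page_url}")

    target = Path(pdf_url.split("/")[-1])
    headers = {"Referer": referer}
    # stream=True defers downloading the body; iter_content yields it in pieces.
    with requests.get(pdf_url, headers=headers, stream=True, timeout=30) as response:
        response.raise_for_status()
        write_chunks(response.iter_content(chunk_size=1 << 20), target)
    return target
```

Calling `download_pdf("https://www.piggyback.com/online-guide/final-fantasy-x/de/")` should then save the PDF under its original file name, provided the page still embeds the link in the same `url: "..."` form.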