I need to convert a URL to PDF. Most links work fine, but some load slowly or, as in this case, run a connection security check first, so the resulting PDF page does not show the actual content (see the image comparing the expected result with the outcome).
To counter this I passed a timeout, as in converter.convert(url, pdf_filename, timeout=10),
but this resulted in an error:
DevTools listening on ws://127.0.0.1:59394/devtools/browser/45008637-d758-43a7-b715-a03731ae80e9
[0521/102951.525:INFO:CONSOLE(1910)] "Mixed Content: The page at 'https://www.insightsonindia.com/2024/05/20/mission-2024-insights-daily-current-affairs-pib-summary-20-May-2024/' was loaded over HTTPS, but requested an insecure font 'http://www.insightsonindia.com/wp-content/uploads/2022/05/Cheltenham-Light.ttf'. This request has been blocked; the content must be served over HTTPS.", source: https://www.insightsonindia.com/2024/05/20/mission-2024-insights-daily-current-affairs-pib-summary-20-May-2024/ (1910)
... some similar lines; I can't post the whole log because SO flags it as spam ...
[0521/102952.513:INFO:CONSOLE(2)] "JQMIGRATE: Migrate is installed, version 3.4.1", source: https://www.insightsonindia.com/wp-content/cache/min/1/c/6.5.3/wp-includes/js/jquery/jquery-migrate.min.js?ver=1716204483 (2)
[0521/102953.508:INFO:CONSOLE(3)] "Hotjar not launching due to suspicious userAgent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/125.0.6422.61 Safari/537.36", source: https://static.hotjar.com/c/hotjar-2787889.js?sv=7 (3)
Traceback (most recent call last):
File "C:UsersuserDocumentschange move user pcpython website data downloadfinallinkstopdf.py", line 31, in <module>
converter.convert(url, pdf_filename, timeout=10)
File "C:ProgramsPythonPython312Libsite-packagespyhtml2pdfconverter.py", line 45, in convert
file.write(result)
TypeError: a bytes-like object is required, not 'NoneType'
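The traceback shows file.write(result) receiving None, meaning no PDF bytes came back for this URL. Below is a minimal sketch, assuming the links are processed in a loop, that records such failures instead of stopping the whole run. The helper name convert_all and the page_N.pdf naming scheme are hypothetical; convert_fn is assumed to be called the same way as converter.convert above.

```python
from pathlib import Path


def convert_all(urls, convert_fn, out_dir=".", timeout=10):
    """Convert each URL and collect failures instead of stopping at the first one.

    convert_fn is assumed to accept (url, pdf_filename, timeout=...),
    like the converter.convert call shown above.
    Returns a list of (url, error message) pairs for the links that failed.
    """
    failed = []
    out_dir = Path(out_dir)
    for index, url in enumerate(urls, start=1):
        # Hypothetical naming scheme: page_1.pdf, page_2.pdf, ...
        pdf_filename = str(out_dir / f"page_{index}.pdf")
        try:
            convert_fn(url, pdf_filename, timeout=timeout)
        except TypeError as exc:
            # Same failure as in the traceback: there were no bytes to write.
            failed.append((url, str(exc)))
            # The target file may already have been created empty; remove it.
            Path(pdf_filename).unlink(missing_ok=True)
    return failed
```

It could be called as failed = convert_all(urls, converter.convert), after which the failed links can be retried separately. This only isolates the error; it does not make the slow pages render correctly.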
To get around this error, I also tried setting Chrome options through Selenium:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
#options.add_argument("--window-size=1980,1020")
options.add_argument("--headless=new")  # Chrome's new headless mode (flag spelling corrected)
options.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.61 Safari/537.36")
I also tried wkhtmltopdf and pdfkit with a delay, but they produce the same outcome. (Both are also very slow on some sites in general, so I'm not using them anymore.)
I also tried changing the link from https to http and changing the site settings to allow insecure content; neither worked.
Also, if I set the timeout to zero, as in converter.convert(url, pdf_filename, timeout=0),
there is no error, but I get the same unexpected outcome.
Is there any solution for this, or any other Python library that can make this work?
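For reference, the kind of behaviour I am after could be sketched with Selenium directly: load the page, wait until the real content is present, and only then print to PDF through the Chrome DevTools command Page.printToPDF. This is a rough sketch under assumptions, not a tested solution for this site: the function names are hypothetical, and the ready_css selector ("article") is only a guess at something that exists on the real page but not on the security-check page.

```python
import base64
import time


def make_driver():
    """Build a headless Chrome driver (Selenium is imported lazily here)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def save_pdf_when_ready(driver, url, pdf_filename,
                        ready_css="article", timeout=30, poll=0.5):
    """Load url, wait for an element matching ready_css, then print to PDF.

    ready_css is an assumed selector for the real page content.
    Returns True if the PDF was written, False if the wait timed out.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    # Simple polling loop; Selenium's WebDriverWait could be used instead.
    while not driver.find_elements("css selector", ready_css):
        if time.monotonic() >= deadline:
            return False  # content never appeared, so nothing is written
        time.sleep(poll)
    # Page.printToPDF returns the PDF as a base64 string under "data".
    result = driver.execute_cdp_cmd("Page.printToPDF", {"printBackground": True})
    with open(pdf_filename, "wb") as file:
        file.write(base64.b64decode(result["data"]))
    return True
```

Usage would be along the lines of driver = make_driver(), then save_pdf_when_ready(driver, url, pdf_filename), then driver.quit(). Whether the security check passes at all in headless Chrome is a separate question that this sketch does not answer.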
Thanks.