I am using Selenium in Python to convert HTML files to PDF. Everything converts correctly except bookmark links (in-page links such as href="#C4"). The problem is the same for every HTML file I’ve tried: the first two bookmarks in the PDF jump to almost the right location but are always slightly off, and all the remaining ones jump to the bottom of the PDF. The same bookmarks work correctly in the HTML.
Originally I was converting a large file and thought the page had not finished loading, so I used Selenium's WebDriverWait to make sure everything was loaded, but that did not solve the problem.
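A minimal sketch of that kind of wait, assuming it polls document.readyState until the browser reports "complete". The helper names page_is_loaded and wait_until_loaded and the 30-second timeout are illustrative, not the exact code from my script:

```python
def page_is_loaded(driver):
    """Return True once the browser reports the document as fully loaded."""
    return driver.execute_script("return document.readyState") == "complete"


def wait_until_loaded(driver, timeout=30):
    """Block until page_is_loaded(driver) is true, or raise TimeoutException."""
    # Imported here so the predicate above can be reused without Selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait

    # until() calls page_is_loaded(driver) repeatedly until it returns a truthy value.
    WebDriverWait(driver, timeout).until(page_is_loaded)
```

It would be called as wait_until_loaded(driver) right after driver.get(...) and before driver.print_page().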
Example HTML file with bookmarks that break on conversion:
<!DOCTYPE html>
<html>
<body>
<p><a href="#C4">Jump to Chapter 4</a></p>
<p><a href="#C10">Jump to Chapter 10</a></p>
<p><a href="#C19">Jump to Chapter 19</a></p>
<!-- ... -->
<h2 id="C4">Chapter 4</h2>
<p>This chapter explains ba bla bla</p>
<!-- ... -->
<h2 id="C10">Chapter 10</h2>
<p>This chapter explains ba bla bla</p>
<!-- ... -->
<h2 id="C19">Chapter 19</h2>
<p>This chapter explains ba bla bla</p>
<!-- ... -->
</body>
</html>
Here is the Python code I use to convert files:
import base64

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome headless and ignore certificate errors.
chrome_options = Options()
chrome_options.add_argument("--headless=new")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--ignore-ssl-errors")

driver = webdriver.Chrome(options=chrome_options)
driver.get("HTML FILE LOCATION HERE")

# print_page() returns the PDF as a base64-encoded string.
with open("out.pdf", "wb") as file:
    codeb64 = driver.print_page()
    decoded = base64.b64decode(codeb64)
    file.write(decoded)

driver.quit()
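The string passed to driver.get above is a placeholder. For a local file, one way to build the location is as a file:// URL; a minimal sketch, where the helper to_file_url and the path book.html are illustrative names and not part of my actual script:

```python
from pathlib import Path


def to_file_url(html_path):
    """Return a file:// URL for a local HTML file."""
    # resolve() makes the path absolute, which as_uri() requires.
    return Path(html_path).resolve().as_uri()


# Hypothetical usage with the driver from the script above:
# driver.get(to_file_url("book.html"))
```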