I am using the Python code below to capture a web page as a PDF. The page in question (https://www.haywood.edu/about-hcc/college-leadership.php) looks like the following after you click on the word “Accreditation”. By default, the page lands on “Our President”.
I am using Playwright to click on the “Accreditation” text to switch to that part of the page. The click works and the PDF is generated, but the top of the page is cut off in the output.
This is what it looks like if I just capture the “Our President” page:
Here is the code:
import asyncio

from playwright.async_api import async_playwright


async def generate_pdf(url, path, selector):
    async with async_playwright() as p:
        browser_type = p.chromium
        browser = await browser_type.launch()
        page = await browser.new_page()
        await page.goto(url)
        # Switch to the Accreditation section and wait for its panel to appear.
        if selector == "Accreditation":
            await page.get_by_text(selector, exact=True).click()
            accred = page.locator("#tab-5-d26e139")
            await accred.wait_for()
        # Print the page with an empty header and a URL/date footer.
        await page.pdf(
            path=path,
            display_header_footer=True,
            header_template="<div></div>",
            footer_template="""
                <div style="width: 100%; text-align: center; font-size: 10px;">
                    [<span class="url"></span>] captured on <span class="date"></span>
                </div>
            """,
            margin={"top": "40px", "bottom": "1in"},
            print_background=True,
        )
        await browser.close()


asyncio.run(generate_pdf("https://www.haywood.edu/about-hcc/college-leadership.php", "SACSCOC Accreditation Page.pdf", "Accreditation"))
asyncio.run(generate_pdf("https://www.haywood.edu/about-hcc/college-leadership.php", "Our President.pdf", ""))
print("Done!")
Any idea how I get it to capture the top part of the page?
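One variation I could try to narrow this down, as a sketch and not a confirmed fix: `page.pdf()` renders with print CSS by default, so calling `page.emulate_media(media="screen")` first makes the PDF use the on-screen styles instead. Whether the print stylesheet is actually what cuts off the top of this page is only an assumption on my part. The names `pdf_options` and `generate_pdf_screen_media` are hypothetical helpers introduced for this example; the selector and PDF settings are the same as in my code above.

```python
FOOTER_TEMPLATE = """
<div style="width: 100%; text-align: center; font-size: 10px;">
    [<span class="url"></span>] captured on <span class="date"></span>
</div>
"""


def pdf_options(path):
    """Return the same page.pdf() keyword arguments as the original code."""
    return {
        "path": path,
        "display_header_footer": True,
        "header_template": "<div></div>",
        "footer_template": FOOTER_TEMPLATE,
        "margin": {"top": "40px", "bottom": "1in"},
        "print_background": True,
    }


async def generate_pdf_screen_media(url, path, selector=""):
    # Imported here so pdf_options() can be inspected without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto(url)
        if selector == "Accreditation":
            await page.get_by_text(selector, exact=True).click()
            await page.locator("#tab-5-d26e139").wait_for()
        # Assumption being tested: the print stylesheet causes the cut-off,
        # so render the PDF with screen CSS instead of print CSS.
        await page.emulate_media(media="screen")
        await page.pdf(**pdf_options(path))
        await browser.close()
```

It would be called the same way as before, for example `asyncio.run(generate_pdf_screen_media(url, "SACSCOC Accreditation Page.pdf", "Accreditation"))` after `import asyncio`. If the top of the page still goes missing with screen media, the print stylesheet can be ruled out as the cause.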