When I fetch the page content of some websites, the result does not include the dynamic content loaded by JS, fonts, or images; I only get the initial document. The problem does not occur on Windows in either headless or headful mode, nor on Linux in headful mode, so it appears to be specific to Linux in headless mode. While debugging, I noticed that fewer requests are made in headless mode. I have also used the stealth plugin to try to bypass headless detection.
Sample code I am using:
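const fs = require("fs");
const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");

// Register the stealth plugin before launching the browser.
const stealth = StealthPlugin();
puppeteer.use(stealth);

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      "--no-sandbox",
      "--disable-setuid-sandbox",
      "--font-render-hinting=none",
      "--disable-gpu",
      "--no-first-run",
    ],
  });
  const page = await browser.newPage();
  page.setDefaultNavigationTimeout(30 * 1000); // 30 seconds
  // config.pageUrl holds the URL of the page to fetch.
  const res = await page.goto(config.pageUrl, { waitUntil: "networkidle0" });
  const pageContent = await page.content();
  fs.writeFileSync("output.html", pageContent);
  await browser.close();
})();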
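
To make the "fewer requests in headless mode" observation easier to compare between runs, one option is to tally requests by resource type and record any failures. This is only a minimal sketch, assuming Puppeteer's `request` and `requestfailed` page events; `trackRequests` and `totalRequests` are hypothetical helper names, not part of Puppeteer.

```javascript
// Sketch: count started requests per resource type and collect failed ones.
// `trackRequests` and `totalRequests` are illustrative helpers (assumptions),
// not Puppeteer APIs. `page` is expected to be a Puppeteer Page.
function trackRequests(page) {
  const stats = { started: {}, failed: [] };

  // Fired when the page issues a request.
  page.on("request", (request) => {
    const type = request.resourceType(); // e.g. "document", "script", "font"
    stats.started[type] = (stats.started[type] || 0) + 1;
  });

  // Fired when a request fails (blocked, aborted, network error, ...).
  page.on("requestfailed", (request) => {
    const failure = request.failure(); // may be null
    stats.failed.push({
      url: request.url(),
      type: request.resourceType(),
      reason: failure ? failure.errorText : "unknown",
    });
  });

  return stats;
}

// Total number of requests started, across all resource types.
function totalRequests(stats) {
  return Object.values(stats.started).reduce((sum, n) => sum + n, 0);
}
```

Calling `const stats = trackRequests(page)` right after `browser.newPage()` and logging `stats` once `page.goto(...)` resolves, in each mode, should show which resource types are missing or failing in the Linux headless run.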