I am just trying to scrape something from a website.
I’m running into a problem when accessing the value of a property on an element: the value comes back as CDPJSHandle {} instead of the object I expect.
import * as puppeteer from "puppeteer";

const main = async () => {
  const browser = await puppeteer.launch({
    headless: false,
  });
  const page = await browser.newPage();
  await page.goto('https://tickets.union-zeughaus.de/unveu/venue/Veranstaltungen2/534be454-f064-42f4-a434-8530d8ad4a48');
  const element = await page.$('canvas');
  const properties = await element.getProperties();
  const [firstKey] = await properties.keys();
  const propertyValue = await element.getProperty(firstKey);
  console.log({
    element, // CDPElementHandle { handle: CDPJSHandle {} },
    properties, // Map(1) { 'jQuery3700346878389459100542' => CDPJSHandle {} },
    firstKey, // 'jQuery3700346878389459100542',
    propertyValue, // CDPJSHandle {}
  });
  await browser.close();
}

main();
This jQuery3700346878389459100542 is a custom property attached to the HTML element. In the browser I can access it with document.querySelector('canvas').jQuery370058052487285485872, which returns the object I expect.
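The two keys above differ (jQuery3700346878389459100542 from Puppeteer, jQuery370058052487285485872 in the browser). As far as I can tell from jQuery 3.x’s source, jQuery derives these keys from its version digits plus Math.random() when the library loads, so they change on every page load. That is a reason to match the key by its "jQuery" prefix rather than hard-coding it. Below is a minimal plain-JavaScript sketch of that lookup; makeExpandoLikeKey, findJQueryData and the fake canvas object are hypothetical stand-ins, and the same find-by-prefix logic could run inside a page.$eval callback.

```javascript
// Illustration only (hypothetical helper): build a key roughly the way jQuery 3.x
// builds its expando, "jQuery" followed by the digits of the version and Math.random().
function makeExpandoLikeKey(version) {
  return "jQuery" + (version + Math.random()).replace(/\D/g, "");
}

// Return the value stored under the first own property whose name starts
// with "jQuery", or undefined when there is none (e.g. before the page's
// scripts have attached it).
function findJQueryData(obj) {
  const entry = Object.entries(obj).find(([key]) => key.startsWith("jQuery"));
  return entry ? entry[1] : undefined;
}

// Usage with a stand-in object instead of a real <canvas> element.
const fakeCanvas = { id: "n" };
fakeCanvas[makeExpandoLikeKey("3.7.0")] = { isEventSeries: "False" };

console.log(findJQueryData(fakeCanvas)); // { isEventSeries: 'False' }
console.log(findJQueryData({ id: "n" })); // undefined
```

Returning undefined instead of indexing [1] on a failed find avoids a TypeError when the property hasn’t been attached yet.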
I know that a similar question has been asked here, and its answer works when referencing a native HTML element property. However, my scenario is different because I’m looking for a custom property.
const evaluatedValue = await element?.evaluate((el, firstKey) => el[firstKey], firstKey);
const evaluatedTextValue = await element.evaluate(el => el.id);
console.log({
  evaluatedValue, // undefined
  evaluatedTextValue, // 'n '
});
How can I access this property correctly?
I’m not sure what data you’re ultimately trying to get, so this might be an XY problem: there’s likely a more straightforward way to solve your real problem. That said, here’s how you can extract some of the data from that dynamic, circular object:
import puppeteer from "puppeteer"; // ^22.7.1

const url = "<Your URL>";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.goto(url, {waitUntil: "domcontentloaded"});

  // Wait until the page's scripts have attached the jQuery property to the
  // canvas; the null check keeps the predicate from throwing if no canvas exists.
  await page.waitForFunction(() => {
    const canvas = document.querySelector("canvas");
    return canvas && Object.keys(canvas).some(k => k.startsWith("jQuery"));
  });

  // Pull out only the plain fields we need so the result serializes back to Node.
  const data = await page.$eval("canvas", el => {
    const {isEventSeries, needsDateAndTime, src} =
      Object.entries(el).find(([k]) => k.startsWith("jQuery"))[1];
    return {isEventSeries, needsDateAndTime, src};
  });
  console.log(data);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
Output:
{
  isEventSeries: 'False',
  needsDateAndTime: 'False',
  src: 'Veranstaltungen2/534be454-f064-42f4-a434-8530d8ad4a48'
}
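The callback above returns only three string fields rather than the whole jQuery object. As far as I know, evaluate and $eval serialize the return value back to Node, and a self-referencing object can’t be serialized that way; Puppeteer then hands back undefined, which may also explain evaluatedValue being undefined in the question (the property not being attached yet is another possibility). Here is a minimal sketch of a hypothetical pickPrimitives helper that copies only primitive-valued own properties so the result is always safe to return; JSON.stringify stands in for Puppeteer’s serialization, and the sample object is made up.

```javascript
// Types whose values serialize cleanly; null is handled separately.
const PRIMITIVE_TYPES = new Set(["string", "number", "boolean", "undefined"]);

// Copy only own enumerable properties holding strings, numbers, booleans,
// undefined or null, dropping nested objects and functions that could form cycles.
function pickPrimitives(obj) {
  const result = {};
  for (const [key, value] of Object.entries(obj)) {
    if (value === null || PRIMITIVE_TYPES.has(typeof value)) {
      result[key] = value;
    }
  }
  return result;
}

// A self-referencing object, similar in spirit to the jQuery data object.
const data = { isEventSeries: "False", needsDateAndTime: "False" };
data.self = data;

try {
  JSON.stringify(data);
} catch (err) {
  console.log("Whole object fails:", err.name); // Whole object fails: TypeError
}
console.log(JSON.stringify(pickPrimitives(data)));
// {"isEventSeries":"False","needsDateAndTime":"False"}
```

If you use something like this inside $eval, define it inside the callback: Puppeteer sends the callback’s source to the page, so variables and functions from the surrounding Node scope aren’t available there.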
Avoid element handles whenever possible; they’re nasty to work with. Just extract the data with $eval, $$eval, or evaluate.
page.$() doesn’t auto-wait, and waiting is necessary here because the property is added to the canvas dynamically. That makes page.$() mostly a non-starter, at least if you use the fastest {waitUntil: "domcontentloaded"} option on goto, which you should (linked blog post is mine).
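To make the auto-waiting point concrete: waitForFunction keeps re-running its predicate until it returns something truthy (or times out), whereas page.$() checks the DOM once. Below is a rough, Puppeteer-free sketch of that polling idea in plain Node; pollUntil, its interval and timeout defaults, and the timer standing in for the page’s script are all hypothetical, and Puppeteer’s real implementation runs the predicate inside the page and differs in detail.

```javascript
// Re-run `predicate` every `intervalMs` until it returns a truthy value,
// then resolve with that value; reject on timeout or if the predicate throws.
function pollUntil(predicate, { intervalMs = 50, timeoutMs = 2000 } = {}) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const tick = () => {
      let value;
      try {
        value = predicate();
      } catch (err) {
        return reject(err);
      }
      if (value) return resolve(value);
      if (Date.now() >= deadline) {
        return reject(new Error("Timed out waiting for predicate"));
      }
      setTimeout(tick, intervalMs);
    };
    tick();
  });
}

// Stand-in for the canvas: a "page script" attaches a jQuery-style key later.
const canvas = {};
setTimeout(() => {
  canvas.jQuery370123 = { isEventSeries: "False" };
}, 200);

// A one-shot check, like page.$ followed by reading the property, sees nothing yet.
console.log(Object.keys(canvas).some(k => k.startsWith("jQuery"))); // false

// Polling waits until the key appears.
pollUntil(() => Object.keys(canvas).find(k => k.startsWith("jQuery"))).then(key =>
  console.log("found", key) // found jQuery370123
);
```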