I am scraping and parsing the content of a web page (https://www.mydealz.de/new). The structure is as follows.
<div class="threadGrid-title">
  <strong><a href="">title</a></strong>
  <span class=" overflow--fade">
    <span class="overflow--wrap-off flex boxAlign-ai--all-bl">
      <span class="vAlign--all-tt">
        <span class="threadItemCard-price text--b thread-price size--all-l size--fromW3-xl space--mr-0">**price**</span>
      </span>
      <span class="mute--text text--lineThrough space--ml-1 size--all-l size--fromW3-xl ">**old price**</span>
      <span class="text--color-charcoal space--ml-1 size--all-l size--fromW3-xl">**Discount%**</span>
    </span>
  </span>
</div>
I have been able to get the title, but the other elements return strange and unexpected results.
The code is below. I am using cheerio.
const axios = require("axios");
const cheerio = require("cheerio");

const baseUrl = "https://www.mydealz.de/new";

async function checkDeals() {
  try {
    const response = await axios.get(baseUrl);
    const $ = cheerio.load(response.data);
    const deals = [];
    // Iterate over each deal
    $('.thread--type-list').each((index, element) => {
      const threadTitleElement = $(element).find('.threadGrid-title');
      // Print all child elements of .threadGrid-title
      console.log('Child elements of .threadGrid-title:');
      threadTitleElement.each((i, el) => {
        $(el).find('*').each((j, child) => {
          console.log($.html(child)); // Print HTML of each nested element
        });
      });
      // Attempt to extract deeply nested price and discount
      const priceElement = $(element).find('.threadItemCard-price');
      const discountElement = $(element).find('.text--color-charcoal');
      // Extracting the price and discount with more detailed text processing
      const price = priceElement.map((i, el) => $(el).text().trim()).get().join(' ');
      const discount = discountElement.map((i, el) => $(el).text().trim()).get().join(' ');
      console.log('Price:', price);
      console.log('Discount:', discount);
    });
  } catch (error) {
    console.error('Error fetching the deals:', error);
  }
}
Output:
Child elements of .threadGrid-title:
<strong class="thread-title "><a class="cept-tt thread-link linkPlain thread-title--list js-thread-title" title="(Amazon Prime) Victorinox Universalschäler" href="https://www.mydealz.de/deals/amazon-prime-victorinox-universalschaler-2393896" data-t="threadLink" data-t-click="">(Amazon Prime) Victorinox Universalschäler</a></strong>
<a class="cept-tt thread-link linkPlain thread-title--list js-thread-title" title="(Amazon Prime) Victorinox Universalschäler" href="https://www.mydealz.de/deals/amazon-prime-victorinox-universalschaler-2393896" data-t="threadLink" data-t-click="">(Amazon Prime) Victorinox Universalschäler</a><span class="overflow--fade"><div aria-busy="true" class="js-vue2" data-handler="vue2" data-vue2="{"name":"ThreadPriceListing","props":{"threadId":2393896}}"><div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="28px" class="hide--toW3"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="26px" class="hide--fromW3"></div></div><span class="thread-divider"></span><div aria-busy="true" class="js-vue2" data-handler="vue2" data-vue2="{"name":"MerchantLabelThreadListing","props":{"threadId":2393896}}"><div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="40px" height="20px"></div></div></span>
<div aria-busy="true" class="js-vue2" data-handler="vue2" data-vue2="{"name":"ThreadPriceListing","props":{"threadId":2393896}}"><div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="28px" class="hide--toW3"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="26px" class="hide--fromW3"></div></div>
<div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="28px" class="hide--toW3"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="26px" class="hide--fromW3"></div>
<img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="28px" class="hide--toW3">
<img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="26px" class="hide--fromW3">
<span class="thread-divider"></span>
<div aria-busy="true" class="js-vue2" data-handler="vue2" data-vue2="{"name":"MerchantLabelThreadListing","props":{"threadId":2393896}}"><div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="40px" height="20px"></div></div>
<div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="40px" height="20px"></div>
<img src="/assets/img/skeletons/item-type-F.svg" width="40px" height="20px">
Price:
Discount:
I want to get the price, old price and discount.
The most classic Axios/Cheerio mistake is assuming they execute JS, or that the server will always serve the same thing it does to your browser, or that what you see in the dev tools console reflects what Axios fetches and what Cheerio will parse.
In fact, if you’re lucky and the server doesn’t block you or send a different HTML document, all you get from Axios is the view-source: version of the page. No JS is executed, and single-page apps that load data with JS aren’t hydrated. Log the HTML returned by Axios to see what you’re really working with. Sometimes setting certain headers, like cookies or the user-agent, can change what’s returned.
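As a quick sanity check before writing selectors, you can test whether the class names you’re targeting appear in the raw HTML at all. The following is only a rough sketch: missingClasses is a hypothetical helper (not part of Axios or Cheerio), it uses a crude substring match, and sampleHtml is a trimmed stand-in modeled on the output in the question rather than the real response.

```javascript
// Hypothetical helper (not an axios/cheerio API): returns the class names
// that never appear anywhere in the raw HTML string. This is a crude
// substring check, so a hit inside a script or comment also counts.
function missingClasses(html, classNames) {
  return classNames.filter(name => !html.includes(name));
}

// Trimmed stand-in for response.data, modeled on the question's output:
// the title is present, but the price area is only a skeleton placeholder.
const sampleHtml = `
<div class="threadGrid-title">
  <strong class="thread-title"><a href="#">title</a></strong>
  <span class="overflow--fade">
    <div aria-busy="true" class="js-vue2" data-handler="vue2"></div>
  </span>
</div>`;

const missing = missingClasses(sampleHtml, [
  "threadGrid-title",
  "threadItemCard-price",
  "text--color-charcoal",
]);

console.log(missing); // → [ 'threadItemCard-price', 'text--color-charcoal' ]
```

In a real script you would pass response.data instead of sampleHtml. Any class name that comes back as missing is most likely added by JS after the page loads, so Cheerio will never see it.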
Sometimes, though, the data you want is available in that initial load, just not necessarily where you expect. In this case, the data is conveniently in JSON blobs on [data-vue2] elements, which the JS presumably hydrates into HTML elements after the page load:
const axios = require("axios"); // ^1.6.8
const cheerio = require("cheerio"); // ^1.0.0-rc.12

const url = "<Your URL>";

axios.get(url)
  .then(({data: html}) => {
    const $ = cheerio.load(html);
    const data = [...$("[data-vue2]")]
      .map(e => $(e).data("vue2"))
      .filter(e => e.name === "ThreadMainListItemNormalizer")
      .map(e => e.props.thread);
    console.log(JSON.stringify(data, null, 2));
  })
  .catch(err => console.error(err));
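To get from those thread objects to the price, old price and discount, each entry in data can be reduced to just those fields. This is a minimal sketch under loud assumptions: the key names title, price and nextBestPrice are guesses, not confirmed from the site, so check Object.keys(data[0]) on the real output and adjust FIELD_MAP. summarize and discountPercent are hypothetical helpers, and the discount is computed from the two prices rather than read from the JSON.

```javascript
// ASSUMPTION: these key names are hypothetical. Inspect the real thread
// objects (e.g. Object.keys(data[0])) and change them to match.
const FIELD_MAP = { title: "title", price: "price", oldPrice: "nextBestPrice" };

// Discount as a whole percentage, or null when it cannot be computed.
function discountPercent(price, oldPrice) {
  if (typeof price !== "number" || typeof oldPrice !== "number" || oldPrice <= 0) {
    return null;
  }
  return Math.round((1 - price / oldPrice) * 100);
}

// Reduce one thread object to the fields of interest; missing keys become null.
function summarize(thread, fieldMap = FIELD_MAP) {
  const price = thread[fieldMap.price] ?? null;
  const oldPrice = thread[fieldMap.oldPrice] ?? null;
  return {
    title: thread[fieldMap.title] ?? null,
    price,
    oldPrice,
    discount: discountPercent(price, oldPrice),
  };
}

// Made-up sample standing in for one entry of `data`.
const sample = { title: "Example deal", price: 15, nextBestPrice: 20 };
console.log(summarize(sample));
// → { title: 'Example deal', price: 15, oldPrice: 20, discount: 25 }
```

With the real response, data.map(t => summarize(t)) would give one such object per deal. If the JSON turns out to contain a ready-made discount field, reading it directly is simpler than computing it.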