I have an HTML page, a small fragment of interest being:
<ul>
  <li class="nd-list__item in-searchLayoutListItem">
    <div class="in-listingCard" id="112293455">
      <div class="nd-mediaObject nd-mediaObject--colToRow in-listingCardProperty" role="button" tabindex="0">
        <div class="in-listingCardPropertyMedia">
          ...
        </div>
        <div class="nd-mediaObject__content in-listingCardPropertyContent">
          <div class="nd-figure in-listingCardAgencyLogo">
            ...
          </div>
          <div class="in-listingCardPrice">
            <span>€ 6.300.000</span>
          </div>
          <a href="https://.../112293455/" title="Villa unifamiliare via delle Forre Torre 8, Fucecchio" class="in-listingCardTitle">Villa unifamiliare via delle Forre Torre 8, Fucecchio</a>
          <div class="in-listingCardFeatureList has-lowVisibility">
            <div class="in-listingCardFeatureList__item">
              <svg viewBox="0 0 24 24" class="nd-icon in-listingCardFeatureList__icon">
                <use class="nd-icon__use" xlink:href="#planimetry"></use>
              </svg>
              <span>5+ locali</span>
            </div>
            <div class="in-listingCardFeatureList__item">
              <svg viewBox="0 0 24 24" class="nd-icon in-listingCardFeatureList__icon">
                <use class="nd-icon__use" xlink:href="#size"></use>
              </svg>
              <span>1.600 m²</span>
            </div>
            <div>
              ...
            </div>
          </div>
        </div>
      </div>
    </div>
  </li>
  <li>
    ...
  </li>
</ul>
I’m using xidel to extract the data as a list of maps, with the following XPath expression:
//li//div[contains-token(@class,"in-listingCardPropertyContent")] ! (
  let
    $announcementPriceNode := a[contains-token(@class,"in-listingCardPrice")],
    $announcementTitleNode := div[contains-token(@class,"in-listingCardTitle")],
    $announcementFeaturesMap := map:merge(
      div[contains-token(@class,"in-listingCardFeatureList")]/
      div[contains-token(@class,"in-listingCardFeatureList__item")] ! (
        let
          $key := svg/use/@href => replace("^#",""),
          $val := normalize-space()
        return
          map{$key:$val}
      )
    )
  return
    map{
      "price"      : tokenize($announcementPriceNode)[last()] || " €",
      "link"       : string($announcementTitleNode/@href),
      "description": string($announcementTitleNode/@title),
      "features"   : $announcementFeaturesMap
    }
)
But I get:
{
  "price": " €",
  "link": "",
  "description": "",
  "features": {
    "planimetry": "5+ locali",
    "size": "1.600 m²"
  }
}
Instead of:
{
  "price": "6.300.000 €",
  "link": "https://.../112293455/",
  "description": "Villa unifamiliare via delle Forre Torre 8, Fucecchio",
  "features": {
    "planimetry": "5+ locali",
    "size": "1.600 m²"
  }
}
What’s going on?
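
Since only the "price", "link" and "description" values come back empty, a quick sanity check that may help narrow this down is to list which element each of those class tokens is actually attached to in the fragment. The following is only a minimal sketch in Python (standard library only, not xidel); `FRAGMENT` is a trimmed copy of the HTML above, and the names `ClassTagFinder` and `tags_by_class` are hypothetical helpers introduced for illustration.

```python
from html.parser import HTMLParser

# Trimmed copy of the fragment above: only the two elements whose
# values come back empty. The elided URL is kept exactly as posted.
FRAGMENT = """
<div class="nd-mediaObject__content in-listingCardPropertyContent">
  <div class="in-listingCardPrice">
    <span>€ 6.300.000</span>
  </div>
  <a href="https://.../112293455/" title="Villa unifamiliare via delle Forre Torre 8, Fucecchio" class="in-listingCardTitle">Villa unifamiliare via delle Forre Torre 8, Fucecchio</a>
</div>
"""


class ClassTagFinder(HTMLParser):
    """Hypothetical helper: record the tag name of every element that
    carries one of the wanted class tokens."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}  # class token -> list of tag names

    def handle_starttag(self, tag, attrs):
        # Split the class attribute on whitespace, like contains-token() does.
        tokens = (dict(attrs).get("class") or "").split()
        for token in tokens:
            if token in self.wanted:
                self.found.setdefault(token, []).append(tag)


def tags_by_class(html, wanted):
    """Return a dict mapping each wanted class token to the tags using it."""
    finder = ClassTagFinder(wanted)
    finder.feed(html)
    return finder.found


if __name__ == "__main__":
    wanted = ["in-listingCardPrice", "in-listingCardTitle"]
    for token, tags in tags_by_class(FRAGMENT, wanted).items():
        print(token, "->", tags)
```

Comparing the tag names it prints with the element names used in the `$announcementPriceNode` and `$announcementTitleNode` steps of the expression shows whether those two path steps can match anything at all.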