I have an HTML table with <td> elements that can span multiple rows:
<table border="1">
<tbody>
<tr>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
</tr>
<tr>
<th>1</th>
<td rowspan="2">B12</td>
<td>C1</td>
<td rowspan="3">D123</td>
</tr>
<tr>
<th>2</th>
<td rowspan="2">C23</td>
</tr>
<tr>
<th>3</th>
<td>B3</td>
</tr>
</tbody>
</table>
And I would like to generate a tabular format from it:
A B C D
A1 B12 C1 D123
A2 B12 C23 D123
A3 B3 C23 D123
I’m trying to do it with XPath 3.1 (I’m not against an XQuery or XSLT solution, but I don’t think they can do any better here):
//table/tbody/(
  let
    $matx := map:merge((1 to count(tr)) ! map:entry(.,map{}))
  return
    tr/(
      let
        $x := position()
      return
        *[name() = "td" or name() = "th"]/(
          let
            $str := string(),
            $pos := position(),
            $row := $matx($x),
            $y := ($pos to 1 + max(($pos, map:keys($row))))
                  => filter(function($i){ not(map:contains($row,$i)) })
                  => head()
          return
            ($x to $x + (if (@rowspan) then xs:integer(@rowspan) - 1 else 0))
            => fold-left($matx, function($m,$i) {
                 $m => map:put($i, $m($i) => map:put($y, $str))
               })
        )
    )
)
The main problem is that I can’t update the global $matx while looping through the <tr> elements, because XPath variables are immutable. I’m currently trying to refactor the code with fold-left, but it’s become a real mess. I hope that there’s a simpler solution.
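To make the intended logic concrete, here is a minimal sketch of the same placement algorithm in Python rather than XPath. It is only an illustration, not the XPath solution I'm after: the function names (place_cell, place_row, to_grid, render) are made up for this example, and it assumes the table parses as well-formed XML and uses only rowspan (no colspan). Each fold step returns a new grid instead of updating a global one, which is the shape I'm trying to reach with fold-left.

```python
from functools import reduce
import xml.etree.ElementTree as ET

HTML = """<table border="1">
<tbody>
<tr><th>A</th><th>B</th><th>C</th><th>D</th></tr>
<tr><th>1</th><td rowspan="2">B12</td><td>C1</td><td rowspan="3">D123</td></tr>
<tr><th>2</th><td rowspan="2">C23</td></tr>
<tr><th>3</th><td>B3</td></tr>
</tbody>
</table>"""


def place_cell(grid, x, cell):
    """Inner fold step: put one cell into the first free column of row x."""
    row = grid.get(x, {})
    # With len(row) columns taken, one of 1..len(row)+1 must be free.
    y = next(i for i in range(1, len(row) + 2) if i not in row)
    span = int(cell.get("rowspan", "1"))
    text = "".join(cell.itertext())
    # Build a new grid rather than mutating the old one (like map:put).
    updated = dict(grid)
    for i in range(x, x + span):
        updated[i] = {**updated.get(i, {}), y: text}
    return updated


def place_row(grid, indexed_row):
    """Outer fold step: thread the grid through every cell of one <tr>."""
    x, tr = indexed_row
    cells = [c for c in tr if c.tag in ("th", "td")]
    return reduce(lambda g, c: place_cell(g, x, c), cells, grid)


def to_grid(html):
    """Return {row number: {column number: text}} with rowspans expanded."""
    rows = ET.fromstring(html).findall("./tbody/tr")
    return reduce(place_row, enumerate(rows, start=1), {})


def render(grid):
    """Join each row's cells in column order, one row per line."""
    return "\n".join(
        " ".join(grid[x][y] for y in sorted(grid[x])) for x in sorted(grid)
    )


print(render(to_grid(HTML)))
```

The key point is the nesting: the outer fold carries the grid from row to row, and the inner fold carries it from cell to cell, so a cell with a rowspan can reserve its column in later rows before those rows are visited. The first column comes out as the literal header text of each row (1, 2, 3), since that is what the th cells contain.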