I tried to tackle this with XSLT 2.0 and for-each-group, but I had difficulty finding a grouping expression: I always needed to compute the string length of the following element, and I don't know of a way to do that in XSLT 2.0. So I looked at other options, and XQuery 3.0 with its window clause allows that.
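To show the idea in isolation, here is a minimal sketch of a tumbling window that closes as soon as adding the next item would push the running total over a limit. The numbers and the names `$limit` and `$lengths` are made up for illustration; they stand in for the string lengths of your elements and are not part of the actual solution.

```xquery
xquery version "3.0";

(: hypothetical limit and sample lengths, for illustration only :)
declare variable $limit as xs:integer := 100;

let $lengths := (30, 50, 40, 90, 20, 10)
for tumbling window $w in $lengths
  (: every item that is not already in a window starts a new one :)
  start at $sp when true()
  (: end the window if the items from the start up to and including
     the next item would sum to more than $limit :)
  end at $ep when sum(subsequence($lengths, $sp, $ep - $sp + 2)) gt $limit
return <group total="{sum($w)}">{$w}</group>
```

With these sample values the windows should come out as (30, 50), (40), (90) and (20, 10). The last window is kept even though its end condition is never true, because the clause does not say `only end`.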
Using Saxon 9.5 PE and the following query
xquery version "3.0";

(: maximum combined text length per table; can be set from outside :)
declare variable $size as xs:integer external := 200;

(: an element together with the sibling that follows it :)
declare function local:pair($element) {
  ($element, $element/following-sibling::*[1])
};

let $start-elements := //title-en | //p-en | //li-en
let $elements := $start-elements | //title-es | //p-es | //li-es
for tumbling window $table in $start-elements
  start $start when true()
  (: close the window when adding the next pair would exceed $size :)
  end $end next $enext when
    sum(
      (local:pair($start)/string-length(),
       $elements[$start << . and . << $enext]/string-length(),
       local:pair($enext)/string-length())) gt $size
return <table>
  { for $el in $table
    return <tr>
      {
        for $pair in local:pair($el)
        return <td class="{local-name($pair/..)}">{$pair}</td>
      }
    </tr>
  }
</table>
with your sample input, I get this result:
<?xml version="1.0" encoding="UTF-8"?>
<table>
<tr>
<td class="title">
<title-en>Document title in english</title-en>
</td>
<td class="title">
<title-es>Título del documento en español</title-es>
</td>
</tr>
<tr>
<td class="title">
<title-en>Section 1 title in english</title-en>
</td>
<td class="title">
<title-es>Título de la sección 1 en español</title-es>
</td>
</tr>
<tr>
<td class="p">
<p-en>Some text 1,<br/>more text</p-en>
</td>
<td class="p">
<p-es>Texto 1,<br/>más texto</p-es>
</td>
</tr>
</table>
<table>
<tr>
<td class="li">
<li-en>List text 1. See section <a href="2">2</a>
</li-en>
</td>
<td class="li">
<li-es>Texto de lista 1. Ver sección <a href="2">2</a>
</li-es>
</td>
</tr>
<tr>
<td class="li">
<li-en>List text 2</li-en>
</td>
<td class="li">
<li-es>Texto de lista 2</li-es>
</td>
</tr>
<tr>
<td class="li">
<li-en>List text 3</li-en>
</td>
<td class="li">
<li-es>Texto de lista 3</li-es>
</td>
</tr>
<tr>
<td class="p">
<p-en>Some text 2.</p-en>
</td>
<td class="p">
<p-es>Texto 2.</p-es>
</td>
</tr>
<tr>
<td class="p">
<p-en>Some text 3.</p-en>
</td>
<td class="p">
<p-es>Texto 3.</p-es>
</td>
</tr>
</table>
<table>
<tr>
<td class="title">
<title-en>Section 2 title in english</title-en>
</td>
<td class="title">
<title-es>Título de la sección 2 en español</title-es>
</td>
</tr>
<tr>
<td class="p">
<p-en>Some text 4. <b>Bold text</b>
</p-en>
</td>
<td class="p">
<p-es>Texto 4. <b>Texto en negrita</b>
</p-es>
</td>
</tr>
<tr>
<td class="p">
<p-en>Some text 5.</p-en>
</td>
<td class="p">
<p-es>Texto 5.</p-es>
</td>
</tr>
</table>
which I think has the structure you want. Some fine-tuning is left, for instance to get the right class attributes, but first let us know whether XQuery 3.0, as provided by Saxon PE or EE or by other XQuery engines, is an option for you.