Produce context data for first and last occurrences of every value of an element
Question
Given the following xml:
<container>
<val>2</val>
<id>1</id>
</container>
<container>
<val>2</val>
<id>2</id>
</container>
<container>
<val>2</val>
<id>3</id>
</container>
<container>
<val>4</val>
<id>1</id>
</container>
<container>
<val>4</val>
<id>2</id>
</container>
<container>
<val>4</val>
<id>3</id>
</container>
I'd like to return something like
2 - 1
2 - 3
4 - 1
4 - 3
Using a nodeset I've been able to get the last occurrence via:
exsl:node-set($list)/container[not(val = following::val)]
but I can't figure out how to get the first one.
Solution
To get the first and the last occurrence (in document order) in each <val> group, you can use an <xsl:key> like this:
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="text" />

  <xsl:key name="ContainerGroupByVal" match="container" use="val" />

  <xsl:variable name="ContainerGroupFirstLast" select="//container[
    generate-id() = generate-id(key('ContainerGroupByVal', val)[1])
    or
    generate-id() = generate-id(key('ContainerGroupByVal', val)[last()])
  ]" />

  <xsl:template match="/">
    <xsl:for-each select="$ContainerGroupFirstLast">
      <xsl:value-of select="val" />
      <xsl:text> - </xsl:text>
      <xsl:value-of select="id" />
      <xsl:value-of select="'&#10;'" /><!-- LF -->
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
EDIT #1: A bit of an explanation since this might not be obvious right away:
- The <xsl:key> returns all <container> nodes having a given <val>. You use the key() function to query it.
- The <xsl:variable> is where it all happens. It reads as:
  - for each of the <container> nodes in the document ("//container") check…
  - …if it has the same unique id (generate-id()) as the first node returned by key() or the last node returned by key()
  - where key('ContainerGroupByVal', val) returns the set of <container> nodes matching the current <val>
  - if the unique ids match, include the node in the selection
- The <xsl:for-each> does the output. It could just as well be an <xsl:apply-templates>.
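The last point can be sketched as follows. This is only an illustrative variant of the stylesheet above, not a replacement for it: the mode name "first-last" is made up for the example, and the selection is written inline instead of going through the variable.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:key name="ContainerGroupByVal" match="container" use="val" />

  <xsl:template match="/">
    <!-- select only the first and last container of each val group -->
    <xsl:apply-templates mode="first-last" select="//container[
      generate-id() = generate-id(key('ContainerGroupByVal', val)[1])
      or
      generate-id() = generate-id(key('ContainerGroupByVal', val)[last()])
    ]" />
  </xsl:template>

  <!-- "first-last" is a hypothetical mode name; it keeps this template
       separate from any other container templates -->
  <xsl:template match="container" mode="first-last">
    <xsl:value-of select="concat(val, ' - ', id, '&#10;')" />
  </xsl:template>
</xsl:stylesheet>
```

Because only the selected <container> nodes are passed to the template, no stray whitespace text nodes from the input reach the output.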
EDIT #2: As Dimitre Novatchev rightfully points out in the comments, you should be wary of using the "//" XPath shorthand. If you can avoid it, by all means do so, partly because it potentially selects nodes you don't want, and mainly because it is slower than a more specific XPath expression. For example, if your document looks like:
<containers>
<container><!-- ... --></container>
<container><!-- ... --></container>
<container><!-- ... --></container>
</containers>
then you should use "/containers/container" or "/*/container" instead of "//container".
EDIT #3: An alternative syntax of the above would be:
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="text" />

  <xsl:key name="ContainerGroupByVal" match="container" use="val" />

  <xsl:variable name="ContainerGroupFirstLast" select="//container[
    count(
      .
      | key('ContainerGroupByVal', val)[1]
      | key('ContainerGroupByVal', val)[last()]
    ) = 2
  ]" />

  <xsl:template match="/">
    <xsl:for-each select="$ContainerGroupFirstLast">
      <xsl:value-of select="val" />
      <xsl:text> - </xsl:text>
      <xsl:value-of select="id" />
      <xsl:value-of select="'&#10;'" /><!-- LF -->
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
Explanation: The XPath union operator "|" combines its arguments into a node-set. By definition, a node-set cannot contain duplicate nodes. For example, ". | . | ." will create a node-set containing exactly one node (the current node).
This means that if we create a union node-set from the current node ("."), the "key(…)[1]" node and the "key(…)[last()]" node, its node count will be 2 if the current node is identical to one of the two other nodes, and 3 if it is identical to neither. One exception: when a group contains only a single <container>, all three operands are the same node and the count is 1, so the "= 2" test skips such groups; testing for "&lt; 3" instead would include them.
OTHER TIPS
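Basic XPath:
//container[position() = 1] <- the first container child of its parent
//container[position() = last()] <- the last container child of its parent
Note that these predicates select by position among siblings only; they do not group by val.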
Here's a set of XPath functions in more detail.
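For the per-val case in the question, position alone is not enough. One option that mirrors the asker's own "not(val = following::val)" test is to add the symmetric test on the preceding axis. The following is a minimal sketch, assuming the <container> elements are children of a single root element and that <val> elements occur only inside them:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" />

  <xsl:template match="/*">
    <!-- keep a container if no earlier val equals its val (first of its group)
         or no later val equals its val (last of its group) -->
    <xsl:for-each select="container[
      not(val = preceding::val) or not(val = following::val)
    ]">
      <xsl:value-of select="concat(val, ' - ', id, '&#10;')" />
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Both axes are evaluated again for every <container>, so on large inputs the key-based solution above is likely to scale better.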
I. XSLT 1.0
Basically the same solution as the one by Tomalak, but more understandable. It is also complete, so you only need to copy and paste the XML document and the transformation and then press the "Transform" button of your favourite XSLT IDE:
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="kContByVal" match="container" use="val"/>

  <xsl:template match="/*">
    <!-- Muenchian grouping: visit the first container of each val group -->
    <xsl:for-each select=
      "container[generate-id()
                = generate-id(key('kContByVal', val)[1])]">
      <xsl:variable name="vthisvalGroup"
        select="key('kContByVal', val)"/>
      <!-- first container of the group -->
      <xsl:value-of select=
        "concat($vthisvalGroup[1]/val, '-', $vthisvalGroup[1]/id, '&#xA;')"/>
      <!-- last container of the group -->
      <xsl:value-of select=
        "concat($vthisvalGroup[last()]/val, '-', $vthisvalGroup[last()]/id, '&#xA;')"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
when this transformation is applied on the originally-provided XML document (edited to be well-formed):
<t>
<container>
<val>2</val>
<id>1</id>
</container>
<container>
<val>2</val>
<id>2</id>
</container>
<container>
<val>2</val>
<id>3</id>
</container>
<container>
<val>4</val>
<id>1</id>
</container>
<container>
<val>4</val>
<id>2</id>
</container>
<container>
<val>4</val>
<id>3</id>
</container>
</t>
the wanted result is produced:
2-1
2-3
4-1
4-3
Do note:
- We use the Muenchian method for grouping to find one container element for each set of such elements that have the same value for val.
- From the whole node-list of container elements with the same val value, we output the required data for the first container element in the group and for the last container element in the group.
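One consequence of printing the first and the last element separately is that a group with a single container would be printed twice. A possible variant, sketched here under the same assumptions as the transformation above (the variable name vGroup is made up for the example), iterates over the union of the two nodes instead:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="kContByVal" match="container" use="val"/>

  <xsl:template match="/*">
    <!-- Muenchian grouping: one iteration per distinct val -->
    <xsl:for-each select="container[generate-id()
                          = generate-id(key('kContByVal', val)[1])]">
      <xsl:variable name="vGroup" select="key('kContByVal', val)"/>
      <!-- the union holds one node for a one-member group, two otherwise;
           xsl:for-each visits them in document order -->
      <xsl:for-each select="$vGroup[1] | $vGroup[last()]">
        <xsl:value-of select="concat(val, '-', id, '&#xA;')"/>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document, where every group has three members, this selects the same four containers as the original transformation.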
II. XSLT 2.0
This transformation:
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="/*">
    <xsl:for-each-group select="container"
                        group-by="val">
      <xsl:for-each select="current-group()[1], current-group()[last()]">
        <xsl:value-of select=
          "concat(val, '-', id, '&#xA;')"/>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
when applied on the same XML document as above, produces the wanted result:
2-1
2-3
4-1
4-3
Do note:
- The use of the <xsl:for-each-group> XSLT instruction.
- The use of the current-group() function.
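A caveat with the comma operator: it concatenates sequences without removing duplicates, so for a group holding a single container the same node appears twice. A sketch of a variant that uses the union operator "|" instead, which removes duplicates and returns the nodes in document order:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/*">
    <xsl:for-each-group select="container" group-by="val">
      <!-- union (|) deduplicates, so a one-member group is printed once -->
      <xsl:for-each select="current-group()[1] | current-group()[last()]">
        <xsl:value-of select="concat(val, '-', id, '&#xA;')"/>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

For groups with two or more members the output is the same as with the comma operator.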