Here's a different approach you could explore. I wrote it in XSLT 1.0, but the method itself does not depend on the version.
The basic idea is to attach the id of the parent para to each reference the para contains. Then, using Muenchian grouping, we keep only the first occurrence of each reference. Since each surviving reference retains the id of its original parent, we know where the referenced figure needs to appear in the final output.
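Note that this assumes there are no independent reference elements (i.e. elements that are not referenced in at least one para); the stylesheet suppresses every figure wrapper and re-emits only the referenced ones, so an unreferenced one would be dropped from the output.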
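In case Muenchian grouping is unfamiliar, here is the deduplication step in isolation, as a minimal sketch. The element names `list`, `item` and `unique` and the key name `by-value` are made up for the illustration and do not come from your document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- index every (hypothetical) item element by its string value -->
  <xsl:key name="by-value" match="item" use="."/>

  <xsl:template match="/list">
    <unique>
      <!-- an item is kept only if it is the first node the key
           returns for its own value: the union of the item and that
           first node then contains exactly one node -->
      <xsl:copy-of select="item[count(. | key('by-value', .)[1]) = 1]"/>
    </unique>
  </xsl:template>
</xsl:stylesheet>
```

Given an input such as `<list><item>a</item><item>b</item><item>a</item></list>`, only the first `a` and the `b` are copied. The full stylesheet below applies the same predicate to the collected tokens instead of to source elements.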
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="tokens" match="token" use="." />
<xsl:key name="ref" match="div[@class='figure-wrapper']" use="@id" />
<xsl:variable name="root" select="/"/>
<!-- 1. collect all references, along with their parent id -->
<xsl:variable name="references">
<xsl:for-each select="//p[@class='para']">
<xsl:call-template name="cat_ref">
<xsl:with-param name="string" select="."/>
<xsl:with-param name="pid" select="generate-id()"/>
</xsl:call-template>
</xsl:for-each>
</xsl:variable>
<!-- 2. keep only unique references -->
<xsl:variable name="unique-ref" select="exsl:node-set($references)/token[count(. | key('tokens', .)[1]) = 1]"/>
<!-- 3. output -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p[@class='para']">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
<!-- append my references -->
<xsl:for-each select="$unique-ref[@pid=generate-id(current())]">
<xsl:variable name="ref-key" select="."/>
<!-- switch back to document in order to use key -->
<xsl:for-each select="$root">
<xsl:copy-of select="key('ref', $ref-key)"/>
</xsl:for-each>
</xsl:for-each>
</xsl:template>
<!-- suppress references -->
<xsl:template match="div[@class='figure-wrapper']"/>
<!-- named template: recursively turns each "(see Fig. N)" in the string into a token "figureN" -->
<xsl:template name="cat_ref">
<xsl:param name="string"/>
<xsl:param name="pid"/>
<xsl:param name="prefix" select="'(see Fig. '" />
<xsl:param name="suffix" select="')'" />
<xsl:if test="contains($string, $prefix) and contains(substring-after($string, $prefix), $suffix)">
<token pid="{$pid}">
<xsl:text>figure</xsl:text>
<xsl:value-of select="substring-before(substring-after($string, $prefix), $suffix)" />
</token>
<!-- recursive call on the text that follows this reference -->
<xsl:call-template name="cat_ref">
<xsl:with-param name="string" select="substring-after(substring-after($string, $prefix), $suffix)" />
<xsl:with-param name="pid" select="$pid" />
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:stylesheet>
Applied to your input, the following result is obtained:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<head>
<title>...</title>
</head>
<body>
<p class="para">Lorem Ipsum (see Fig. 1). Lorem Ipsum (see Fig. 2).</p>
<div class="figure-wrapper" id="figure1">...</div>
<div class="figure-wrapper" id="figure2">...</div>
<p class="para">Lorem Ipsum (see Fig. 3). Lorem Ipsum (see Fig. 1).</p>
<div class="figure-wrapper" id="figure3">...</div>
</body>
</html>
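If you can use XSLT 2.0, the same three steps can be written more compactly. The following is a sketch rather than a drop-in replacement: it assumes an XSLT 2.0 processor (Saxon, for example), it swaps the recursive template for `xsl:analyze-string` and the Muenchian predicate for `xsl:for-each-group`, and the regular expression is my reading of the `(see Fig. N)` pattern used above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="ref" match="div[@class='figure-wrapper']" use="@id"/>
  <xsl:variable name="root" select="/"/>

  <!-- 1. collect all references, along with their parent id -->
  <xsl:variable name="references">
    <xsl:for-each select="//p[@class='para']">
      <!-- inside matching-substring the context is a string,
           so the para id is captured beforehand -->
      <xsl:variable name="pid" select="generate-id()"/>
      <xsl:analyze-string select="." regex="\(see Fig\. ([^\)]*)\)">
        <xsl:matching-substring>
          <token pid="{$pid}">
            <xsl:value-of select="concat('figure', regex-group(1))"/>
          </token>
        </xsl:matching-substring>
      </xsl:analyze-string>
    </xsl:for-each>
  </xsl:variable>

  <!-- 2. keep only the first token of each distinct reference -->
  <xsl:variable name="unique-ref" as="element(token)*">
    <xsl:for-each-group select="$references/token" group-by=".">
      <!-- the context item is the first token of the current group -->
      <xsl:sequence select="."/>
    </xsl:for-each-group>
  </xsl:variable>

  <!-- 3. output -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p[@class='para']">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
    <!-- append the figures first referenced by this para -->
    <xsl:for-each select="$unique-ref[@pid = generate-id(current())]">
      <!-- the third argument of key() selects the source document -->
      <xsl:copy-of select="key('ref', ., $root)"/>
    </xsl:for-each>
  </xsl:template>

  <!-- suppress figures at their original position -->
  <xsl:template match="div[@class='figure-wrapper']"/>
</xsl:stylesheet>
```

Two things go away compared with the 1.0 version: the `exsl:node-set()` call, because a 2.0 variable holding a temporary tree can be navigated directly, and the inner `xsl:for-each select="$root"`, because `key()` accepts the document to search as a third argument.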