Question

I'm new to XSLT and I need to convert a geographical metadata XML file from one standard version (v3) to a newer one (v5).

There are several aspects to consider:

  • Element names are similar, but not identical, in v3 and v5.
  • Some elements in v5 are not present in v3 and have to be added anyway, sometimes without values and sometimes with predefined attributes and values.
  • In XML v5, all elements must have a namespace prefix such as <gmd:...> or <gco:...>, which not all v3 elements have.
  • In my XML v5 output file I don't want an xmlns attribute on each element: the namespaces need to be declared only once, at the top of the file (see the result example).
  • The XML v3 input files have a kind of "legal value", so their values must be taken as-is.

All the files are hundreds of lines long, so I've posted only portions of them here.

This is a portion of an input file (XML v3):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="c:\ISO19139_rve.xsl"?>
<MD_Metadata xmlns="http://www.isotc211.org/schemas/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gco="http://www.isotc211.org/schemas/2005/gco" xmlns:gml="http://www.opengis.net/gml" xmlns:xlink="http://www.w3.org/1999/xlink" xsi:schemaLocation="http://www.isotc211.org/schemas/2005/gmd/gmd.xsd">
  <fileIdentifier>
    <gco:CharacterString>b0101011_Vincolo</gco:CharacterString>
  </fileIdentifier>
  <language>
    <gco:CharacterString>IT</gco:CharacterString>
  </language>
  <contact>
    <CI_ResponsibleParty>
      <organizationName>
        <gco:CharacterString>Comune di Conselve (capofila PATI)</gco:CharacterString>
      </organizationName>
      <role>
        <CI_RoleCode codeList="./resource/codeList.xml#CI_RoleCode" codeListValue="Autore">Autore</CI_RoleCode>
      </role>
      <contactInfo>
        <CI_Contact>
          <onlineResource>
            <CI_OnlineResource>
              <linkage>
                <URL>http://www.comune.conselve.it</URL>
              </linkage>
            </CI_OnlineResource>
          </onlineResource>
          <phone>
            <CI_Telephone>
              <voice>
                <gco:CharacterString>0499596511</gco:CharacterString>
              </voice>
            </CI_Telephone>
          </phone>
        </CI_Contact>
      </contactInfo>
    </CI_ResponsibleParty>
  </contact>
  <dateStamp>
    <gco:Date>2007-12-13</gco:Date>
  </dateStamp>
  <metadataStandardName>
    <gco:CharacterString>ISO 19115 (UNI EN ISO 19115) Repertorio Nazionale dei Dati Territoriali - Linee guida per l'applicazione dello standard ISO 19115</gco:CharacterString>
  </metadataStandardName>
  <metadataStandardVersion>
    <gco:CharacterString>2006 (v.0.3)</gco:CharacterString>
  </metadataStandardVersion>
  <identificationInfo>
    <MD_DataIdentification>
      <citation>
        <CI_Citation>
          <title>
            <gco:CharacterString>Ambiti sottoposti a regime di vincolo</gco:CharacterString>
          </title>
          <date>
            <CI_Date>
              <date>
                <gco:CharacterString>2007-12-13</gco:CharacterString>
              </date>
              <dateType>
                <CI_DateTypeCode codeList="./resource/codeList.xml#CI_DateTypeCode" codeListValue="Creazione">Creazione</CI_DateTypeCode>
              </dateType>
            </CI_Date>
          </date>
          <identifier>
            <MD_Identifier>
              <code>
                <gco:CharacterString>b0101011_Vincolo.shp</gco:CharacterString>
              </code>
            </MD_Identifier>
          </identifier>
        </CI_Citation>
      </citation>
    </MD_DataIdentification>
  </identificationInfo>
</MD_Metadata>

This is the result structure I need (XML v5 structure):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="C:\_stile metadati RVEN\xsl\Dataset - RNDT.xsl"?>
<gmd:MD_Metadata xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/gmd/gmd.xsd" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

  <gmd:fileIdentifier>
    <gco:CharacterString>arpa_ve:c0408090_EQBfiumi</gco:CharacterString>
  </gmd:fileIdentifier>

  <gmd:language>
    <gmd:LanguageCode codeList="./resource/codeList.xml#LanguageCode" codeListValue="ita">ita</gmd:LanguageCode>
  </gmd:language>

  <gmd:characterSet>
    <gmd:MD_CharacterSetCode codeList="./resource/codeList.xml#MD_CharacterSetCode" codeListValue="utf8">utf8</gmd:MD_CharacterSetCode>
  </gmd:characterSet>

  <gmd:parentIdentifier>
    <gco:CharacterString>arpa_ve:c0408090_EQBfiumi</gco:CharacterString>
  </gmd:parentIdentifier>

  <gmd:hierarchyLevel>
    <gmd:MD_ScopeCode codeList="./resource/codeList.xml#MD_ScopeCode" codeListValue="dataset">Dataset</gmd:MD_ScopeCode>
  </gmd:hierarchyLevel>

  <gmd:contact>
    <gmd:CI_ResponsibleParty>
      <gmd:organisationName>
        <gco:CharacterString>ARPAV - Servizio Osservatorio Acque Interne</gco:CharacterString>
      </gmd:organisationName>

      <gmd:contactInfo>
        <gmd:CI_Contact>
          <gmd:phone>
            <gmd:CI_Telephone>
              <gmd:voice>
                <gco:CharacterString>+39 049 7393783</gco:CharacterString>
              </gmd:voice>
            </gmd:CI_Telephone>
          </gmd:phone>
          <gmd:address>
            <gmd:CI_Address>
              <gmd:electronicMailAddress>
                <gco:CharacterString>orac@arpa.veneto.it</gco:CharacterString>
              </gmd:electronicMailAddress>
            </gmd:CI_Address>
          </gmd:address>
          <gmd:onlineResource>
            <gmd:CI_OnlineResource>
              <gmd:linkage>
                <gmd:URL>http://www.arpa.veneto.it</gmd:URL>
              </gmd:linkage>
            </gmd:CI_OnlineResource>
          </gmd:onlineResource>
        </gmd:CI_Contact>
      </gmd:contactInfo>

      <gmd:role>
        <gmd:CI_RoleCode codeList="./resource/codeList.xml#CI_RoleCode" codeListValue="pointOfContact">Punto di contatto</gmd:CI_RoleCode>
      </gmd:role>
    </gmd:CI_ResponsibleParty>
  </gmd:contact>

  <gmd:dateStamp>
    <gco:Date>2013-10-10</gco:Date>
  </gmd:dateStamp>

  <gmd:metadataStandardName>
    <gco:CharacterString>DM - Regole tecniche RNDT</gco:CharacterString>
  </gmd:metadataStandardName>

  <gmd:metadataStandardVersion>
    <gco:CharacterString>10 novembre 2011</gco:CharacterString>
  </gmd:metadataStandardVersion>

  <gmd:referenceSystemInfo>
    <gmd:MD_ReferenceSystem>
      <gmd:referenceSystemIdentifier>
        <gmd:RS_Identifier>
          <gmd:code>
            <gco:CharacterString>ROMA40/OVEST</gco:CharacterString>
          </gmd:code>
        </gmd:RS_Identifier>
      </gmd:referenceSystemIdentifier>
    </gmd:MD_ReferenceSystem>
  </gmd:referenceSystemInfo>


  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>EQB - Elementi di Qualità Biologica dei fiumi</gco:CharacterString>
          </gmd:title>
          <gmd:date>
            <gmd:CI_Date>
              <gmd:date>
                <gco:Date>2013-10-10</gco:Date>
              </gmd:date>
              <gmd:dateType>
                <gmd:CI_DateTypeCode codeList="./resource/codeList.xml#CI_DateTypeCode" codeListValue="creation">Creazione</gmd:CI_DateTypeCode>
              </gmd:dateType>
            </gmd:CI_Date>
          </gmd:date>
          <gmd:identifier>
            <gmd:RS_Identifier>
              <gmd:code>
                <gco:CharacterString>arpa_ve:c0408090_EQBfiumi</gco:CharacterString>
              </gmd:code>
            </gmd:RS_Identifier>
          </gmd:identifier>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>

This is my current (only partially completed) XSL transformation:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:gmd="http://www.isotc211.org/schemas/2005/gmd" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gco="http://www.isotc211.org/schemas/2005/gco">
<xsl:output method="xml" indent="yes"/>
<!--Content:template xmlns:gmd="http://www.isotc211.org/2005/gmd"> -->
<!-- <?xml version="1.0"?> -->
<!-- <?xml-stylesheet type="text/xsl" href="C:\_stile metadati RVEN\xsl\Dataset - RNDT.xsl"?> -->
<!-- <xsl:element name="gmd:MD_Metadata" namespace="xsi:schemaLocation='http://www.isotc211.org/2005/gmd http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/gmd/gmd.xsd' xmlns:gco='http://www.isotc211.org/2005/gco' xmlns:gml='http://www.opengis.net/gml/3.2' xmlns:gmd='http://www.isotc211.org/2005/gmd' xmlns:xlink='http://www.w3.org/1999/xlink' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"> -->

<xsl:template match="/">
   <xsl:element name="gmd:MD_Metadata">
      <xsl:element name="fileIdentifier">
         <xsl:element name="gco:CharacterString">
            <xsl:for-each select="//gmd:fileIdentifier">
               <xsl:value-of select="gco:CharacterString"/>
            </xsl:for-each>
         </xsl:element>
      </xsl:element>

      <xsl:element name="gmd:language">
       <xsl:element name="gmd:LanguageCode">
          <xsl:attribute name="codeList">./resource/codeList.xml#LanguageCode</xsl:attribute>
          <xsl:attribute name="codeListValue">
             <!-- <xsl:for-each select="gmd:MD_Metadata/gmd:language"> -->
             <xsl:for-each select="/gmd:MD_Metadata/gmd:language">
              <xsl:value-of select="gco:CharacterString"/>
           </xsl:for-each>
          </xsl:attribute>
     </xsl:element>
     <xsl:for-each select="/gmd:MD_Metadata/gmd:language">
        <xsl:value-of select="gco:CharacterString"/>
     </xsl:for-each>
    </xsl:element>


    <xsl:element name="gmd:characterSet">
       <xsl:element name="gmd:MD_CharacterSetCode">
        <xsl:attribute name="codeList">./resource/codeList.xml#MD_CharacterSetCode</xsl:attribute>
        <xsl:attribute name="codeListValue">utf8</xsl:attribute>
     </xsl:element>
    </xsl:element>

    <xsl:element name="gmd:parentIdentifier">
         <xsl:element name="gco:CharacterString"></xsl:element>
      </xsl:element>

    <xsl:element name="gmd:hierarchyLevel">
       <xsl:element name="gmd:MD_ScopeCode">
        <xsl:attribute name="codeList">./resource/codeList.xml#MD_ScopeCode</xsl:attribute>
        <xsl:attribute name="codeListValue"></xsl:attribute>
     </xsl:element>
    </xsl:element>

    <xsl:element name="gmd:contact">
       <xsl:element name="gmd:CI_ResponsibleParty">
      <xsl:element name="gmd:organisationName">
        <xsl:element name="gco:CharacterString">
           <xsl:for-each select="gmd:MD_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:organizationName"> <!-- [3] -->
                  <xsl:value-of select="gco:CharacterString"/>
               </xsl:for-each>
            </xsl:element>
      </xsl:element>
      <xsl:element name="gmd:contactInfo">
        <xsl:element name="gmd:CI_Contact">
          <xsl:element name="gmd:phone">
            <xsl:element name="gmd:CI_Telephone">
              <xsl:element name="gmd:voice">
                <xsl:element name="gco:CharacterString">
                </xsl:element>
              </xsl:element>
            </xsl:element>
          </xsl:element>
          <xsl:element name="gmd:address">
            <xsl:element name="gmd:CI_Address">
              <xsl:element name="gmd:electronicMailAddress">
                <xsl:element name="gco:CharacterString">
                   <xsl:for-each select="gmd:MD_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:phone/gmd:address/gmd:CI_Address/gmd:electronicMailAddress">
                                  <xsl:value-of select="gco:CharacterString"/>
                               </xsl:for-each>
                </xsl:element>
              </xsl:element>
            </xsl:element>
          </xsl:element>
          <xsl:element name="gmd:onlineResource">
            <xsl:element name="gmd:CI_OnlineResource">
              <xsl:element name="gmd:linkage">
                <xsl:element name="gmd:URL">
                   <xsl:for-each select="gmd:MD_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:phone/gmd:onlineResource/gmd:linkage">
                                  <xsl:value-of select="gco:CharacterString"/>
                               </xsl:for-each>
                </xsl:element>
              </xsl:element>
            </xsl:element>
          </xsl:element>
        </xsl:element>
      </xsl:element>
      <xsl:element name="gmd:role">
        <xsl:element name="gmd:CI_RoleCode">
           <xsl:attribute name="codeList">./resource/codeList.xml#CI_RoleCode</xsl:attribute>
           <xsl:attribute name="codeListValue">
              <xsl:for-each select="gmd:MD_Metadata/gmd:contact/gmd:CI_ResponsibleParty/gmd:role/gmd:CI_RoleCode">
                     <xsl:value-of select="gco:CharacterString"/>
                  </xsl:for-each>
           </xsl:attribute>
          </xsl:element>
        </xsl:element>
       </xsl:element>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

And this is the result I get when applying my transformation (only a few lines):

<?xml version="1.0" encoding="UTF-8"?>
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/schemas/2005/gmd">
   <fileIdentifier>
      <gco:CharacterString xmlns:gco="http://www.isotc211.org/schemas/2005/gco">b0101011_Vincolo</gco:CharacterString>
   </fileIdentifier>
   <gmd:language>
      <gmd:LanguageCode codeList="./resource/codeList.xml#LanguageCode" codeListValue="IT"/>IT</gmd:language>
   <gmd:characterSet>
      <gmd:MD_CharacterSetCode codeList="./resource/codeList.xml#MD_CharacterSetCode"
                               codeListValue="utf8"/>
   </gmd:characterSet>
   <gmd:parentIdentifier>
      <gco:CharacterString xmlns:gco="http://www.isotc211.org/schemas/2005/gco"/>
   </gmd:parentIdentifier>
   <gmd:hierarchyLevel>
      <gmd:MD_ScopeCode codeList="./resource/codeList.xml#MD_ScopeCode" codeListValue=""/>
   </gmd:hierarchyLevel>
   <gmd:contact>
      <gmd:CI_ResponsibleParty>
         <gmd:organisationName>
            <gco:CharacterString xmlns:gco="http://www.isotc211.org/schemas/2005/gco">Comune di Conselve (capofila PATI)</gco:CharacterString>
         </gmd:organisationName>
         <gmd:contactInfo>
            <gmd:CI_Contact>
               <gmd:phone>
                  <gmd:CI_Telephone>
                     <gmd:voice>
                        <gco:CharacterString xmlns:gco="http://www.isotc211.org/schemas/2005/gco"/>
                     </gmd:voice>
                  </gmd:CI_Telephone>
               </gmd:phone>
               <gmd:address>
                  <gmd:CI_Address>
                     <gmd:electronicMailAddress>
                        <gco:CharacterString xmlns:gco="http://www.isotc211.org/schemas/2005/gco"/>
                     </gmd:electronicMailAddress>
                  </gmd:CI_Address>
               </gmd:address>
               <gmd:onlineResource>
                  <gmd:CI_OnlineResource>
                     <gmd:linkage>
                        <gmd:URL/>
                     </gmd:linkage>
                  </gmd:CI_OnlineResource>
               </gmd:onlineResource>
            </gmd:CI_Contact>
         </gmd:contactInfo>
         <gmd:role>
            <gmd:CI_RoleCode codeList="./resource/codeList.xml#CI_RoleCode" codeListValue=""/>
         </gmd:role>
      </gmd:CI_ResponsibleParty>
   </gmd:contact>
</gmd:MD_Metadata>

As you can see, the xmlns:gco="http://..." declaration is repeated on each gco: element, and I don't want it.

Also, I know that my XSL approach is not the best. If there's a better way (and I think there is) to do this transformation with less code and less debugging and maintenance effort, I'll gladly take any suggestion!

For those who want to see the complete files, there's a ZIP archive that contains the 3 XML files and my XSL.

Solution

Your XSLT approach is way too complicated. XSLT is verbose, but not that verbose.

A few hints.

  • Use proper namespace declarations. If most of your output needs to be in a certain namespace, make it the default namespace of your stylesheet. Declare:

    xmlns="http://www.isotc211.org/2005/gmd"
    

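    on the xsl:stylesheet element, and every unprefixed literal result element you write will be in that namespace automatically. Nodes copied from the input with <xsl:copy> keep the namespace they had in the input.

  • Don't use <xsl:element> if you want to create elements with a fixed name. Just write out the elements you want to create (literal result elements). This also fixes your repeated xmlns attributes: a literal result element carries all the namespace declarations of the stylesheet, so they end up once on the outermost element, whereas <xsl:element> brings along only the namespace of its own name. When no ancestor declares that namespace (as with gco in your output), the declaration is repeated on every such element. Use exclude-result-prefixes to keep declarations you don't need out of the output.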

    <xsl:element name="gmd:MD_Metadata">
    </xsl:element>
    

    is equivalent to:

    <MD_Metadata>
    </MD_Metadata>
    

    provided the default namespace declared above and the gmd: prefix are bound to the same URI.

  • The same goes for <xsl:attribute>. Just write it out.

    <xsl:element name="gmd:LanguageCode">
      <xsl:attribute name="codeList">./resource/codeList.xml#LanguageCode</xsl:attribute>
    </xsl:element>
    

    is equivalent to:

    <LanguageCode codeList="./resource/codeList.xml#LanguageCode">
    </LanguageCode>
    
  • Use attribute value templates to fill attributes with calculated values:

    <LanguageCode codeListValue="{.}" />
    

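    Note the curly braces: the XPath expression inside them is evaluated and its string value becomes the attribute value. Here {.} is the string value of the current node.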

  • Don't use <xsl:for-each> at all. Write templates and use <xsl:apply-templates>. Try to write short templates instead of cramming everything into one big <xsl:template match="/">.
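Putting these hints together, here is a minimal sketch that rebuilds just fileIdentifier and language directly in the v5 namespaces with explicit gmd:/gco: prefixes. It assumes the v5 URIs from your sample output (http://www.isotc211.org/2005/gmd and http://www.isotc211.org/2005/gco). The prefixes v3 and v3gco are made-up names for the input's namespaces and are kept out of the output with exclude-result-prefixes:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:v3="http://www.isotc211.org/schemas/2005/gmd"
  xmlns:v3gco="http://www.isotc211.org/schemas/2005/gco"
  xmlns:gmd="http://www.isotc211.org/2005/gmd"
  xmlns:gco="http://www.isotc211.org/2005/gco"
  exclude-result-prefixes="v3 v3gco">
  <!-- v3 / v3gco: hypothetical prefixes for the input (v3) namespaces -->
  <xsl:output method="xml" indent="yes"/>

  <!-- Literal result element: the non-excluded gmd and gco declarations
       of the stylesheet are written here, once, on the root. -->
  <xsl:template match="/v3:MD_Metadata">
    <gmd:MD_Metadata>
      <xsl:apply-templates select="v3:fileIdentifier | v3:language"/>
    </gmd:MD_Metadata>
  </xsl:template>

  <!-- Fixed element names are written out literally, no xsl:element. -->
  <xsl:template match="v3:fileIdentifier">
    <gmd:fileIdentifier>
      <gco:CharacterString>
        <xsl:value-of select="v3gco:CharacterString"/>
      </gco:CharacterString>
    </gmd:fileIdentifier>
  </xsl:template>

  <!-- Attribute value template: the expression in braces is evaluated. -->
  <xsl:template match="v3:language">
    <gmd:language>
      <gmd:LanguageCode codeList="./resource/codeList.xml#LanguageCode"
                        codeListValue="{v3gco:CharacterString}">
        <xsl:value-of select="v3gco:CharacterString"/>
      </gmd:LanguageCode>
    </gmd:language>
  </xsl:template>
</xsl:stylesheet>
```

Because gmd:MD_Metadata is a literal result element, xmlns:gmd and xmlns:gco should appear once on the root and not on the gco:CharacterString children.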


This entire use case is perfect for a "push-style" approach.

  • You don't really want to change much of your input. From all I can see, the differences between v3 and v5 are marginal: different namespace URIs, a few renamed, reordered or added elements, and some changed code-list values.
  • Base your stylesheet on the identity template, and handle only those cases where you actually need to make a change to the input.
  • Think: "If I find such a node in the input, what should the output look like?"
  • Write a template accordingly.
  • Handle all nesting/recursion by calling <xsl:apply-templates>.
  • It helps to think of your input XML as flowing through the stylesheet, like through a sieve. You don't make your stylesheet do stuff; you only set up case handlers.

This stylesheet is a good start:

<xsl:stylesheet 
  version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:xlink="http://www.w3.org/1999/xlink" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
  xmlns:gml="http://www.opengis.net/gml/3.2" 
  xmlns:gco="http://www.isotc211.org/schemas/2005/gco"
  xmlns:gmd="http://www.isotc211.org/schemas/2005/gmd"
  xmlns="http://www.isotc211.org/schemas/2005/gmd"
  exclude-result-prefixes="gmd gml"
>
  <!-- exclude-result-prefixes keeps the gmd: and GML 3.2 gml: declarations
       off the literal elements added below (e.g. characterSet) -->
  <xsl:output indent="yes" encoding="utf-8" />

  <!-- default: all input nodes are copied as they are (identity template) -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- override: MD_Metadata needs to be rebuilt -->
  <xsl:template match="gmd:MD_Metadata">
    <xsl:copy>
      <xsl:apply-templates select="@*" />

      <!-- handle fileIdentifier and language first -->
      <xsl:apply-templates select="gmd:fileIdentifier | gmd:language" />

      <!-- now a few additions that are not in the source -->
      <characterSet>
        <MD_CharacterSetCode codeList="./resource/codeList.xml#MD_CharacterSetCode" codeListValue="utf8" />
      </characterSet>

      <parentIdentifier>
        <gco:CharacterString />
      </parentIdentifier>

      <hierarchyLevel>
        <MD_ScopeCode codeList="./resource/codeList.xml#MD_ScopeCode" codeListValue="" />
      </hierarchyLevel>

      <!-- now handle the rest of the contents -->
      <xsl:apply-templates select="node()[not(
        self::gmd:fileIdentifier or 
        self::gmd:language
      )]" />
    </xsl:copy>
  </xsl:template>

  <!-- override: the CharacterString beneath language becomes a LanguageCode -->
  <xsl:template match="gmd:language/gco:CharacterString">
    <LanguageCode codeList="./resource/codeList.xml#LanguageCode" codeListValue="{.}">
      <xsl:value-of select="." />
    </LanguageCode>
  </xsl:template>

  <!-- add more overrides... -->
</xsl:stylesheet>

Note how the entire XSLT above does not concern itself with iteration or nesting. <xsl:apply-templates> hands each selected node to the best-matching template: an override you supplied if one matches, otherwise the identity template. This way you need to write specific templates only for the things that must change on their way from input to output.

The output looks like this:

<MD_Metadata xsi:schemaLocation="http://www.isotc211.org/schemas/2005/gmd/gmd.xsd" xmlns="http://www.isotc211.org/schemas/2005/gmd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gco="http://www.isotc211.org/schemas/2005/gco" xmlns:gml="http://www.opengis.net/gml" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileIdentifier>
    <gco:CharacterString>b0101011_Vincolo</gco:CharacterString>
  </fileIdentifier>

  <!-- ... and so on -->
</MD_Metadata>
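The output above is still in the v3 namespaces (http://www.isotc211.org/schemas/2005/gmd and http://www.isotc211.org/schemas/2005/gco), because <xsl:copy> always keeps a node's original namespace. The v5 sample, however, uses http://www.isotc211.org/2005/gmd and http://www.isotc211.org/2005/gco with explicit gmd:/gco: prefixes. A minimal sketch of a generic remapping, assuming every v3 gmd/gco element keeps its local name in v5 (v3 and v3gco are made-up prefixes for the input namespaces):

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:v3="http://www.isotc211.org/schemas/2005/gmd"
  xmlns:v3gco="http://www.isotc211.org/schemas/2005/gco"
  xmlns:gmd="http://www.isotc211.org/2005/gmd"
  xmlns:gco="http://www.isotc211.org/2005/gco"
  xmlns:gml="http://www.opengis.net/gml/3.2"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  exclude-result-prefixes="v3 v3gco">
  <!-- v3 / v3gco: hypothetical prefixes for the input (v3) namespaces -->
  <xsl:output method="xml" indent="yes" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>

  <!-- Root as a literal result element: all v5 declarations written once here;
       the old xsi:schemaLocation is replaced, not copied -->
  <xsl:template match="/v3:MD_Metadata">
    <gmd:MD_Metadata xsi:schemaLocation="http://www.isotc211.org/2005/gmd http://standards.iso.org/ittf/PubliclyAvailableStandards/ISO_19139_Schemas/gmd/gmd.xsd">
      <xsl:apply-templates select="node()"/>
    </gmd:MD_Metadata>
  </xsl:template>

  <!-- Any other v3 gmd element: same local name, v5 namespace, gmd: prefix -->
  <xsl:template match="v3:*">
    <xsl:element name="gmd:{local-name()}">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Any v3 gco element: same local name, v5 namespace, gco: prefix -->
  <xsl:template match="v3gco:*">
    <xsl:element name="gco:{local-name()}">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Attributes and text are copied unchanged -->
  <xsl:template match="@* | text()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
```

Since the root already declares gmd and gco, the elements created by <xsl:element> should not repeat those declarations. Overrides such as the LanguageCode template above would then match v3:... nodes and write literal gmd: elements; their more specific patterns take priority over v3:*. The sketch does not copy the xml-stylesheet processing instruction, copies namespaced attributes such as gco:nilReason (if present in your files) with their v3 URI, and does not touch gml elements (v3 uses http://www.opengis.net/gml, v5 uses GML 3.2), which may need more than a rename.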

If necessary, use empty templates to suppress unwanted output:

<xsl:template match="gmd:foo" />
<xsl:template match="gmd:bar" />

If you have a definitive list of changes from v3 to v5, you also have the list of override templates to create.
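For illustration, the two samples in the question already show several kinds of change: a renamed element (organizationName becomes organisationName), a replaced element (MD_Identifier becomes RS_Identifier in the citation), a retyped value (the citation date in gco:CharacterString becomes gco:Date), a different child order (contactInfo before role), a recoded attribute (codeListValue "Creazione" becomes "creation") and a different xml-stylesheet instruction. A sketch with one override per change, using the same namespace bindings as the stylesheet above; the rules are inferred from the two samples only, not from the RNDT specification, so check each one against your definitive list:

```xml
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:gco="http://www.isotc211.org/schemas/2005/gco"
  xmlns:gmd="http://www.isotc211.org/schemas/2005/gmd"
  xmlns="http://www.isotc211.org/schemas/2005/gmd"
  exclude-result-prefixes="gmd">
  <xsl:output indent="yes" encoding="utf-8"/>

  <!-- default: identity template, copies everything not overridden below -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

  <!-- replace the stylesheet PI with the one used by the v5 sample -->
  <xsl:template match="processing-instruction('xml-stylesheet')">
    <xsl:processing-instruction name="xml-stylesheet">type="text/xsl" href="C:\_stile metadati RVEN\xsl\Dataset - RNDT.xsl"</xsl:processing-instruction>
  </xsl:template>

  <!-- rename: organizationName (v3) becomes organisationName (v5) -->
  <xsl:template match="gmd:organizationName">
    <organisationName>
      <xsl:apply-templates select="node() | @*"/>
    </organisationName>
  </xsl:template>

  <!-- replace: the citation identifier becomes an RS_Identifier, as in the v5 sample -->
  <xsl:template match="gmd:CI_Citation/gmd:identifier/gmd:MD_Identifier">
    <RS_Identifier>
      <xsl:apply-templates select="node() | @*"/>
    </RS_Identifier>
  </xsl:template>

  <!-- retype: a citation date held in gco:CharacterString becomes gco:Date -->
  <xsl:template match="gmd:CI_Date/gmd:date/gco:CharacterString">
    <gco:Date>
      <xsl:value-of select="."/>
    </gco:Date>
  </xsl:template>

  <!-- reorder: v5 puts contactInfo before role (children not listed are dropped) -->
  <xsl:template match="gmd:CI_ResponsibleParty">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="gmd:organizationName"/>
      <xsl:apply-templates select="gmd:contactInfo"/>
      <xsl:apply-templates select="gmd:role"/>
    </xsl:copy>
  </xsl:template>

  <!-- recode: codeListValue "Creazione" becomes "creation"; the element text stays as-is -->
  <xsl:template match="gmd:CI_DateTypeCode/@codeListValue[. = 'Creazione']">
    <xsl:attribute name="codeListValue">creation</xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
```

The CI_ResponsibleParty template lists its children explicitly, so any child not named there (for example individualName or positionName) would be dropped; a real stylesheet would list all of them in v5 schema order. The same pattern applies to CI_Contact, where the v5 sample puts phone before onlineResource.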

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow