Question

Alternative title: How to convert CloudMade API response set to a single CSV file?

I have about 1000 XML files containing geocoding responses from the CloudMade API.

As far as I know, CloudMade has no batch API and does not output CSV.

I want to convert the set of XML files into one CSV file containing one row for each response.

Is it possible to do this using just XSLT 1.0? If not, does an XSLT 2.0 solution exist?

The CSV must contain at least three columns: ID, Latitude, and Longitude.

The base name of each XML file contains the response ID.

The lat and lon elements inside the position element of the first Array element (the one with pos="0") contain the latitude and longitude values.

Small Example

Here's a small example with just two XML files.

File 140.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <places>
    <Array pos="0">
      <addressType>housenumber</addressType>
      <city>~Weiz</city>
      <country>Austria</country>
      <featureType>Ortsstrasse</featureType>
      <houseNumber>19</houseNumber>
      <position>
        <lat>47.22148736</lat>
        <lon>15.62440613</lon>
      </position>
      <street>Dr.-Karl-Widdmann-Straße</street>
      <zip>8160</zip>
    </Array>
  </places>
  <status>
    <duration>205</duration>
    <procedure>geo.location.search.2</procedure>
    <success>true</success>
  </status>
</Response>

File 141.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <places>
    <Array pos="0">
      <addressType>housenumber</addressType>
      <city>~Innsbruck</city>
      <country>Austria</country>
      <featureType>Ortsstrasse</featureType>
      <houseNumber>50</houseNumber>
      <position>
        <lat>47.26638083</lat>
        <lon>11.43725792</lon>
      </position>
      <street>Valiergasse</street>
      <zip>6020</zip>
    </Array>
  </places>
  <status>
    <duration>139</duration>
    <procedure>geo.location.search.2</procedure>
    <success>true</success>
  </status>
</Response>

The output cloudmade_responses.csv should be encoded in UTF-8 and should look like this:

"Id","Latitude","Longitude"
"140","47.22148736","15.62440613"
"141","47.26638083","11.43725792"

Partial XSLT solution

I'm comfortable with basic XPath, but unsure about how to integrate XPath expressions into a more complex XSLT document.

The XPath expression to extract the Latitude is

/Response/places/Array[@pos=0]/position/lat

The XPath expression to extract the Longitude is

/Response/places/Array[@pos=0]/position/lon

Pass these to XMLStarlet to transform a single document into an unquoted CSV row:

$ xml sel -t -v "/Response/places/Array[@pos=0]/position/lat" -o "," -v "/Response/places/Array[@pos=0]/position/lon" 140.xml
47.22148736,15.62440613
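
For comparison, the same two values can be read with Python's standard library. This is only a sketch: the helper name first_position and the trimmed SAMPLE document are illustrative assumptions, not part of the CloudMade API. Note that ElementTree supports only a subset of XPath, so the path is written relative to the root Response element and the predicate compares the attribute as a string.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of 140.xml from the example above (assumption: only the
# elements needed for the lookup are kept).
SAMPLE = """<Response>
  <places>
    <Array pos="0">
      <position>
        <lat>47.22148736</lat>
        <lon>15.62440613</lon>
      </position>
    </Array>
  </places>
</Response>"""


def first_position(root):
    """Return the (lat, lon) text of the Array element whose pos is "0"."""
    # ElementTree paths start at the root <Response> element, and attribute
    # predicates compare strings, hence [@pos='0'] rather than [@pos=0].
    base = "./places/Array[@pos='0']/position/"
    return root.findtext(base + "lat"), root.findtext(base + "lon")


root = ET.fromstring(SAMPLE)
print(",".join(first_position(root)))  # prints 47.22148736,15.62440613
```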

Adding the -C option to the xml sel command and redirecting the output writes an XSLT description of the transformation to a file:

$ xml sel -C -t -v "/Response/places/Array[@pos=0]/position/lat" -o "," -v "/Response/places/Array[@pos=0]/position/lon" 140.xml > partial_solution.xslt

The output partial_solution.xslt looks like this:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" version="1.0" extension-element-prefixes="exslt">
  <xsl:output omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="/">
    <xsl:call-template name="value-of-template">
      <xsl:with-param name="select" select="/Response/places/Array[@pos=0]/position/lat"/>
    </xsl:call-template>
    <xsl:text>,</xsl:text>
    <xsl:call-template name="value-of-template">
      <xsl:with-param name="select" select="/Response/places/Array[@pos=0]/position/lon"/>
    </xsl:call-template>
  </xsl:template>
  <xsl:template name="value-of-template">
    <xsl:param name="select"/>
    <xsl:value-of select="$select"/>
    <xsl:for-each select="exslt:node-set($select)[position()&gt;1]">
      <xsl:value-of select="'&#10;'"/>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

I can now perform the same transformation using the XSLT file instead:

$ xml tr partial_solution.xslt 140.xml
47.22148736,15.62440613

However, I'm unsure how to modify the XSLT description to meet all my requirements.

I can't honestly say I fully understand the partial XSLT solution either.

Full solution using a scripting language

PowerShell is a scripting language with built-in support for XML and CSV processing. With its succinct pipeline syntax you can solve the problem in a few lines:

Get-ChildItem -Path '.\*.xml' |
Select -Property @(
  @{ Name = 'Id'; Expression = { $_.BaseName } },
  @{ Name = 'Latitude'; Expression = {(Select-Xml -Path $_.FullName -XPath '/Response/places/Array[@pos=0]/position/lat').Node.InnerText } },
  @{ Name = 'Longitude'; Expression = {(Select-Xml -Path $_.FullName -XPath '/Response/places/Array[@pos=0]/position/lon').Node.InnerText } }
) |
Export-Csv -Path '.\cloudmade_responses.csv' -NoTypeInformation -Encoding UTF8

Executing that in the same directory as the XML files produces a new file called cloudmade_responses.csv. It looks like this:

"Id","Latitude","Longitude"
"140","47.22148736","15.62440613"
"141","47.26638083","11.43725792"

The output is exactly as specified.

There are surely similarly succinct solutions in other scripting languages such as Python and Perl.
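
As one illustration of such an alternative, here is a minimal Python sketch using only the standard library. The function name responses_to_csv and its two parameters are hypothetical, and the sketch assumes every *.xml file in the directory is a CloudMade response shaped like the examples above. Files are processed in name order, and a response without a first Array element yields empty latitude and longitude fields.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path


def responses_to_csv(xml_dir, csv_path):
    """Write one fully quoted CSV row (Id, Latitude, Longitude) per XML file."""
    # Path relative to the root <Response> element; ElementTree compares
    # attribute values as strings, hence [@pos='0'].
    base = "./places/Array[@pos='0']/position/"
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        # QUOTE_ALL wraps every field in double quotes, matching the
        # requested output format.
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerow(["Id", "Latitude", "Longitude"])
        for xml_file in sorted(Path(xml_dir).glob("*.xml")):
            root = ET.parse(xml_file).getroot()
            writer.writerow([
                xml_file.stem,  # base name of the file is the response ID
                root.findtext(base + "lat", default=""),
                root.findtext(base + "lon", default=""),
            ])
```

Calling responses_to_csv(".", "cloudmade_responses.csv") from the directory that holds the XML files should then write the CSV next to them.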

Solving the problem using XSLT should allow any language with an XSLT processor to reuse the solution.

Licensed under: CC-BY-SA with attribution