Trouble getting data out of a xml file

https://stackoverflow.com/questions/3515788

29-09-2019
|

Question

I am trying to parse out some information from Google's geocoding API but I am having a little trouble with efficiently getting the data out of the xml. See link for example

All I really care about is getting the short_name from address_component where the type is administrative_area_level_1 and the long_name from administrative_area_level_2 However with my test program my XPath query returns no results for both queries.

public static void Main(string[] args)
{
    using(WebClient webclient = new WebClient())
    {
        webclient.Proxy = null;
        string locationXml = webclient.DownloadString("http://maps.google.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false");
        using(var reader = new StringReader(locationXml))
        {
            var doc = new XPathDocument(reader);
            var nav = doc.CreateNavigator();
            Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/address_component[type=administrative_area_level_1]/short_name").InnerXml);
            Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/address_component[type=administrative_area_level_2]/long_name").InnerXml);

        }
    }
}

Can anyone help me find what I am doing wrong, or recommending a better way?

Solution

You need to put the value of the node you're looking for in quotes:

".../address_component[type='administrative_area_level_1']/short_name"
                            ↑                           ↑

OTHER TIPS

I'd definitely recommend using LINQ to XML instead of XPathNavigator. It makes XML querying a breeze, in my experience. In this case I'm not sure exactly what's wrong... but I'll come up with a LINQ to XML snippet instead.

using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

class Test
{
    public static void Main(string[] args)
    {
        using(WebClient webclient = new WebClient())
        {
            webclient.Proxy = null;
            string locationXml = webclient.DownloadString
                ("http://maps.google.com/maps/api/geocode/xml?address=1600"
                 + "+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false");
            XElement root = XElement.Parse(locationXml);

            XElement result = root.Element("result");
            Console.WriteLine(result.Elements("address_component")
                                    .Where(x => (string) x.Element("type") ==
                                           "administrative_area_level_1")
                                    .Select(x => x.Element("short_name").Value)
                                    .First());
            Console.WriteLine(result.Elements("address_component")
                                    .Where(x => (string) x.Element("type") ==
                                           "administrative_area_level_2")
                                    .Select(x => x.Element("long_name").Value)
                                    .First());
        }
    }
}

Now this is more code¹... but I personally find it easier to get right than XPath, because the compiler is helping me more.

EDIT: I feel it's worth going into a little more detail about why I generally prefer code like this over using XPath, even though it's clearly longer.

When you use XPath within a C# program, you have two different languages - but only one is in control (C#). XPath is relegated to the realm of strings: Visual Studio doesn't give an XPath expression any special handling; it doesn't understand that it's meant to be an XPath expression, so it can't help you. It's not that Visual Studio doesn't know about XPath; as Dimitre points out, it's perfectly capable of spotting errors if you're editing an XSLT file, just not a C# file.

This is the case whenever you have one language embedded within another and the tool is unaware of it. Common examples are:

SQL
Regular expressions
HTML
XPath

When code is presented as data within another language, the secondary language loses a lot of its tooling benefits.

While you can context switch all over the place, pulling out the XPath (or SQL, or regular expressions etc) into their own tooling (possibly within the same actual program, but in a separate file or window) I find this makes for harder-to-read code in the long run. If code were only ever written and never read afterwards, that might be okay - but you do need to be able to read code afterwards, and I personally believe the readability suffers when this happens.

The LINQ to XML version above only ever uses strings for pure data - the names of elements etc - and uses code (method calls) to represent actions such as "find elements with a given name" or "apply this filter". That's more idiomatic C# code, in my view.

Obviously others don't share this viewpoint, but I thought it worth expanding on to show where I'm coming from.

Note that this isn't a hard and fast rule of course... in some cases XPath, regular expressions etc are the best solution. In this case, I'd prefer the LINQ to XML, that's all.

¹ Of course I could have kept each Console.WriteLine call on a single line, but I don't like posting code with horizontal scrollbars on SO. Note that writing the correct XPath version with the same indentation as the above and avoiding scrolling is still pretty nasty:

            Console.WriteLine(nav.SelectSingleNode("/GeocodeResponse/result/" +
                "address_component[type='administrative_area_level_1']" +
                "/short_name").InnerXml);

In general, long lines work a lot better in Visual Studio than they do on Stack Overflow...

I would recommend just typing the XPath expression as part of an XSLT file in Visual Studio. You'll get error messages "as you type" -- this is an excellent XML/XSLT/XPath editor.

For example, I am typing:

<xsl:apply-templates select="@* | node() x"/>

and immediately get in the Error List window the following error:

Error   9   Expected end of the expression, found 'x'.  @* | node()  -->x<--

XSLTFile1.xslt  9   14  Miscellaneous Files

Only when the XPath expression does not raise any errors (I might also test that it selects the intended nodes, too), would I put this expression into my C# code.

This ensures that I will have no XPath -- syntax and semantic -- errors when I run the C# program.

dtb's response is accurate. I wanted to add that you can use xpath testing tools like the link below to help find the correct xpath:

http://www.bit-101.com/xpath/

string url = @"http://maps.google.com/maps/api/geocode/xml?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=false";
string value = "administrative_area_level_1";

using(WebClient client = new WebClient())
{
    string wcResult = client.DownloadString(url);

    XDocument xDoc = XDocument.Parse(wcResult);

    var result = xDoc.Descendants("address_component")
                    .Where(p=>p.Descendants("type")
                                .Any(q=>q.Value.Contains(value))
                    );

}

The result is an enumeration of "address_component"s that have at least one "type" node that has contains the value you're searching for. The result of the query above is an XElement that contains the following data.

<address_component>
  <long_name>California</long_name>
  <short_name>CA</short_name>
  <type>administrative_area_level_1</type>
  <type>political</type>
</address_component>

I would really recommend spending a little time learning LINQ in general because its very useful for manipulating and querying in-memory objects, querying databases and tends to be easier than using XPath when working with XML. My favorite site to reference is http://www.hookedonlinq.com/

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow