Question

(First post, please be gentle!)

I am just learning about LINQ to XML in all its glory and frailty, trying to hack it to do what I want to do:

Given an XML file like this -

<list>
<!-- random data, keys, values, etc.-->

  <key>FIRST_WANTED_KEY</key>
  <value>FIRST_WANTED_VALUE</value>

  <key>SECOND_WANTED_KEY</key>
  <value>SECOND_WANTED_VALUE</value> <!-- wanted because it's first -->

  <key>SECOND_WANTED_KEY</key>
  <value>UNWANTED_VALUE</value>  <!-- not wanted because it's second -->

  <!-- nonexistent <key>THIRD_WANTED_KEY</key> -->
  <!-- nonexistent <value>THIRD_WANTED_VALUE</value> -->

<!-- more stuff-->
</list>

I want to extract the values of a set of known "wanted keys" in a robust fashion, i.e. if SECOND_WANTED_KEY appears twice, I only want SECOND_WANTED_VALUE, not UNWANTED_VALUE. Additionally, THIRD_WANTED_KEY may or may not appear, so the query should be able to handle that as well. I can assume that FIRST_WANTED_KEY will appear before the other keys, but I can't assume anything about the order of the other keys. If a key appears twice, the later values aren't important; I only want the first one. An anonymous data type consisting of strings is fine.

My attempt has centered around something along these lines:

var z = from y in x.Descendants()
        where y.Value == "FIRST_WANTED_KEY"
        select new
        {
          first_wanted_value = ((XElement)y.NextNode).Value,
         //...
        }

My question is what should that ... be? I've tried, for instance, (ugly, I know)

second_wanted_value = ((XElement)y.ElementsAfterSelf()
                      .Where(w => w.Value=="SECOND_WANTED_KEY")
                      .FirstOrDefault().NextNode).Value

which should hopefully allow the key to be anywhere, or non-existent, but that hasn't worked out, since .NextNode on a null XElement doesn't seem to work.

I've also tried to add in a

.Select(t => { 
    if (t==null) 
        return new XElement("SECOND_WANTED_KEY",""); 
    else return t;
})

clause in after the where, but that hasn't worked either.

I'm open to suggestions, (constructive) criticism, links, references, or suggestions of phrases to Google for, etc. I've done a fair share of Googling and checking around S.O., so any help would be appreciated.

Thanks!

EDIT: Let me add a layer of complexity to this - I should have included this in the first place. Let's say the XML document looks like this:

<lists>
    <list>
      <!-- as above -->
    </list>
    <list>
      <!-- as above -->
    </list>
</lists>

and I want to extract multiple sets of these key-value pairs. Question/Caution: if SECOND_WANTED_KEY doesn't appear in the first <list> element but appears in the second, I don't want to accidentally pick up the second list element's SECOND_WANTED_KEY.

EDIT #2:

As another idea, I've tried creating a HashSet of the keys that I'm looking for and doing this:

HashSet<string> wantedKeys = new HashSet<string>();
wantedKeys.Add("FIRST_WANTED_KEY");
//...add more keys here
var kvp = from a in x.Descendants().Where(a => wantedKeys.Contains(a.Value))
          select new KeyValuePair<string,string>(a.Value,
             ((XElement)a.NextNode).Value);

This gets me all of the key-value pairs, but I'm not sure if it guarantees that I'll properly "associate" the pairs to their parent <list> element. Any thoughts or comparisons between these two approaches would be helpful.

Status Update 4/9/10

As of right now I still mostly think the hash set approach is the best option. It seems like most of the XML processing done by .NET is done in document order, and so far all of my test cases have been working out.

I'd offer a bounty and/or upvote answers, but don't have enough rep points for that. I'll decide on an answer today, so get 'em in! Thanks.


Solution

This gets the value of the first <value> element after the first <key> element containing "SECOND_WANTED_KEY":

XDocument doc = ...; // load or parse the XML first, e.g. with XDocument.Load or XDocument.Parse

string result = (string)doc.Root
                           .Elements("key")
                           .First(node => (string)node == "SECOND_WANTED_KEY")
                           .ElementsAfterSelf("value")
                           .First();

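As written, First() throws an InvalidOperationException when the key or a following <value> element is missing. Switch to FirstOrDefault() and add null checks if a key is optional.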
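For an optional key such as THIRD_WANTED_KEY, a null-tolerant variant of the query above might look like the following. This is only a sketch: the class name KeyLookup and the method name FirstValueFor are made up for illustration, and it assumes the keys and values are direct children of the element passed in. It relies on the explicit XElement-to-string conversion returning null for a null element.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of LINQ to XML itself.
public static class KeyLookup
{
    // Returns the text of the first <value> sibling that follows the first
    // <key> whose text equals wantedKey, or null if there is no such key
    // (or no <value> after it).
    public static string FirstValueFor(XElement container, string wantedKey)
    {
        return (string)container
            .Elements("key")
            .Where(k => (string)k == wantedKey)
            // Map each matching key to its first following <value> sibling.
            .Select(k => k.ElementsAfterSelf("value").FirstOrDefault())
            // Only the first matching key matters; null if none matched.
            .FirstOrDefault();
    }
}
```

Calling KeyLookup.FirstValueFor(doc.Root, "SECOND_WANTED_KEY") on the sample document would return the value after the first occurrence of that key, and asking for a key that is absent gives null instead of an exception. Note that, like the original query, it takes the first <value> anywhere after the key, not strictly the adjacent node.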

OTHER TIPS

XDocument doc = ...

var wantedKeyValuePairs =
    from keyElement in doc.Root.Elements("key")
    let valueElement = keyElement.ElementsAfterSelf("value").First()
    select new { Key = keyElement.Value, Value = valueElement.Value } into kvp
    group kvp by kvp.Key into g
    select g.First();

Explanation: this query takes each <key> element and its following <value> element, and makes a key-value pair from them. It then groups the key-value pairs by key and keeps only the first key-value pair for each key.
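The same grouping idea can be applied once per <list> element to cover the multi-list layout from the question's edit, restricted to a set of wanted keys. The sketch below assumes a document shaped like <lists><list>...</list></lists>; the class name WantedKeys and the method name Extract are hypothetical. Because ElementsAfterSelf only returns siblings under the same parent, a key missing from one <list> cannot pick up a value from the next one.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, named here only for illustration.
public static class WantedKeys
{
    // Builds one dictionary per <list> element, mapping each wanted key that
    // is present to the value following its first occurrence in that list.
    public static List<Dictionary<string, string>> Extract(
        XDocument doc, ISet<string> wantedKeys)
    {
        return doc.Root
            .Elements("list")
            .Select(list => list
                .Elements("key")
                .Where(k => wantedKeys.Contains(k.Value))
                // GroupBy keeps source order, so First() is the earliest key.
                .GroupBy(k => k.Value)
                .ToDictionary(
                    g => g.Key,
                    // Null if the key has no following <value> sibling.
                    g => (string)g.First().ElementsAfterSelf("value").FirstOrDefault()))
            .ToList();
    }
}
```

A wanted key that does not occur in a given <list> is simply absent from that list's dictionary, so TryGetValue or ContainsKey can be used to handle optional keys.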

Licensed under: CC-BY-SA with attribution