Find frequency of values in an Array or XML (C#)
-
22-08-2019 - |
Question
I have an XML feed (which I don't control) and I am trying to figure out how to detect the volume of certain attribute values within the document.
I am also parsing the XML and separating attributes into Arrays (for other functionality)
Here is a sample of my XML
<items>
<item att1="ABC123" att2="uID" />
<item att1="ABC345" att2="uID" />
<item att1="ABC123" att2="uID" />
<item att1="ABC678" att2="uID" />
<item att1="ABC123" att2="uID" />
<item att1="XYZ123" att2="uID" />
<item att1="XYZ345" att2="uID" />
<item att1="XYZ678" att2="uID" />
</items>
I want to find the volume nodes based on each att1 value. Att1 value will change. Once I know the frequency of att1 values I need to pull the att2 value of that node.
I need to find the TOP 4 items and pull the values of their attributes.
All of this needs to be done in C# code behind.
If I was using Javascript I would create an associative array and have att1 be the key and the frequency be the value. But since I'm new to c# I don't know how to duplicate this in c#.
So I believe, first I need to find all unique att1 values in the XML. I can do this using:
IEnumerable<string> uItems = uItemsArray.Distinct();
// Where uItemsArray is a collection of all the att1 values in an array
Then I get stuck on how I compare each unique att1 value to the whole document to get the volume stored in a variable or array or whatever data set.
Here is the snippet I ended up using:
XDocument doc = XDocument.Load(@"temp/salesData.xml");
var topItems = from item in doc.Descendants("item")
select new
{
name = (string)item.Attribute("name"),
sku = (string)item.Attribute("sku"),
iCat = (string)item.Attribute("iCat"),
sTime = (string)item.Attribute("sTime"),
price = (string)item.Attribute("price"),
desc = (string)item.Attribute("desc")
} into node
group node by node.sku into grp
select new {
sku = grp.Key,
name = grp.ElementAt(0).name,
iCat = grp.ElementAt(0).iCat,
sTime = grp.ElementAt(0).sTime,
price = grp.ElementAt(0).price,
desc = grp.ElementAt(0).desc,
Count = grp.Count()
};
_topSellers = new SalesDataObject[4];
int topSellerIndex = 0;
foreach (var item in topItems.OrderByDescending(x => x.Count).Take(4))
{
SalesDataObject topSeller = new SalesDataObject();
topSeller.iCat = item.iCat;
topSeller.iName = item.name;
topSeller.iSku = item.sku;
topSeller.sTime = Convert.ToDateTime(item.sTime);
topSeller.iDesc = item.desc;
topSeller.iPrice = item.price;
_topSellers.SetValue(topSeller, topSellerIndex);
topSellerIndex++;
}
Thanks for all your help!
Solution
If you have the values, you should be able to use LINQ's GroupBy...
XDocument doc = XDocument.Parse(xml);
var query = from item in doc.Descendants("item")
select new
{
att1 = (string)item.Attribute("att1"),
att2 = (string)item.Attribute("att2") // if needed
} into node
group node by node.att1 into grp
select new { att1 = grp.Key, Count = grp.Count() };
foreach (var item in query.OrderByDescending(x=>x.Count).Take(4))
{
Console.WriteLine("{0} = {1}", item.att1, item.Count);
}
OTHER TIPS
Are you using .NET 3.5? (It looks like it based on your code.) If so, I suspect this is pretty easy with LINQ to XML and LINQ to Objects. However, I'm afraid it's not clear from your example what you want. Do all the values with the same att1 also have the same att2? If so, it's something like:
var results = (from element in items.Elements("item")
group element by element.Attribute("att1").Value into grouped
order by grouped.Count() descending
select grouped.First().Attribute("att2").Value).Take(4);
I haven't tested it, but I think it should work...
- We start off with all the item elements
- We group them (still as elements) by their att1 value
- We sort the groups by their size, descending so the biggest one is first
- From each group we take the first element to find its att2 value
- We take the top four of these results
You can use LINQ/XLINQ to accomplish this. Below is a sample console application I just wrote, so the code might not be optimized but it works.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Text;
namespace FrequencyThingy
{
class Program
{
static void Main(string[] args)
{
string data = @"<items>
<item att1=""ABC123"" att2=""uID"" />
<item att1=""ABC345"" att2=""uID"" />
<item att1=""ABC123"" att2=""uID"" />
<item att1=""ABC678"" att2=""uID"" />
<item att1=""ABC123"" att2=""uID"" />
<item att1=""XYZ123"" att2=""uID"" />
<item att1=""XYZ345"" att2=""uID"" />
<item att1=""XYZ678"" att2=""uID"" />
</items>";
XDocument doc = XDocument.Parse(data);
var grouping = doc.Root.Elements().GroupBy(item => item.Attribute("att1").Value);
foreach (var group in grouping)
{
var groupArray = group.ToArray();
Console.WriteLine("Group {0} has {1} element(s).", groupArray[0].Attribute("att1").Value, groupArray.Length);
}
Console.ReadKey();
}
}
}