Question

Based on this question:

How can I do SELECT UNIQUE with LINQ?

I wrote the expression below to select rows with a unique OrganizationID column from the dt DataTable, which contains multiple columns.

var distinctRows = (from DataRow dRow in dt.Rows
                    select new { col1 = dRow["OrganizationID_int"] }).Distinct();

But when I check distinctRows after the expression has executed, it only has records with one column (col1) instead of all the columns. I am afraid that adding expressions like col2=... and so on may be interpreted as a request to select distinct on all of those columns.

So how can I get the whole row while applying the unique filter on only one column rather than on all columns?


Solution

"I want the whole rows which satisfy that unique condition with all columns. I want to iterate in the next step."

So you don't want to group by that field and return one of the multiple rows. You want only rows which are unique.

One way is to use Enumerable.GroupBy and count the rows in each group, keeping only the groups that contain exactly one row:

var uniqueRows = dt.AsEnumerable()
                   .GroupBy(r => r.Field<int>("OrganizationID_int"))
                   .Where(g => g.Count() == 1)
                   .Select(g => g.First());
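
To make the difference between "only rows whose key is unique" and "one row per key" concrete, here is a minimal sketch written as a top-level C# program. The sample table, its Name column and the values in it are made up for illustration; only OrganizationID_int comes from the question.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample data: organization 1 appears twice, 2 and 3 once each.
var dt = new DataTable();
dt.Columns.Add("OrganizationID_int", typeof(int));
dt.Columns.Add("Name", typeof(string));
dt.Rows.Add(1, "A");
dt.Rows.Add(1, "B");
dt.Rows.Add(2, "C");
dt.Rows.Add(3, "D");

// Keep only rows whose OrganizationID_int occurs exactly once.
var uniqueRows = dt.AsEnumerable()
                   .GroupBy(r => r.Field<int>("OrganizationID_int"))
                   .Where(g => g.Count() == 1)
                   .Select(g => g.First())
                   .ToList();

// Keep one full row per OrganizationID_int (the first one encountered).
var firstPerGroup = dt.AsEnumerable()
                      .GroupBy(r => r.Field<int>("OrganizationID_int"))
                      .Select(g => g.First())
                      .ToList();

// Each element is a complete DataRow, so every column is still available.
foreach (var row in uniqueRows)
    Console.WriteLine($"{row["OrganizationID_int"]} {row["Name"]}");
```

With this sample data, uniqueRows holds the rows for organizations 2 and 3, while firstPerGroup also includes the first row for organization 1. Which of the two you want depends on whether duplicates should be dropped entirely or collapsed to one representative.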

OTHER TIPS

There are two overloads of the Distinct extension method. One of them takes an IEqualityComparer<T> that determines how elements are compared, so you decide what makes two elements equal.

Here is a full example of how you can use this overload:

class Item
{
    public int Id {get; set;}
    public string Name {get;set;}
}

class ItemComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        return x.Id == y.Id;
    }

    public int GetHashCode(Item x)
    {
        return x.Id;
    }
}

void Main()
{
  var sequence = new List<Item>() 
  {
      new Item {Id = 1, Name = "1"}, 
      new Item {Id = 1, Name = "2"}
  };

  // Using overloaded version of Distinct method!
  var distinctSequence = sequence.Distinct(new ItemComparer());

  // distinctSequence contains only one Item with Id = 1
  distinctSequence.Dump(); // Dump() is a LINQPad extension method
}

What you are looking for is GroupBy, followed by an aggregate function such as Min or Sum, or simply First, to select one of the row values for each column.

var distinctRows =
    (from DataRow dRow in dt.Rows
     group dRow by dRow["OrganizationID_int"] into g
     // Col2 and Col3 stand in for your own column names
     select new { OrgId = g.Key, Col2 = g.First()["Col2"], Col3 = g.First()["Col3"] });

Use grouping with LINQ to DataSet and take the first row of each group, which gives you the whole DataRow:

var distinctRows = from row in dt.AsEnumerable()
                   group row by new { 
                      col1 = row.Field<int>("OrganizationID_int")
                      // other columns here 
                   } into g
                   select g.First();

Have a look at MoreLinq's DistinctBy method, with which you can phrase your query like so:

dt.AsEnumerable().DistinctBy(dRow => dRow["OrganizationID_int"])
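
If you are on .NET 6 or later, System.Linq has its own Enumerable.DistinctBy, so the same query can be written without MoreLinq. The sketch below assumes that runtime; the sample table and its Name column are hypothetical.

```csharp
using System;
using System.Data;
using System.Linq;

// Hypothetical sample data: organization 1 appears twice, 2 and 3 once each.
var dt = new DataTable();
dt.Columns.Add("OrganizationID_int", typeof(int));
dt.Columns.Add("Name", typeof(string));
dt.Rows.Add(1, "A");
dt.Rows.Add(1, "B");
dt.Rows.Add(2, "C");
dt.Rows.Add(3, "D");

// Built-in DistinctBy (System.Linq, .NET 6+): keeps the first row seen for each key.
var distinctRows = dt.AsEnumerable()
                     .DistinctBy(r => r.Field<int>("OrganizationID_int"))
                     .ToList();

// The results are full DataRow objects, so all columns can be read.
foreach (var row in distinctRows)
    Console.WriteLine($"{row["OrganizationID_int"]} {row["Name"]}");
```

This returns one row per OrganizationID_int, the same result as grouping and taking g.First(). It does not drop keys that occur more than once.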
Licensed under: CC-BY-SA with attribution