Removing duplicates from datatable [closed]

https://stackoverflow.com/questions/23251573

08-07-2023
Question

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.

Closed 9 years ago.

I'm trying to remove duplicates from a DataTable, similar to this question. However, I need to do it on an ordered data set: time is one of my columns, and only the earliest time instance should remain.

I also came across this question on ordered lists from a DataTable, but I'm not sure how to combine the two.

Basically, I'm reading a file into a DataSet, then I want to sort on time and three other columns and delete all duplicates, leaving the earliest time instance. The columns in question are name (int), phone (long), time (int) and location (string). If name, phone and location are duplicated, remove every row after the first (earliest) time.

dsHoldingSet.Tables["FileData"].Columns.Add("location", typeof(string));
dsHoldingSet.Tables["FileData"].Columns.Add("name", typeof(int));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(string));
dsHoldingSet.Tables["FileData"].Columns.Add("time", typeof(int));
dsHoldingSet.Tables["FileData"].Columns.Add("phone", typeof(long));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(int));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(string));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(Boolean));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(string));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(Boolean));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(Boolean));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(string));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(int));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(int));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(Boolean));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(Boolean));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(Boolean));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(string));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(int));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(int));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(long));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(string));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(string));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(Boolean));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(Boolean));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(Boolean));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(Boolean));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(string));
dsHoldingSet.Tables["FileData"].Columns.Add("field", typeof(string));

That's the table definition; we then add rows as we validate lines in the file.

Solution

What we want to do is group the rows by the distinct values. If we want to use LINQ against a DataTable, the easiest way is using the built-in DataTable.AsEnumerable() extension method. This returns an IEnumerable<DataRow> for you.

Once we've got that, we need to construct a comparable object out of the composite of the three values. Here I used the approach of string concatenation, because strings are easy to compare. There are other ways you could do this, but this one is simple:

name|phone|location

Grouping the rows on that key with GroupBy produces a sequence of IGrouping<string, DataRow>. Each grouping is also an IEnumerable<DataRow> that represents the subset of rows sharing that key. So if we sort each grouping by time and take the first element, we get the earliest row for that key.

Here's the complete code.

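// Requires: using System.Data; using System.Linq;
// Column types follow the table definition above: name is int, phone is long, time is int.
var rows = dsHoldingSet.Tables["FileData"].AsEnumerable()
    // Build a composite "name|phone|location" key for each row.
    .GroupBy(row => string.Format("{0}|{1}|{2}",
        row.Field<int>("name"),
        row.Field<long>("phone"),
        row.Field<string>("location")))
    // Within each group, keep only the row with the earliest time.
    .Select(group =>
        group.OrderBy(row => row.Field<int>("time")).First());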
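As one of the "other ways" to build the key, here is a minimal sketch that groups on an anonymous type instead of a concatenated string and then copies the surviving rows into a new DataTable with CopyToDataTable(). The class name, the reduced four-column table and the sample rows are made up for illustration; only the column names and types come from the question.

```csharp
using System;
using System.Data;
using System.Linq;

class DedupDemo
{
    static void Main()
    {
        // Reduced version of the question's table: only the four relevant columns.
        var table = new DataTable("FileData");
        table.Columns.Add("location", typeof(string));
        table.Columns.Add("name", typeof(int));
        table.Columns.Add("time", typeof(int));
        table.Columns.Add("phone", typeof(long));

        // Hypothetical sample rows, in column order: location, name, time, phone.
        table.Rows.Add("Leeds", 1, 930, 5551234L);
        table.Rows.Add("Leeds", 1, 845, 5551234L);   // same key as above, earlier time
        table.Rows.Add("York", 1, 700, 5551234L);    // different location, so a different key
        table.Rows.Add("Leeds", 2, 1015, 5559876L);

        DataTable deduped = table.AsEnumerable()
            // Anonymous types compare by value, so they work as composite keys.
            .GroupBy(row => new
            {
                Name = row.Field<int>("name"),
                Phone = row.Field<long>("phone"),
                Location = row.Field<string>("location")
            })
            // Keep the earliest row of each group.
            .Select(g => g.OrderBy(row => row.Field<int>("time")).First())
            // Materialize the result as a new DataTable with the same schema.
            .CopyToDataTable();

        foreach (DataRow row in deduped.Rows)
        {
            Console.WriteLine("{0} {1} {2} {3}",
                row["name"], row["phone"], row["location"], row["time"]);
        }
    }
}
```

With this sample data, three rows should remain, and the one for name 1 in Leeds is the row with time 845. Note that CopyToDataTable() throws an InvalidOperationException when the source sequence is empty, so an empty input table would need a guard.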

Some other notes: phone should be a string, not a long. Unless time represents some other kind of measure you haven't gone into, it should be either a TimeSpan or a DateTime. The first thing to do when loading a data set for manipulation is to coerce the data into the most robust and correct data types; it makes the actual manipulation MUCH easier. You can convert back afterwards if you need to.
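The type coercion could be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, it assumes the time column holds seconds since midnight (the question does not say what the int means), and it assumes neither column contains nulls.

```csharp
using System;
using System.Data;
using System.Globalization;

static class FileDataCoercion
{
    // Returns a copy of the table with phone as string and time as TimeSpan.
    // ASSUMPTION: "time" is seconds since midnight; adjust the conversion otherwise.
    public static DataTable Coerce(DataTable source)
    {
        // Clone() copies the schema but no rows, so column types can still be changed.
        DataTable typed = source.Clone();
        typed.Columns["phone"].DataType = typeof(string);
        typed.Columns["time"].DataType = typeof(TimeSpan);

        foreach (DataRow row in source.Rows)
        {
            DataRow copy = typed.NewRow();
            foreach (DataColumn col in source.Columns)
            {
                if (col.ColumnName == "phone")
                    copy["phone"] = row.Field<long>("phone").ToString(CultureInfo.InvariantCulture);
                else if (col.ColumnName == "time")
                    copy["time"] = TimeSpan.FromSeconds(row.Field<int>("time"));
                else
                    copy[col.ColumnName] = row[col];   // all other columns are copied unchanged
            }
            typed.Rows.Add(copy);
        }
        return typed;
    }
}
```

After a conversion like this, the grouping query would read phone with Field<string> and time with Field<TimeSpan>.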

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow