C# 3.0: Need to return duplicates from a List<>

https://stackoverflow.com/questions/493673

20-08-2019

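Question

I have a List<> of objects in C# and I need a way to return those objects that are considered duplicates within the list. I do not need the distinct result set; I need a list of the items that I will be deleting from my repository.

For the sake of this example, let's say I have a list of "Car" types and I need to know which of these cars are the same color as another car in the list. Here are the cars in the list and their Color property: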

Car1.Color = Red;

Car2.Color = Blue;

Car3.Color = Green;

Car4.Color = Red;

Car5.Color = Red;

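For this example, I need the result (IEnumerable<>, List<>, or whatever) to contain Car4 and Car5, because I want to delete these from my repository or database so that I only have one car per color in my repository. Any help would be appreciated.

Solution

I inadvertently coded this yesterday, when I was trying to write a "distinct by a projection". I included a ! when I shouldn't have, but this time it's just right: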

public static IEnumerable<TSource> DuplicatesBy<TSource, TKey>
    (this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
    HashSet<TKey> seenKeys = new HashSet<TKey>();
    foreach (TSource element in source)
    {
        // Yield it if the key hasn't actually been added - i.e. it
        // was already in the set
        if (!seenKeys.Add(keySelector(element)))
        {
            yield return element;
        }
    }
}

You'd then call it with:

var duplicates = cars.DuplicatesBy(car => car.Color);
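As a quick check against the question's data, here is a minimal sketch that runs the extension method over the five cars. The Car class with Name and Color string properties and the DuplicatesDemo wrapper are assumptions made for illustration; the question does not show the real type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's car type.
public class Car
{
    public string Name { get; set; }
    public string Color { get; set; }
}

public static class DuplicateExtensions
{
    // Same method as above: yields every element whose key was already seen.
    public static IEnumerable<TSource> DuplicatesBy<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
    {
        HashSet<TKey> seenKeys = new HashSet<TKey>();
        foreach (TSource element in source)
        {
            // Add returns false when the key is already in the set.
            if (!seenKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

public static class DuplicatesDemo
{
    // Returns the names of the cars that repeat an earlier car's color.
    public static List<string> Run()
    {
        var cars = new List<Car>
        {
            new Car { Name = "Car1", Color = "Red" },
            new Car { Name = "Car2", Color = "Blue" },
            new Car { Name = "Car3", Color = "Green" },
            new Car { Name = "Car4", Color = "Red" },
            new Car { Name = "Car5", Color = "Red" }
        };

        // Car1 is the first Red car, so it is kept; Car4 and Car5 are yielded.
        return cars.DuplicatesBy(car => car.Color)
                   .Select(car => car.Name)
                   .ToList();
    }
}
```

Because the method is a lazy iterator, the set of seen keys is rebuilt each time the result is enumerated; calling ToList() once, as here, avoids repeating the work.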

Other tips

var duplicates = from car in cars
                 group car by car.Color into grouped
                 from car in grouped.Skip(1)
                 select car;

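This groups the cars by color and then skips the first result from each group, returning the remainder of each group flattened into a single sequence.

If you have particular requirements about which one you want to keep, e.g. if the car has an Id property and you want to keep the car with the lowest Id, then you could add some ordering in there, e.g.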

var duplicates = from car in cars
                 group car by car.Color into grouped
                 from car in grouped.OrderBy(c => c.Id).Skip(1)
                 select car;
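For readers who prefer method syntax, the ordered query above might be written roughly as follows. This is a sketch under assumptions: the Car class with an int Id and a string Color, and the names CarQueries and DuplicatesKeepingLowestId, are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical car type with the Id property mentioned above.
public class Car
{
    public int Id { get; set; }
    public string Color { get; set; }
}

public static class CarQueries
{
    // Groups by color, orders each group by Id, and returns everything
    // except the lowest-Id car of each color as one flat sequence.
    public static List<Car> DuplicatesKeepingLowestId(IEnumerable<Car> cars)
    {
        return cars.GroupBy(car => car.Color)
                   .SelectMany(grouped => grouped.OrderBy(c => c.Id).Skip(1))
                   .ToList();
    }

    // Sample data: the Red cars have Ids 4, 1 and 5, so Id 1 is kept.
    public static List<int> Demo()
    {
        var cars = new List<Car>
        {
            new Car { Id = 4, Color = "Red" },
            new Car { Id = 2, Color = "Blue" },
            new Car { Id = 3, Color = "Green" },
            new Car { Id = 1, Color = "Red" },
            new Car { Id = 5, Color = "Red" }
        };
        return DuplicatesKeepingLowestId(cars).Select(c => c.Id).ToList();
    }
}
```

SelectMany plays the role of the second from clause in the query expression: it flattens the per-color remainders into a single list.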

Here's a slightly different LINQ solution that I think makes it more obvious what you're trying to do:

var s = from car in cars
    group car by car.Color into g
    where g.Count() == 1
    select g.First();

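This just groups the cars by color, tosses out all the groups that have more than one element, and puts the rest into the returned IEnumerable. In other words, the result holds the cars whose color is unique in the list.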

IEnumerable<Car> GetDuplicateColors(List<Car> cars)
{
    return cars.Where(c => cars.Any(c2 => c2.Color == c.Color && cars.IndexOf(c2) < cars.IndexOf(c) ) );
}

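This basically means "return the cars for which there is another car in the list with the same color and a smaller index".

I'm not sure about the performance, though. I suspect that an approach with an O(1) lookup for duplicates (such as the Dictionary/HashSet methods) would be faster for large sets.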

Create a new Dictionary<Color, Car> foundColors and a List<Car> carsToDelete.

Then iterate through your original list of cars like so:

foreach(Car c in listOfCars)
{
    if (foundColors.ContainsKey(c.Color))
    {
        carsToDelete.Add(c);
    }
    else
    {
        foundColors.Add(c.Color, c);
    }
}

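Then you can delete every car that is in carsToDelete.

You could get a minor performance boost by putting your "delete record" logic in the if statement instead of building a new list, but the way you worded the question suggested that you need to collect them in a List.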

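Without actually coding it, how about an algorithm something like this:

iterate through your List<T>, creating a Dictionary<T, int> of occurrence counts
iterate through your Dictionary<T, int>, deleting the entries where the int is 1

Anything left in the Dictionary has duplicates. The second part, where you actually delete, is optional, of course. You can just iterate through the Dictionary and look for the entries with a count >1 to take action on.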
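The counting idea above can be sketched as follows. The names DuplicateCounter and CountDuplicates are hypothetical, and instead of deleting entries in place this version copies the entries with a count above 1 into a second dictionary, which sidesteps modifying a dictionary while enumerating it.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateCounter
{
    // Returns each value that occurs more than once, with its occurrence count.
    public static Dictionary<T, int> CountDuplicates<T>(IEnumerable<T> items)
    {
        // First pass: count how often each value occurs.
        var counts = new Dictionary<T, int>();
        foreach (T item in items)
        {
            int current;
            counts.TryGetValue(item, out current); // current is 0 when the key is missing
            counts[item] = current + 1;
        }

        // Second pass: keep only the values seen more than once.
        var duplicates = new Dictionary<T, int>();
        foreach (KeyValuePair<T, int> pair in counts)
        {
            if (pair.Value > 1)
            {
                duplicates.Add(pair.Key, pair.Value);
            }
        }
        return duplicates;
    }
}
```

For the question's colors, CountDuplicates(new[] { "Red", "Blue", "Green", "Red", "Red" }) would contain a single entry, "Red" with a count of 3. Note that this tells you which values repeat, not which specific objects to delete.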

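Edit: OK, I upvoted Ryan's answer, since he actually gave you code. ;)

My answer takes inspiration (in this order) from the following respondents: Joe Coehoorn, Greg Beech and Jon Skeet.

I decided to provide a full example, on the assumption (made for real-world efficiency) that you have a static list of car colors. I believe the following code illustrates a complete solution to the problem in an elegant, although not necessarily hyper-efficient, manner.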

#region SearchForNonDistinctMembersInAGenericListSample
public static string[] carColors = new[]{"Red", "Blue", "Green"}; 
public static string[] carStyles = new[]{"Compact", "Sedan", "SUV", "Mini-Van", "Jeep"}; 
public class Car
{
    public Car(){}
    public string Color { get; set; }
    public string Style { get; set; }
}
public static List<Car> SearchForNonDistinctMembersInAList()
{
    // pass in cars normally, but declare here for brevity
    var cars = new List<Car>(5) { new Car(){Color=carColors[0], Style=carStyles[0]}, 
                                      new Car(){Color=carColors[1],Style=carStyles[1]},
                                      new Car(){Color=carColors[0],Style=carStyles[2]}, 
                                      new Car(){Color=carColors[2],Style=carStyles[3]}, 
                                      new Car(){Color=carColors[0],Style=carStyles[4]}};
    List<Car> carDupes = new List<Car>();

    for (int i = 0; i < carColors.Length; i++)
    {
        Func<Car,bool> dupeMatcher = c => c.Color == carColors[i];

        int count = cars.Count<Car>(dupeMatcher);

        if (count > 1) // we have duplicates
        {
            foreach (Car dupe in cars.Where<Car>(dupeMatcher).Skip<Car>(1))
            {
                carDupes.Add(dupe);
            }
        }
    }
    return carDupes;
}
#endregion

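I'm going to come back through here later and compare this solution to all three of its inspirations, just to contrast the styles. It's rather interesting.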

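public static IQueryable<TSource> Duplicates<TSource>(this IEnumerable<TSource> source) where TSource : IComparable
{
    if (source == null)
        throw new ArgumentNullException("source");
    // Keeps every element that equals more than one element of the source (itself included).
    return source.Where(x => source.Count(y => y.Equals(x)) > 1).AsQueryable<TSource>();
}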
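The method above re-counts the whole source for every element, so the work grows quadratically with the list size. A grouping-based alternative might look like the sketch below. The names GroupingDuplicates and DuplicatesByGrouping are hypothetical, the sketch assumes the element type's default equality matches its Equals behavior, and the result comes back grouped by value rather than in the original order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDuplicates
{
    // Returns every element whose value occurs more than once,
    // including the first occurrence, grouped by value.
    public static IEnumerable<TSource> DuplicatesByGrouping<TSource>(
        this IEnumerable<TSource> source)
    {
        if (source == null)
            throw new ArgumentNullException("source");

        return source.GroupBy(x => x)            // one group per distinct value
                     .Where(g => g.Count() > 1)  // keep values seen more than once
                     .SelectMany(g => g);        // flatten back to elements
    }
}
```

Like the method above, this returns all copies of a repeated value, so it identifies what is duplicated rather than which copies to remove.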

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow