Quickly retrieve the subset of properties used in a huge collection in C#
02-10-2019
Question
I have a huge Collection (which I can cast as an enumerable using OfType<>()) of objects. Each of these objects has a Category property, which is drawn from a list somewhere else in the application. This Collection can reach sizes of hundreds of items, but it is possible that only, say, 6/30 of the possible Categories are actually used. What is the fastest method to find these 6 Categories? The size of the huge Collection discourages me from just iterating across the entire thing and returning all unique values, so is there a faster method of accomplishing this?
Ideally I'd collect the categories into a List<string>.
Solution
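If you are using .NET 3.5 or later, try this:
    // Requires: using System.Linq;
    List<string> categories = collection
        .Cast<Foo>()                  // Foo stands in for your item type
        .Select(foo => foo.Category)  // project each item to its Category
        .Distinct()                   // keep each Category only once
        .ToList();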
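It should be very fast: Distinct() makes a single pass over the collection and remembers the values it has already seen in a hash-based set, so the cost grows linearly with the number of items.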
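For reference, the query is roughly equivalent to one explicit loop that adds every Category to a HashSet<string>. The following is only a sketch: Foo, CategoryScan and UsedCategories are hypothetical names (the question does not name the item type), and the source is assumed to be a non-generic collection, as the question suggests.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type; the question only says each object has a Category.
public class Foo
{
    public string Category { get; set; }
}

public static class CategoryScan
{
    // Visits every item once; HashSet<T>.Add ignores values it already holds.
    public static List<string> UsedCategories(IEnumerable collection)
    {
        var seen = new HashSet<string>();
        foreach (Foo foo in collection.OfType<Foo>())
        {
            seen.Add(foo.Category);
        }
        return seen.ToList();
    }
}
```

OfType<Foo>() silently skips elements that are not a Foo, whereas Cast<Foo>() throws an InvalidCastException on them, so pick whichever matches what the collection may contain.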
I assume these objects originally came from a database? If so then you might want to ask the database to do the work for you. If there is an index on that column then you will get the result close to instantly without even having to fetch the objects into memory.
OTHER TIPS
"The size of the huge Collection discourages me from just iterating across the entire thing and returning all unique values"
I am afraid in order to find all used categories, you will have to look at each item once, so you can hardly avoid iterating (unless you keep track of the used categories while building your collection).
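Keeping track of the used categories while the collection is built could look something like the sketch below. It is an illustration under assumptions, not a drop-in solution: Foo and CategoryTrackingCollection are hypothetical names, Category is assumed to be non-null, and it is assumed not to change once an item has been added.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical item type; the question only says each object has a Category.
public class Foo
{
    public string Category { get; set; }
}

// Collection<T> routes every Add/Insert/Remove/Clear/indexer-set through
// the four virtual methods overridden here, so the counts stay in sync.
public class CategoryTrackingCollection : Collection<Foo>
{
    // Category -> number of items currently using it.
    private readonly Dictionary<string, int> counts = new Dictionary<string, int>();

    // The categories in use right now, available without scanning the items.
    public ICollection<string> UsedCategories
    {
        get { return counts.Keys; }
    }

    protected override void InsertItem(int index, Foo item)
    {
        base.InsertItem(index, item);
        Increment(item.Category);
    }

    protected override void RemoveItem(int index)
    {
        Decrement(this[index].Category);
        base.RemoveItem(index);
    }

    protected override void SetItem(int index, Foo item)
    {
        Decrement(this[index].Category);
        base.SetItem(index, item);
        Increment(item.Category);
    }

    protected override void ClearItems()
    {
        base.ClearItems();
        counts.Clear();
    }

    private void Increment(string category)
    {
        int n;
        counts.TryGetValue(category, out n); // n stays 0 if the key is absent
        counts[category] = n + 1;
    }

    private void Decrement(string category)
    {
        int n = counts[category] - 1;
        if (n == 0) counts.Remove(category);
        else counts[category] = n;
    }
}
```

The trade-off is a little extra work on every add and remove in exchange for not having to scan the items when the categories are needed.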
Check whether Mark Byers's solution is fast enough for you, and only worry about its performance if it isn't.
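One way to check is to time the query on data of a realistic size. The sketch below uses System.Diagnostics.Stopwatch; Foo, TimingSketch, TimeDistinct and MakeSampleData are hypothetical names, and the synthetic data (items spread evenly over a handful of categories) is an assumption to be replaced with your real collection.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical item type; the question only says each object has a Category.
public class Foo
{
    public string Category { get; set; }
}

public static class TimingSketch
{
    // Runs the Select/Distinct/ToList query once and reports how long it took.
    public static List<string> TimeDistinct(IEnumerable<Foo> collection, out TimeSpan elapsed)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        List<string> categories = collection
            .Select(foo => foo.Category)
            .Distinct()
            .ToList();
        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;
        return categories;
    }

    // Builds synthetic test data: itemCount items cycling through usedCategories names.
    public static List<Foo> MakeSampleData(int itemCount, int usedCategories)
    {
        return Enumerable.Range(0, itemCount)
            .Select(i => new Foo { Category = "Category" + (i % usedCategories) })
            .ToList();
    }
}
```

Calling TimingSketch.TimeDistinct(TimingSketch.MakeSampleData(500000, 6), out elapsed) and printing elapsed.TotalMilliseconds gives a rough figure; run it a few times, since the first call also includes JIT compilation.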