Question

I need to search through a collection of "Item" objects that contain a time property. I have a fast solution in place but its very messy (will post if nothing better comes up). Below are my attempts at performing the search myself using LINQ and whatnot.

In my particular case, I know the items are ordered low to high based on time. When I iterate through them, it's 9/12, 9/13, 9/14. I'd like to find a solution that's fast even if this isn't ordered, but not important right now.

//ICollection c = GetCollection(); //25,000+ items
DateTime TIME = DateTime.Now.AddDays(-1);

EventLog scan = new EventLog("Application", "Server", "N/A");
EventLogCollection c = scan.Entries;
Console.WriteLine(logs.Count); // All entries already in list here

// 64 sec - SLOW
List<Item> l1 = new List<Item>();
foreach (Item i in c) { 
  if (i.time > TIME) { 
    l1.Add(i); }
}

// 93 sec - SLOWER
var l2 = c.Cast<Item>().AsParallel().Select(n => n.time > TIME);
var i = l2.Count();

// 98 sec - EVEN SLOWER!
List<Item> l3 = new List<Item>();
Parallel.ForEach(c.Cast<Item>(), n => {
    if (n.time > TIME) {
        l3.add(n);
    }
});

My current solution is do a BinarySearch for the start and end times and loop through the ICollection based on those indexes. It's very fast (1-2 sec) but very messy. I don't need a different solution but thought I'd toss this out to you performance experts.

Is there a faster, elegant to way to search the ICollection? BTW I have no control over the collection I'm given and can't change it's structure. .NET 4.0

And no, I can't use System.Diagnostics.Eventing.Reader because I'm stuck with Windows XP

Was it helpful?

Solution

EDIT

On the assertion that dasblinkenlight's answer is generally right I wrote a quick generic helper function.

public static int BinaryFirstIndexOf<T>(
    Func<int, T> indexer,
    Func<T, boo> predicate,
    int count)
{
    var low = 0;
    var high = count - 1;

    while (low < (high - 1))
    {
        var mid = low + ((high - low) / 2);

        if (predicate(indexer(mid))
        {
            high = mid;
        }
        else
        {
            low = mid;
        }
    }

    if (predicate(indexer(low)))
    {
        return low;
    }

    if (low != high && predicate(indexer(high)))
    {
        return high;
    }

    return -1;
}

Which I would use like this,

var time = DateTime.Now.AddDays(-1);
var c = scan.Entries;

var first = BinaryFirstIndexOf(
     i => c[i], 
     e => e.TimeGenerated > time,
     c.Count);

if (first >= 0)
{
    var result = new List<Item>(c.Count - first);
    Enumerable.Range(first, c.Count - first).AsParallel()
    .ForAll(i => 
        {
            var j = i - first;
            result[j] = (Item)c[i];
        });
}

Don't you want to do something like this

var time = DateTime.Now.AddDays(-1);
var c = scan.Entries;

var cutOff = 0;
for (var i = c.Count - 1; i > - 1; i--)
{
    if (c[i].TimeGenerated < time)
    {
        cutOff = i;
        break;
    }
}

var result = new List<Item>(c.Count - cutOff - 1);
var j = 0;
for (var i = cutOff + 1; i < c.Count; i ++)
{
    result[j] = (Item)c[i];
    j++;
}

I'm assuming the data you want is on the end and takes up less that half the collection.

Or perhaps using linq to do the casting in parallel,

var time = DateTime.Now.AddDays(-1);
var c = scan.Entries;

var cutOff = 0;
for (var i = c.Count - 1; i > - 1; i--)
{
    if (c[i].TimeGenerated < time)
    {
        cutOff = i;
        break;
    }
}

var result = new List<Item>(c.Count - cutOff - 1);
Enumerable.Range(cutOff + 1, c.Count - cutOff - 1).AsParallel()
    .ForAll(i => 
        {
            var j = i - cutOff - 1;
            result[j] = (Item)c[i];
        });

OTHER TIPS

Your "query" is incorrect. You're actually selecting a bunch of booleans, but not filtering. You want:

var l2 = c.Cast<EventLogEntry>().AsParallel().Where(n => n.time > TIME);

This may not be faster, but it's worth a check. Heck, I wouldn't even do it in parallel to start with.

In fact, Count has an overload which takes a predicate, so you can use this:

var count = c.Cast<EventLogEntry>().Count(n => n.time > TIME);

Or a parallelized version:

var count = c.Cast<EventLogEntry>().AsParallel().Count(n => n.time > TIME);

Like Chris, I'm shocked that this would take over a minute for any implementation...

This first statement:

List<Item> l1 = new List<Item>();
foreach (Item i in c) { 
  if (i.time > TIME) { 
    l1.Add(i); }
}

Does it improve performance changing to (will filter the list before the foreach is run):

List<Item> l1 = new List<Item>();
foreach (Item i in c.Where(a => a.time > TIME)) { 
    l1.Add(i);
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top