Question

I have a set of data, all doubles: 100 rows, 20 columns.

I'm pulling the data into an IEnumerable with:

var RowsOfData = File.ReadLines(dll.Globals.OutputDir + dll.Globals.filename).Select(a => a.Split(',').ToList());

var FilteredRowsToday = (from n in RowsOfData
                         where n[1] == "1"
                         orderby n[0] descending
                         select n);

I then have a set of functions, each of which does a simple check on a data row and returns a bool. What I want is a count of the number of rows for which each function evaluated to true. And when I scale my project up, I want this processed as fast as possible, in parallel if possible. I've tried:

foreach (var row in FilteredRowsToday)
{
    // is f1() true? is f2() true? etc.
}

That seems slow, so I've tried to do it in parallel:

foreach (var row in FilteredRowsToday.AsParallel())

but it's no faster.

I’m now thinking something like:

var TotalTrue = FilteredRowsToday.Select(item => f1() & f2() & f3()).Count();

I can pre-process the data to provide the results of the evaluations of each function as a sort of binary grid, if that's a better starting point:

f1, f2, f3 etc.
1, 0, 0   (row 1)
1, 1, 1   (row 2) etc.

suggestions welcome!


Solution

If you're just interested in the count where all three functions evaluate to true, then this should be sufficient:

var TotalTrue = FilteredRowsToday.Count(item => f1() & f2() & f3());

As for why it's slow, your functions could be the reason behind this.

You could try only evaluating each row until either all three functions return true, or at least one of them returns false, e.g.

var TotalTrue = FilteredRowsToday.Count(item => f1() && f2() && f3());

I.e. if f1() evaluates to false, then don't bother doing the rest of the validations.

UPDATE: If your functions aren't doing any resource-intensive checks, then parallel LINQ won't do you much good (more info here).
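To illustrate the point about PLINQ, here is a minimal sketch (the `Slow` check and its `Thread.Sleep` are hypothetical stand-ins for an expensive per-row function; the data is generated, not the question's file): parallelizing only helps when the per-row work is costly, and sequential and parallel counts agree either way.

```csharp
using System;
using System.Linq;
using System.Threading;

class PlinqSketch
{
    // Hypothetical expensive check; Sleep simulates real work.
    static bool Slow(double v) { Thread.Sleep(1); return v >= 0.25; }

    static void Main()
    {
        var rows = Enumerable.Range(0, 200)
                             .Select(i => new[] { i / 200.0, 1.0 - i / 200.0 })
                             .ToList();

        // Sequential count with short-circuiting (&&).
        int seq = rows.Count(r => Slow(r[0]) && Slow(r[1]));

        // Parallel count; worthwhile only because Slow() is expensive.
        int par = rows.AsParallel().Count(r => Slow(r[0]) && Slow(r[1]));

        Console.WriteLine(seq == par); // prints True
    }
}
```

With a cheap predicate and only 100 rows, the overhead of partitioning the work usually outweighs the gain, which matches the behaviour described in the question.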

Other tips

As far as I can see, you are reading the whole file at once, and it's a comma-separated file. If you yield the records from the file as you read, you can process them while waiting for the next read.

private IEnumerable<string[]> GetRecords(string fileName)
{
    using (StreamReader reader = File.OpenText(fileName))
    {
        string line = reader.ReadLine();
        while (line != null)
        {
            yield return line.Split(',');
            line = reader.ReadLine();
        }
    }
}
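A minimal, self-contained usage sketch of that streaming approach (the temp file and its contents are made up for illustration): the iterator parses one line per `MoveNext`, so filtering and counting never hold the whole file in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class StreamingSketch
{
    // Same idea as GetRecords above: yield one parsed row per line read.
    static IEnumerable<string[]> GetRecords(string fileName)
    {
        using (StreamReader reader = File.OpenText(fileName))
        {
            string line = reader.ReadLine();
            while (line != null)
            {
                yield return line.Split(',');
                line = reader.ReadLine();
            }
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName(); // hypothetical data file
        File.WriteAllLines(path, new[] { "0.9,1,0.2", "0.8,0,0.4", "0.7,1,0.6" });

        // Rows are parsed lazily as the query enumerates them.
        int count = GetRecords(path).Count(r => r[1] == "1");
        Console.WriteLine(count); // prints 2

        File.Delete(path);
    }
}
```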

You are also spending some time converting the result of Split into a List, even though it is already an array and already has the index access needed to perform the query.

I would also advise applying the optimization suggested above: use .Count(item => f1() & f2() & f3()); instead of .Select(item => f1() & f2() & f3()).Count();.
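Putting the suggestions together, a sketch of the whole pipeline (here `Check1`/`Check2` are hypothetical stand-ins for the question's f1/f2, and the temp file stands in for the real data): stream the rows, skip the ToList(), and count with short-circuiting predicates.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

class CombinedSketch
{
    // Stream rows one at a time instead of materializing lists.
    static IEnumerable<string[]> GetRecords(string fileName)
    {
        using (var reader = File.OpenText(fileName))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line.Split(',');
        }
    }

    // Hypothetical per-row checks standing in for f1/f2.
    static bool Check1(string[] r) => double.Parse(r[0], CultureInfo.InvariantCulture) > 0.5;
    static bool Check2(string[] r) => double.Parse(r[2], CultureInfo.InvariantCulture) < 0.5;

    static void Main()
    {
        string path = Path.GetTempFileName(); // hypothetical data file
        File.WriteAllLines(path, new[] { "0.9,1,0.2", "0.8,0,0.4", "0.3,1,0.6" });

        // && short-circuits: Check2 never runs when Check1 is false.
        int totalTrue = GetRecords(path)
            .Where(r => r[1] == "1")
            .Count(r => Check1(r) && Check2(r));

        Console.WriteLine(totalTrue); // prints 1

        File.Delete(path);
    }
}
```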

Nevertheless, I don't believe any of these optimizations will bring much improvement with such a small amount of data. I think we can help you better if you post some details about the processing portion of your code.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow