Question

I am trying to improve a C# project I am working on. Specifically, my goal is to parallelize some operations to reduce processing time. I am starting with small snippets just to get the hang of it. The following serial code works as expected:

for (int i = 0; i < M; i++)
{
     double d;
     try
     {
          d = Double.Parse(lData[i]);
     }
     catch (Exception)
     {
         throw new Exception("Wrong formatting on data number " + (i + 1) + " on line " + (lCount + 1));
     }
     sg[lCount % N][i] = d;
}

By using the following (parallel) code I would expect to obtain the exact same results, but that is not the case.

Parallel.For(0, M, i =>
{
    double d;
    try
    {
        d = Double.Parse(lData[i]);
    }
    catch (Exception)
    {
        throw new Exception("Wrong formatting on data number " + (i + 1) + " on line " + (lCount + 1));
    }
    sg[lCount % N][i] = d;
});

The part of the program these snippets come from reads data from a file, one line at a time. Each line is a sequence of comma-separated double-precision numbers, which I put in the array lData[] using String.Split(). Every N lines, the data sequence starts over with a new data frame (hence the % N in the row index when I assign the values).

It is my understanding (clearly wrong) that by putting the body of the serial for-loop in the third parameter of Parallel.For I parallelize its execution, and that this shouldn't change the results. Is the problem that the threads are all accessing lCount and M? Should I make thread-local copies?

Thanks.

(since I'm new I am not allowed to create the Parallel.For tag)

EDIT: I ran some more tests, this time looking at an output earlier in the code than I did before. It appears that the parallel version of my code does not fill the sg[][] array entirely; some values are left at their defaults (0, in my case).

EDIT 2 (to answer some of the comments): lData[] is a string[] obtained by using string.Split(). The original string I am splitting is read from my data files. I wrote the code that generates them, so they are generally well-formatted (I still used the try-catch construct out of habit). Just before the for-loop (either parallel or serial) I verify that lData[] has the correct number of values (M). If it doesn't, I throw an exception that prevents the program from reaching the for-loop in question.

sg[][] is an N by M array of type double (there was a typo in the snippets, now corrected; in my original code this error was not present). After I read N lines from the file, the array sg[][] contains a whole data set. After the for-loop (either parallel or serial) there is a portion of code that looks like this:

lCount++; // counting the lines I have already read
if ((lCount % N) == 0)
{
    // do things with sg[][]
    // reset sg[][]
}

So I am overwriting all rows of sg[][] on purpose. The for-loop's whole purpose is to update the values in sg[][].

Solution

After doing some line-by-line debugging over the weekend, I managed to find where the problem was.

Basically, unbeknownst to me, the threads created by Parallel.For did not inherit the CultureInfo I had set (this is the normal behaviour of threads, and I didn't know that). As a result, strings like 3.256 were being parsed as 3256.0, which caused the issues I found with the output. (Note: the default locale on my computer uses a comma as the decimal separator, but I had set it to the full stop in program.cs for all my code. I had incorrectly assumed this setting would be inherited by new threads.)
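
To make the misparse concrete, here is a minimal sketch of the effect. It does not depend on the machine's locale: it builds a number format by hand with "," as the decimal separator and "." as the group separator, which is an assumption standing in for the comma-decimal default locale described above. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class SeparatorDemo
{
    // Hypothetical helper: parses with "," as decimal separator and "." as
    // group separator, mimicking a comma-decimal locale (an assumption).
    public static double ParseCommaDecimal(string text)
    {
        var format = (NumberFormatInfo)CultureInfo.InvariantCulture.NumberFormat.Clone();
        format.NumberDecimalSeparator = ",";
        format.NumberGroupSeparator = ".";
        return double.Parse(text, format);
    }

    // Parses with the invariant culture, where "." is the decimal separator.
    public static double ParseInvariant(string text)
    {
        return double.Parse(text, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // The "." is read as a group separator and silently dropped.
        Console.WriteLine(ParseCommaDecimal("3.256").ToString(CultureInfo.InvariantCulture)); // prints 3256
        Console.WriteLine(ParseInvariant("3.256").ToString(CultureInfo.InvariantCulture));    // prints 3.256
    }
}
```

No exception is thrown in the first case, which is why the try-catch in the loop never fires and the wrong values go unnoticed.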

The correct parallel snippet looks like this:

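// Requires: using System.Globalization; using System.Threading; using System.Threading.Tasks;
// Clone the current culture and force "." as the decimal separator.
CultureInfo newCulture = (CultureInfo)CultureInfo.CurrentCulture.Clone();
newCulture.NumberFormat.NumberDecimalSeparator = ".";
Parallel.For(0, M, i =>
{
    // Each worker thread has its own culture, so set it inside the loop body.
    Thread.CurrentThread.CurrentCulture = newCulture;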
    double d;
    try
    {
        d = Double.Parse(lData[i]);
    }
    catch (Exception)
    {
        throw new Exception("Wrong formatting on data number " + (i + 1) + " on line " + (lCount + 1));
    }
    GlobalVar.sgData[lCount % N][i] = d;
});
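
A possible alternative, sketched here rather than taken from the original code, is to pass the culture to the parse call itself so the result no longer depends on which thread runs the iteration. ParseFields, its parameters and the sample input are hypothetical; it assumes the data files always use "." as the decimal separator.

```csharp
using System;
using System.Globalization;
using System.Threading.Tasks;

public static class InvariantParallelParse
{
    // Hypothetical helper: parses one already-split line in parallel,
    // independent of the current culture of any worker thread.
    public static double[] ParseFields(string[] fields, int lineNumber)
    {
        var values = new double[fields.Length];
        Parallel.For(0, fields.Length, i =>
        {
            double d;
            if (!double.TryParse(fields[i], NumberStyles.Float,
                                 CultureInfo.InvariantCulture, out d))
            {
                throw new FormatException(
                    "Wrong formatting on data number " + (i + 1) + " on line " + lineNumber);
            }
            values[i] = d; // each iteration writes only its own slot
        });
        return values;
    }

    public static void Main()
    {
        double[] row = ParseFields("3.256,0.5,-12".Split(','), 1);
        Console.WriteLine(string.Join(" ",
            Array.ConvertAll(row, v => v.ToString(CultureInfo.InvariantCulture))));
        // prints "3.256 0.5 -12"
    }
}
```

Two things to keep in mind with this sketch: an exception thrown inside the Parallel.For body reaches the caller wrapped in an AggregateException, and NumberStyles.Float does not accept group separators, so a value like "1,000.5" would be rejected instead of silently misread. On .NET Framework 4.5 and later, setting CultureInfo.DefaultThreadCurrentCulture once at startup may be another option.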

Thanks to all who pitched in with comments and opinions. Good information to improve my programming.

I updated the question tags to reflect where the issue really was.

Other tips

Nothing in the code is inherently wrong, as far as I can see. My guess would be that you have a race condition or closure issue in the function containing the snippets, probably on the variable N.

If this snippet is nested inside another Parallel.For() call, keep in mind that N is captured by the lambda expression as a variable, not as a value. If N is updated elsewhere while the loop runs, the lambda sees the new value even though you expect it to stay constant. To rule that out, try this:

// Create local copies of N and M, so that updating them
// elsewhere doesn't affect the closure
var n = N;
var m = M;
Parallel.For(0, m, i =>
{
    double d;
    try
    {
        d = Double.Parse(lData[i]);
    }
    catch (Exception)
    {
        throw new Exception("Wrong formatting on data number " + (i + 1) + " on line " + (lCount + 1));
    }
    sg[lCount % n][i] = d;
});
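
The closure behaviour described above can be seen in isolation with a small sketch. It is single-threaded on purpose so the result is deterministic; the class and variable names are hypothetical and not part of the original program.

```csharp
using System;

public static class ClosureCaptureDemo
{
    // Returns what each lambda observes after the shared variable changes.
    public static int[] Run()
    {
        int n = 3;
        Func<int> readsShared = () => n;    // captures the variable n itself

        int nCopy = n;                      // snapshot taken before the update
        Func<int> readsCopy = () => nCopy;  // captures the separate local copy

        n = 5;                              // update made "elsewhere"

        return new[] { readsShared(), readsCopy() };
    }

    public static void Main()
    {
        int[] seen = Run();
        Console.WriteLine(seen[0] + " " + seen[1]); // prints "5 3"
    }
}
```

The first lambda reports the updated value and the second keeps the value from when the copy was made, which is the effect the local copies n and m are meant to provide.
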
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow