Question

Here is the problem I am having. I have a CSV file I am reading with 4381 small decimal numbers mostly between -5 and 5. Examples of them are 0.00000822, -0.20929117, -2.204, 4.88490078.

In my program, written in Java, I am reading the CSV file and adding all 4381 numbers together to get a sum. However, the sum that I am getting is nowhere close to the correct number. If I add the column of numbers together in Excel using =SUM(C:C), I get 10.77918727. If I add them together in my program, I get -933.39114459. As you can see, the numbers are not even remotely close. I do not know that the value from Excel is correct, but I know there is no way -933 is correct for the data.

The thing is, the numbers do add up correctly if I use a smaller set of numbers from another CSV file, so I am sure my program is doing the addition correctly. Regardless, I have placed the code I am using to add the numbers together below.

Because the numbers add up correctly with a smaller sample, the only thing I can think of that would cause such an incorrect sum to be calculated is I am running out of bits in my sum variable (called gainLoss). That leads me to my question: is there a way to represent a decimal with more bits in Java than using BigDecimal? I'm already using BigDecimal and it seems to not provide enough bits unless I am doing something wrong.

An idea I had to get around the limitations of BigDecimal would be to manually use scientific notation like BigDecimal does, but I'd use a BigDecimal as the base and a BigDecimal as the scale. The only issue with that is it seems like a lot of work that I'd like to avoid if there's a better way to represent bigger numbers.

Let me know if there is a better way to represent numbers that need a huge number of bits or if I am doing something wrong. I'll take any suggestions as well.

Here is the code I am using to add the numbers together (minus all the checks I have to do to first make sure a number should be added):

// gainLoss is the sum of all the numbers
BigDecimal gainLoss = new BigDecimal(0.0);

// data is a 2D Object array of all the data from the CSV file.
// Column index 2 in data (the third column, column C in Excel) is filled with BigDecimal objects
for (int i = 1; i < data.length; i++) // for each data row in data (excluding the header row)
{
    // set rowAmt to the value of the number in the current row of the CSV file
    BigDecimal rowAmt = (BigDecimal)data[i][2];

    // add this row's value to gainLoss
    gainLoss = gainLoss.add(rowAmt);
}

Edit: Here is the full code I am using to calculate the sum, as requested:

public static BigDecimal calcGainLoss(Object[][] data, Calendar startCal, Calendar endCal)
{
    BigDecimal gainLoss = new BigDecimal("0.0");
    for (int i = 1; i < data.length; i++) // for each data row in data
    {
        int lineNum = i + 1;
        System.out.println("Line " + lineNum + ", i " + i + ":   " + gainLoss.toString());
        // Load the important values into memory
        String rowType = (String)data[i][4];
        BigDecimal rowAmt = (BigDecimal)data[i][2];
        Calendar rowDate = (Calendar)data[i][1];

        // Check if this row should be included in the calculations based on its row type
        if (rowType.equalsIgnoreCase("cancel")) // if this was a cancelled transaction
        {
            continue; // move on to the next row
        }

        // Check if this row should be included in the calculations based on its date
        boolean rowIsIncludedDate = false; // whether this row is within the given date range
        if ((startCal == null) && (endCal == null)) // if no start or end date was given
        {
            rowIsIncludedDate = true;
        }
        else if ((startCal == null) && (!rowDate.after(endCal))) // if no start date was given and the current row's date is before or equal to the end date
        {
            rowIsIncludedDate = true;
        }
        else if ((endCal == null) && (!rowDate.before(startCal))) // if no end date was given and the current row's date is equal to or after the start date
        {
            rowIsIncludedDate = true;
        }
        else if ((!rowDate.before(startCal)) && (!rowDate.after(endCal))) // if both dates were given and the current row's date is equal to or after the start date and equal to or before the end date
        {
            rowIsIncludedDate = true;
        }

        if (!rowIsIncludedDate) // if this row should not be included in the calculation because its date is outside of the requested range
        {
            continue; // go on to the next row
        }

        // Add the current row's value to the current sum
        gainLoss = gainLoss.add(rowAmt);
        System.out.println("Adding " + rowAmt.toString());
    }

    return gainLoss;
}

Solution

This is a long shot, but when numbers differ greatly in magnitude, a fixed-precision floating-point type such as double can run into a problem like this. The usual remedy is to sort the numbers first, then cancel the large values of opposite sign against each other before adding the small ones. Example of how you would run into trouble:

10E38 + 5 - 10E38

will return 0 instead of 5. But

10E38 - 10E38 + 5

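will return 5 (when evaluated from left to right). When you do things in the correct order, you never add the 5 to 10E38, so you don't "lose the 5" (there would not be enough bits to store that sum). Note that this only affects fixed-precision types such as double: BigDecimal.add without a MathContext is exact, so BigDecimal by itself should not drop digits this way.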
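To see the difference concretely, here is a minimal sketch that evaluates the first expression above both ways. The class name MagnitudeDemo and the two method names are made up for illustration; only the standard java.math.BigDecimal API is used.

```java
import java.math.BigDecimal;

public class MagnitudeDemo {
    // Evaluates 10E38 + 5 - 10E38 from left to right using double arithmetic.
    // A double carries roughly 15 to 17 significant decimal digits, so the 5
    // cannot be represented once it is added to a value of this size.
    static double withDouble() {
        double big = 10E38;
        return (big + 5) - big;
    }

    // Evaluates the same expression with BigDecimal. add() and subtract()
    // without a MathContext are exact, so no digits are dropped.
    static BigDecimal withBigDecimal() {
        BigDecimal big = new BigDecimal("10E38");
        return big.add(new BigDecimal("5")).subtract(big);
    }

    public static void main(String[] args) {
        System.out.println("double:     " + withDouble());
        System.out.println("BigDecimal: " + withBigDecimal());
    }
}
```

The double version loses the 5, while the BigDecimal version keeps it. That suggests that, for a sum computed entirely with BigDecimal.add, running out of bits is an unlikely explanation and the data or the comparison is the more likely culprit.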

It is quite unusual that "real world data" gives rise to rounding errors as big as you are experiencing, though. It is possible that you are interpreting the Excel data the wrong way. Since you have now clarified that you are not adding certain numbers when another column contains the word "cancel", you might want to confirm that you are, in fact, getting a different answer than Excel by computing the following formula:

=SUM(C1:C4381) - SUMIF(E1:E4381, "cancel", C1:C4381)
  • basically, this says "sum all elements, then subtract the ones with 'cancel' next to them".
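The same cross-check can be done on the Java side, so that both totals can be compared with Excel one by one. This is only a sketch under assumptions: it takes the layout from the question (row 0 is a header, index 2 holds a BigDecimal amount, index 4 holds the row type as a String), it ignores the date filter, and the class name CancelCheck, the method totals, and the sample row types "buy" and "sell" are made up for illustration.

```java
import java.math.BigDecimal;

public class CancelCheck {
    // Returns { sum of all rows, sum of "cancel" rows, difference of the two }.
    // Assumed layout (from the question): row 0 is the header, column index 2
    // is a BigDecimal amount, column index 4 is the row type String.
    static BigDecimal[] totals(Object[][] data) {
        BigDecimal all = BigDecimal.ZERO;
        BigDecimal cancelled = BigDecimal.ZERO;
        for (int i = 1; i < data.length; i++) { // skip the header row
            BigDecimal amt = (BigDecimal) data[i][2];
            String type = (String) data[i][4];
            all = all.add(amt);
            if (type.equalsIgnoreCase("cancel")) {
                cancelled = cancelled.add(amt);
            }
        }
        return new BigDecimal[] { all, cancelled, all.subtract(cancelled) };
    }

    public static void main(String[] args) {
        // Tiny hypothetical data set, not the real CSV.
        Object[][] data = {
            { null, null, "amount", null, "type" },
            { null, null, new BigDecimal("1.50"), null, "buy" },
            { null, null, new BigDecimal("-2.25"), null, "cancel" },
            { null, null, new BigDecimal("0.75"), null, "sell" },
        };
        BigDecimal[] t = totals(data);
        System.out.println("all=" + t[0] + " cancelled=" + t[1] + " net=" + t[2]);
    }
}
```

If the first value matches =SUM(C:C) and the third matches the formula above, the addition itself is fine and the difference comes from the excluded rows.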

It may turn out that your code is correct, and your assumption about the answer (or the data) is wrong...

License: CC-BY-SA with attribution
Not affiliated with StackOverflow