Question

EDIT: Thanks, all of you. The Python solution worked lightning-fast :)

I have a file that looks like this:

132,658,165,3216,8,798,651

but it's MUCH larger (~ 600 kB). There are no newlines, except one at the end of file.

Now I have to sum all of the values in it. I expect the final result to be quite big, but if I were to sum it in C++ I have a bignum library, so that shouldn't be a problem.

How should I do that, and in what language / program? C++, Python, Bash?

Solution

Python

sum(map(int,open('file.dat').readline().split(',')))
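
A slightly more explicit variant of the same idea (still assuming the file is called file.dat), if you prefer the file closed promptly and the result printed:

# Same approach as the one-liner above, written out explicitly.
with open('file.dat') as f:
    print(sum(int(field) for field in f.readline().split(',')))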

OTHER TIPS

Sed and Awk (Linux)

sed -e 's/,/\n/g' tmp.txt | awk 'BEGIN {total=0} {total += $1} END {print total}'

Assumptions

  • Your file is tmp.txt (you can edit this obviously)
  • Awk can handle numbers that large

The language doesn't matter, so long as you have a bignum library. A rough pseudo-code solution would be:

str = ""
sum = 0
while input
    get character from input
    if character is not ','
        append character to back of str
    else
        convert str to number
        add number to sum
        str = ""
output sum

If all of the numbers are smaller than (2**63)/600000 (which still has 14 digits), an 8-byte signed datatype like "long long" in C will be enough. The program is pretty straightforward; use the language of your choice.
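
For instance, here is a minimal Python sketch of the pseudo-code above (the file name file.dat is an assumption; Python's integers don't overflow, so no bignum library is needed):

# Character-by-character version of the pseudo-code above.
# Assumes one line of comma-separated integers in file.dat, ending with a newline.
total = 0
digits = ""
with open('file.dat') as f:
    while True:
        ch = f.read(1)             # read one character at a time
        if ch in (',', '\n', ''):  # delimiter, end of line, or end of file
            if digits:
                total += int(digits)
                digits = ""
            if ch == '':
                break
        else:
            digits += ch
print(total)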

Since it's expensive to process input that large all at once, I suggest you take a look at this post. It explains how to write a generator for splitting a string. It's in C#, but it's well suited for crunching through that kind of input.
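
If you'd rather stay in Python, a comparable generator-based splitter might look like this (a rough sketch; the chunk size and file name are illustrative, not from the original post):

# Yields comma-separated fields without holding the whole line in memory.
def fields(path, chunk_size=64 * 1024):
    buf = ""
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                if buf.strip():
                    yield buf      # last field, possibly with the trailing newline
                return
            buf += chunk
            parts = buf.split(',')
            buf = parts.pop()      # the last piece may be incomplete, keep it
            for part in parts:
                yield part

print(sum(int(field) for field in fields('file.dat')))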

If you are worried about the total sum not fitting in an integer (say, 32-bit), you can just as easily implement a bignum yourself, especially since you only need integers and addition: just carry bit 31 into the next dword and keep adding.
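
Python doesn't need this, since its integers are arbitrary precision, but purely to illustrate the carry idea, here is a sketch that keeps the sum in two 32-bit words:

# Illustration only: accumulate the sum in a low and a high 32-bit word,
# carrying anything above bit 31 into the next dword by hand.
MASK = 0xFFFFFFFF
low, high = 0, 0
for n in (132, 658, 165, 3216, 8, 798, 651):   # sample values from the question
    low += n
    high += low >> 32        # carry into the next dword
    low &= MASK
print((high << 32) | low)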

If precision isn't important, just accumulate the result in a double. That should give you plenty of range.

http://www.koders.com/csharp/fid881E3E70CC37E480545A0C37C98BC8C208B06723.aspx?s=datatable#L12

A fast C# CSV parser. I've seen it crunch through a few thousand 1 MB files rather quickly; I have it running as part of a service that consumes about 6000 files a month.

No need to reinvent a fast wheel.

Python can handle big integers natively.

tr "," "\n" < file | any old script for summing

Ruby is convenient, since it automatically handles big numbers. I can't remember if Awk does arbitrary-precision arithmetic, but if so, you could use:

awk 'BEGIN {RS="," ; sum = 0 }
     {sum += $1 }
     END { print sum }' < file
Licensed under: CC-BY-SA with attribution