Question

I have a file that consists of 69-byte messages. There are no EOL characters, just message after message. The file is exactly 11,465,930,307 bytes, which is 11,465,930,307 / 69 = 166,172,903 messages.

My program memory-maps the file into a byte array, looks at each 69-byte message and extracts the timestamp. I keep track of which message number I am on, and the timestamp and message number go into a RowDetails object. Each RowDetails goes into a std::vector<RowDetails> called to_sort, so that I can effectively sort the whole file by timestamp.

std::cout << "Sorting....." << to_sort.size() << " rows..." << std::endl;
std::sort(std::begin(to_sort), std::end(to_sort));

However, I then write the messages to a new file in sorted order:

unsigned long long total_bytes=0;
unsigned long long total_rows=0;

ofstream a_file("D:\\sorted_all");

std::cout << "Outputting " << to_sort.size() << " rows..." << std::endl;
std::cout << "Outputting " << (to_sort.size()*69) << " bytes..." << std::endl;

for(RowDetails rd : to_sort){
    for(unsigned long long i = rd.msg_number*69; i<(rd.msg_number*69)+69; i++){
        a_file << current_bytes[i];
        total_bytes++;
    }
    total_rows++;
}

std::cout << "Vector rows: "<< total_rows <<std::endl;
std::cout << "Bytes: " << total_bytes <<std::endl;

My output:

No. of total bytes (before memory-mapping file): 11,465,930,307         CORRECT
Sorting....... 166,172,903 rows         CORRECT
Outputting 166,172,903 rows....         CORRECT
Outputting 11,465,930,307 bytes         CORRECT
Vector rows: 166,172,903                CORRECT
Bytes: 11,465,930,169                   ERROR, THIS SHOULD BE 307, not 169

How can I process the correct number of rows while my counter of total bytes is wrong?

When I look at the output file in Windows 7 Explorer, it reports a size of 11,503,248,366 bytes, even though the original input file (which I memory-mapped) has the correct 11,465,930,307.


Solution

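This is just a guess based on the snippet of code you have provided, but I'm willing to bet that rd.msg_number is a 32-bit type. In that case rd.msg_number*69 is computed in 32 bits and overflows for large message numbers, so the inner loop bounds are wrong. If the type is unsigned, the numbers fit exactly: the upper bound (rd.msg_number*69)+69 wraps past 2^32 for two of the 166,172,903 messages, the loop body never runs for those two, and 2 * 69 = 138 bytes is precisely the difference between 11,465,930,307 and 11,465,930,169. The wrapped offsets also mean that many messages are copied from the wrong place in the file. I would do something like the following: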

for(const RowDetails& rd : to_sort){
    // Widen to 64 bits before multiplying so the byte offset cannot wrap.
    unsigned long long msg_offset = (unsigned long long)rd.msg_number * 69;
    for(unsigned long long i = 0; i < 69; i++){
        a_file << current_bytes[msg_offset+i];
        total_bytes++;
    }
    total_rows++;
}
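To check whether a 32-bit overflow can account for the reported byte count, here is a minimal sketch that replays the loop bounds of the original code without touching any file. It assumes rd.msg_number is an unsigned 32-bit integer, which the question does not confirm. The names SimResult and simulate_32bit_loop are made up for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper: replays the bounds of the question's inner loop,
// assuming msg_number is an unsigned 32-bit integer (not shown in the question).
struct SimResult {
    std::uint64_t skipped_messages;  // messages whose inner loop runs zero times
    std::uint64_t bytes_counted;     // value total_bytes would end up with
};

SimResult simulate_32bit_loop(std::uint32_t message_count) {
    constexpr std::uint32_t kMsgSize = 69;
    SimResult r{0, 0};
    for (std::uint32_t n = 0; n < message_count; ++n) {
        // Both values are stored in 32 bits, so they wrap modulo 2^32,
        // just like msg_number*69 and (msg_number*69)+69 would.
        std::uint32_t start = n * kMsgSize;
        std::uint32_t end = start + kMsgSize;
        if (end > start) {
            r.bytes_counted += end - start;  // loop body runs 69 times
        } else {
            ++r.skipped_messages;            // "i < end" is false immediately
        }
    }
    return r;
}
```

Under that assumption, calling simulate_32bit_loop(166172903u) reports 2 skipped messages and 11,465,930,169 counted bytes, which matches the "Bytes:" line in the question.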

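For the incorrect output file size, the reason is that your a_file output file is opened in the default text mode instead of binary mode. In text mode on Windows, the stream performs EOL conversion and writes every '\n' byte (0x0A) as "\r\n" (0x0D 0x0A), which you don't want for binary data. That is consistent with the output file being 37,318,197 bytes larger than the 11,465,930,169 bytes you wrote. So change the file open statement to: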

ofstream a_file("d:\\sorted_all", ios::out | ios::binary);
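Putting both fixes together, the sketch below opens the output in binary mode, computes each offset in 64 bits, and copies a whole 69-byte message with a single write() call instead of one operator<< per byte. The function name write_in_order and its parameters are hypothetical; data stands in for the memory-mapped current_bytes and order for the sorted message numbers.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

constexpr std::size_t kMsgSize = 69;  // message size stated in the question

// Hypothetical helper: writes the messages found in `data` to `path` in the
// order given by `order` (message indices). Returns the number of bytes
// written, or 0 if the file could not be opened or a write failed.
std::uint64_t write_in_order(const char* data,
                             const std::vector<std::uint64_t>& order,
                             const std::string& path) {
    // Binary mode: bytes are written unchanged, with no "\n" -> "\r\n" translation.
    std::ofstream out(path, std::ios::out | std::ios::binary);
    if (!out) return 0;

    std::uint64_t total = 0;
    for (std::uint64_t msg : order) {
        const std::uint64_t offset = msg * kMsgSize;  // 64-bit, cannot wrap here
        out.write(data + offset, static_cast<std::streamsize>(kMsgSize));
        total += kMsgSize;
    }
    out.flush();
    return out ? total : 0;
}
```

Writing whole messages also avoids roughly 11 billion formatted single-character insertions, which should help run time, though I have not measured it.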
Licensed under: CC-BY-SA with attribution