Question

I deal with very large binary files (several GB to multiple TB per file). These files exist in a legacy format, and upgrading requires writing a header to the FRONT of the file. I can create a new file and rewrite the data, but sometimes this can take a long time. I'm wondering if there is any faster way to accomplish this upgrade. The platform is limited to Linux, and I'm willing to use low-level functions (ASM, C, C++) or file system tricks to make this happen. The primary library is Java, and JNI is completely acceptable.

Solution

There's no general way to do this natively.

Some file systems may provide functions to do this (I can't give any hints about which), but your code would then be file-system dependent.
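
One candidate worth checking on Linux is fallocate(2) with the FALLOC_FL_INSERT_RANGE flag, which shifts existing data toward the end of the file without copying it. As far as I understand the man page, it is only supported on some file systems (ext4 and XFS on reasonably recent kernels), and both the offset and the length must be multiples of the file system block size, so the header has to be padded to a block boundary and the format must tolerate that padding. A minimal sketch under those assumptions; round_up_to_block and prepend_header_in_place are hypothetical names, and treating st_blksize as the file system block size is itself an assumption to verify:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          // exposes fallocate() in <fcntl.h> (g++ usually sets this already)
#endif
#include <fcntl.h>           // open, fallocate
#include <linux/falloc.h>    // FALLOC_FL_INSERT_RANGE
#include <sys/stat.h>        // fstat
#include <unistd.h>          // pwrite, close
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <string>

// Round n up to the next multiple of block (block must be > 0).
std::uint64_t round_up_to_block(std::uint64_t n, std::uint64_t block) {
    return (n + block - 1) / block * block;
}

// Hypothetical helper: open a block-aligned gap at offset 0 and write `header`
// into it. Returns 0 on success or an errno value (for example EOPNOTSUPP when
// the file system has no insert-range support, or EINVAL for an empty file).
int prepend_header_in_place(const char* path, const std::string& header) {
    if (header.empty()) return 0;

    int fd = open(path, O_RDWR);
    if (fd < 0) return errno;

    struct stat st;
    if (fstat(fd, &st) != 0) { int e = errno; close(fd); return e; }

    // ASSUMPTION: st_blksize equals the file system block size here.
    std::uint64_t gap = round_up_to_block(header.size(),
                                          static_cast<std::uint64_t>(st.st_blksize));

    // Shift all existing data right by `gap` bytes without copying it.
    if (fallocate(fd, FALLOC_FL_INSERT_RANGE, 0, static_cast<off_t>(gap)) != 0) {
        int e = errno; close(fd); return e;
    }

    // The new gap reads as zeros; the header goes at the very start,
    // and the rest of the gap remains as padding.
    ssize_t n = pwrite(fd, header.data(), header.size(), 0);
    int rc = (n == static_cast<ssize_t>(header.size())) ? 0 : (n < 0 ? errno : EIO);
    close(fd);
    return rc;
}
```

A call such as prepend_header_in_place("/path/to/legacy_file", header) would return 0 on success; trying it on a copy first seems prudent, since a failure after the shift leaves a zero-filled gap at the front.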


One solution is to simulate a file system: store your data across a set of several files, and provide functions to open, read, and write the data as if it were a single file.
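
As a rough illustration of that idea, the header could live in its own small file while the legacy data stays untouched, with a thin reader presenting both as one logical byte range. This is only a read-only sketch; ConcatFile is a hypothetical name, and it reopens the part files on every call for brevity:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Presents several files, in order, as one logical read-only byte range.
class ConcatFile {
public:
    explicit ConcatFile(const std::vector<std::string>& paths) {
        for (const auto& p : paths) {
            std::ifstream in(p, std::ios::binary | std::ios::ate);
            // A part that cannot be opened is treated as empty in this sketch.
            std::uint64_t len = in ? static_cast<std::uint64_t>(in.tellg()) : 0;
            parts_.push_back({p, len});
            total_ += len;
        }
    }

    std::uint64_t size() const { return total_; }

    // Read up to n bytes starting at logical offset off; returns bytes read.
    std::size_t read(std::uint64_t off, char* buf, std::size_t n) const {
        std::size_t done = 0;
        for (const auto& part : parts_) {
            if (done == n) break;
            if (off >= part.len) { off -= part.len; continue; }  // skip this part
            std::ifstream in(part.path, std::ios::binary);
            in.seekg(static_cast<std::streamoff>(off));
            std::uint64_t avail = part.len - off;
            std::size_t want =
                static_cast<std::size_t>(std::min<std::uint64_t>(avail, n - done));
            in.read(buf + done, static_cast<std::streamsize>(want));
            done += static_cast<std::size_t>(in.gcount());
            off = 0;  // any following part is read from its beginning
        }
        return done;
    }

private:
    struct Part { std::string path; std::uint64_t len; };
    std::vector<Part> parts_;
    std::uint64_t total_ = 0;
};
```

With a header file and the legacy file passed in that order, a read at logical offset 0 returns header bytes first and then continues seamlessly into the legacy data. The upgrade then costs only the time to write the small header file.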

OTHER TIPS

Sounds crazy, but you can store the file data in reverse order, provided you can change the function that reads data from the file. Prepending a header then becomes an append (of the header bytes in reverse order) at the end of the file. It is just a general idea, so I can't recommend anything in particular. The code for reversing the current file could look like this:

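 // needs <algorithm>, <fstream>, <iterator>, <string>
 std::string records;   // assumed to already hold the file's bytes (only practical if they fit in memory)
 std::ofstream out("reversed.bin", std::ios::binary);   // placeholder output name
 // Write the bytes back to front; ostreambuf_iterator<char> copies raw chars.
 std::copy(records.rbegin(), records.rend(), std::ostreambuf_iterator<char>(out));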
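
To make the reversed layout more concrete: if logical byte i is stored at physical offset size-1-i, then prepending is an append, and a logical read maps to a physical range that is read and then reversed. The sketch below assumes the file has already been converted once (which still costs one full rewrite); prepend_logical and read_logical are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Prepend `header` to the logical contents by appending its reversed bytes.
void prepend_logical(const std::string& path, const std::string& header) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    std::string rev(header.rbegin(), header.rend());
    out.write(rev.data(), static_cast<std::streamsize>(rev.size()));
}

// Read up to n logical bytes starting at logical offset off.
std::string read_logical(const std::string& path, std::uint64_t off, std::size_t n) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    std::uint64_t size = static_cast<std::uint64_t>(in.tellg());
    if (off >= size) return {};
    n = static_cast<std::size_t>(std::min<std::uint64_t>(n, size - off));

    // Logical range [off, off+n) is physical range [size-off-n, size-off).
    std::string buf(n, '\0');
    in.seekg(static_cast<std::streamoff>(size - off - n));
    in.read(&buf[0], static_cast<std::streamsize>(n));
    buf.resize(static_cast<std::size_t>(in.gcount()));
    std::reverse(buf.begin(), buf.end());
    return buf;
}
```

The cost of adding a header is then proportional to the header size rather than the file size, at the price of every reader having to go through something like read_logical.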

It depends on what you mean by "filesystem tricks". If you're willing to get down-and-dirty with the filesystem's on-disk format, and the size of the header you want to add is a multiple of the filesystem block size, then you could write a program to directly manipulate the filesystem's on-disk structures (with the filesystem unmounted).

This enterprise is about as hairy as it sounds though - it'd likely only be worth it if you had hundreds of these giant files to process.

I would just use the standard Linux tools to do it.
Writing another application to do it seems like it would be sub-optimal.

cat headerFile oldFile > tmpFile && mv tmpFile oldFile

I know this is an old question, but I hope this helps someone in the future. Similar to simulating a filesystem, you could simply use a named pipe:

mkfifo /path/to/file_to_be_read
{ echo "HEADER"; cat /path/to/source_file; } > /path/to/file_to_be_read

Then, you run your legacy program against /path/to/file_to_be_read, and the input would be:

HEADER
contents of /path/to/source_file
...

This will work as long as the program reads the file sequentially and doesn't mmap() it or seek backwards (e.g. rewind()) past what is still in its buffer.

Licensed under: CC-BY-SA with attribution