Question

I've been looking around for a way to concatenate large files (a few gigabytes each) without having to rewrite one of the files. I am sure the OS does something like this internally when it manipulates the master file table. This is purely for an internal application where speed is critical, even at the cost of data integrity (the risk that comes with undocumented APIs). The app processes a large amount of high-bandwidth, multi-channel Ethernet data, where a corrupt unit of work (a file, in this case) will not have a large impact on the overall processing results.

At the moment, when combining files A and B, the effort involved is equal to A[Read] + B[Read] + C[Write]. Would any of you NT gurus shed some light on how to work around this and get at the MFT directly?

I have not been able to gain any clues as to which API to explore and would appreciate some pointers. Although the app is managed, I would gladly explore native APIs and even set up lightweight VMs for testing.

Thanks in advance.


Solution

If you are appending File B to File A, all you have to do is open File A for write/append, seek to the end of the file, then read from B and write to A.
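For reference, here is a minimal Win32 sketch of that append in C; the file names and buffer size are placeholders, and error handling is abbreviated. Opening the destination with FILE_APPEND_DATA makes every write land at the end of the file, so no explicit seek is needed:

    /* Minimal sketch (Win32 C, placeholder file names): append B.dat onto
       A.dat in place. Error handling is abbreviated. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* FILE_APPEND_DATA: every write lands at end of file, no seek needed. */
        HANDLE hA = CreateFileA("A.dat", FILE_APPEND_DATA, 0, NULL,
                                OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        HANDLE hB = CreateFileA("B.dat", GENERIC_READ, FILE_SHARE_READ, NULL,
                                OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hA == INVALID_HANDLE_VALUE || hB == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "open failed: %lu\n", GetLastError());
            return 1;
        }

        static BYTE buf[1 << 20];   /* 1 MiB copy buffer */
        DWORD nRead, nWritten;
        while (ReadFile(hB, buf, sizeof buf, &nRead, NULL) && nRead > 0) {
            if (!WriteFile(hA, buf, nRead, &nWritten, NULL) || nWritten != nRead) {
                fprintf(stderr, "write failed: %lu\n", GetLastError());
                return 1;
            }
        }

        CloseHandle(hB);
        CloseHandle(hA);
        return 0;
    }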

If you want to create File C as the concatenation of File A and File B, then you are going to have to create File C and copy A to C, then B to C.
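A minimal sketch of that case, again with placeholder file names: CopyFileA handles the A-to-C copy, and the same read/write loop appends B:

    /* Minimal sketch (Win32 C, placeholder names): build C.dat as A.dat + B.dat.
       CopyFileA does the A-to-C copy; the loop then appends B. */
    #include <windows.h>

    BOOL ConcatFiles(const char *a, const char *b, const char *c)
    {
        if (!CopyFileA(a, c, FALSE))          /* FALSE = overwrite c if present */
            return FALSE;

        HANDLE hC = CreateFileA(c, FILE_APPEND_DATA, 0, NULL,
                                OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        HANDLE hB = CreateFileA(b, GENERIC_READ, FILE_SHARE_READ, NULL,
                                OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hC == INVALID_HANDLE_VALUE || hB == INVALID_HANDLE_VALUE)
            return FALSE;

        static BYTE buf[1 << 20];
        DWORD nRead, nWritten;
        BOOL ok = TRUE;
        while (ReadFile(hB, buf, sizeof buf, &nRead, NULL) && nRead > 0)
            if (!WriteFile(hC, buf, nRead, &nWritten, NULL) || nWritten != nRead) {
                ok = FALSE;
                break;
            }

        CloseHandle(hB);
        CloseHandle(hC);
        return ok;
    }

For a quick one-off, the built-in cmd command copy /b A.dat+B.dat C.dat does the same copy internally; either way the full A[Read] + B[Read] + C[Write] cost is paid.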

There aren't any shortcuts.

OTHER TIPS

That's not really something a file system would do. File systems allocate space for files in terms of clusters and blocks of data, not in terms of bytes. Splicing two files together like this would only work if the first file's length were an exact multiple of the cluster size, and the FS may make other assumptions under the covers about how blocks are allocated to files. You might be able to do this yourself if you dismounted the volume and wrote a tool to directly manipulate the file system structures. But you're risking corrupting the whole disk if you do that, not just a single file.
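As a quick sanity check on that constraint, here is a sketch (placeholder drive and file name) that queries the volume's cluster size and reports whether a file's length is an exact multiple of it, i.e. whether a raw cluster-level splice could even line up:

    /* Sketch (placeholder drive and file name): query the volume's cluster
       size and check whether a file's length is an exact multiple of it --
       the alignment any raw cluster-level splice would depend on. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        DWORD spc, bps, freeC, totalC;
        if (!GetDiskFreeSpaceA("C:\\", &spc, &bps, &freeC, &totalC))
            return 1;
        ULONGLONG clusterSize = (ULONGLONG)spc * bps;

        WIN32_FILE_ATTRIBUTE_DATA fad;
        if (!GetFileAttributesExA("A.dat", GetFileExInfoStandard, &fad))
            return 1;
        ULONGLONG size = ((ULONGLONG)fad.nFileSizeHigh << 32) | fad.nFileSizeLow;

        printf("cluster size %llu bytes; A.dat %s cluster-aligned\n",
               clusterSize, (size % clusterSize == 0) ? "is" : "is not");
        return 0;
    }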

I don't know your exact situation, but would it be possible not to append the files together at all? Just keep throwing files into some directory as you receive data, and keep an index.

Then, when the data is needed, use the index to piece the files together into one new file, so you only ever pay for the expensive merge on demand.
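A minimal sketch of that on-demand merge, assuming a hypothetical index format of one chunk filename per line; this is standard C, and the loop maps directly onto the ReadFile/WriteFile version above:

    /* Sketch of the on-demand merge, assuming a hypothetical index format:
       one chunk filename per line. Error handling is minimal. */
    #include <stdio.h>
    #include <string.h>

    int merge_on_demand(const char *indexPath, const char *outPath)
    {
        FILE *idx = fopen(indexPath, "r");
        FILE *out = fopen(outPath, "wb");
        if (!idx || !out)
            return -1;

        char name[512];
        static char buf[1 << 20];
        while (fgets(name, sizeof name, idx)) {
            name[strcspn(name, "\r\n")] = '\0';    /* strip the newline */
            FILE *chunk = fopen(name, "rb");
            if (!chunk)
                return -1;
            size_t n;
            while ((n = fread(buf, 1, sizeof buf, chunk)) > 0)
                fwrite(buf, 1, n, out);
            fclose(chunk);
        }
        fclose(idx);
        fclose(out);
        return 0;
    }

A reader that can consume multiple files in sequence could skip the merge entirely and stream straight from the index.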

Licensed under: CC-BY-SA with attribution