Question

I'm trying to optimize an archive format which stores data in nodes. As time goes on, the container becomes messy (small, unusable "free"-space nodes accumulate, etc.). What I'm doing is similar to defragmenting. I already have a list of all the data locations and a representation of where I would like the data to be in its final state, but I'm struggling with the task of moving the actual data from its current configuration into the optimal configuration. The elements are not the same size, nor multiples of any smallest block (unless you count bytes). Is there some obvious method I am overlooking? I'm not even sure what this problem is called when searching for algorithms; the closest I've gotten is in-place sorting.

So far I have tried swapping blocks of data, but then I need to keep track of node fragments, and it has become far too messy to be feasible.

I don't want to resort to writing a temporary copy and then replacing, since the files are very large.


Solution

Regarding performance, copying the data to a new file will most likely be the best choice.
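
For illustration, here is a minimal sketch of that approach in Python, assuming the optimization plan can be expressed as a list of (src_offset, size, dst_offset) tuples (a hypothetical format; the question doesn't specify one). It streams each node from the old file into a new file at its target offset, then atomically swaps the files:

```python
import os

CHUNK = 1 << 20  # copy in 1 MiB chunks to keep memory use bounded

def rewrite_compacted(path, relocations):
    """relocations: iterable of (src_offset, size, dst_offset) tuples."""
    tmp_path = path + ".compact"
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        # Write nodes in destination order so the new file is laid out
        # exactly as planned, with no free-space nodes left in between.
        for src_off, size, dst_off in sorted(relocations, key=lambda r: r[2]):
            src.seek(src_off)
            dst.seek(dst_off)
            remaining = size
            while remaining > 0:
                chunk = src.read(min(CHUNK, remaining))
                if not chunk:
                    raise IOError("unexpected end of file while copying a node")
                dst.write(chunk)
                remaining -= len(chunk)
    os.replace(tmp_path, path)  # atomically replace the original file
```

This needs roughly as much free disk space as the compacted archive, but it is simple, sequential-friendly, and leaves the original file intact until the replacement succeeds.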

If available disk space is an issue, you have a fun time ahead of you, because getting this fast will require some exquisite hacking. I think the best you can do is to allocate a large slab of buffer memory and maintain a list of holes in the file whose data currently resides in that buffer. You start by filling the buffer with everything that is out of place, beginning at the start of the file. Once the buffer is full, you can copy data from anywhere in the file into the hole, and keep pushing the data you displace into the buffer at the end of the hole you are filling. Whenever you run out of buffer space, you need to jump to the biggest hole available and move the data that belongs there. As I said, this won't be easy, but it might be fun...
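
To make the trade-off concrete, here is a hedged sketch of a much simpler special case: when the optimized layout keeps nodes in their current order and only slides them toward the front of the file (every destination offset is less than or equal to its source offset), you can move nodes one by one, front to back, with no buffer-and-holes bookkeeping at all. The full scheme described above is only needed when nodes get reordered. The (src_offset, size, dst_offset) plan format is again an assumption.

```python
CHUNK = 1 << 20  # move data in 1 MiB chunks to keep memory use bounded

def compact_in_place(path, plan):
    """plan: (src_offset, size, dst_offset) tuples, non-overlapping,
    preserving node order, with dst_offset <= src_offset for each."""
    with open(path, "r+b") as f:
        end = 0
        for src_off, size, dst_off in sorted(plan, key=lambda p: p[2]):
            assert dst_off <= src_off, "plan must only move data toward the front"
            # Copy low-to-high so an overlapping forward move never clobbers
            # bytes that have not been read yet (the same rule as memmove).
            moved = 0
            while moved < size:
                n = min(CHUNK, size - moved)
                f.seek(src_off + moved)
                chunk = f.read(n)
                f.seek(dst_off + moved)
                f.write(chunk)
                moved += n
            end = dst_off + size
        if plan:
            f.truncate(end)  # drop the now-unused tail of the file
```

If your plan reorders nodes, this simple slide won't work, because an earlier write could overwrite a later node's source bytes before they are read; that is exactly the situation the buffered hole-filling scheme is meant to handle.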

Licensed under: CC-BY-SA with attribution