Question

(OK, don't yell at me, it's very late here :))

I'm researching delta diff tools (commandline tools or components, it doesn't matter as long as I can call them from Delphi 2010)

I have a project where I upload files to a server. I need to optimize the upload, so it would be really great if I could upload only the delta file instead of sending the whole new file and then comparing the old and new versions on the server.

I read about Duplicity here

Duplicity is a variation on rdiff-backup that allows for backups without cooperation from the storage server, as with simple storage services like Amazon S3. It works by generating the hashes for each block in advance, encrypting them, and storing them on the server, then retrieving them when doing an incremental backup. The rest of the data is also stored encrypted for security purposes.

This got me thinking: is there a tool (or a way) to generate a patch or delta file (I'm not sure what the proper term is) based on the new file, without having access to the original file?

I mean let's say I have this file that I modified once:

my-data.db
[ my-data.db ] modified       --> [ delta-file-1.diff ]

Is there a way to construct [ delta-file-1.diff ] based on the new file without having access to the old file? (maybe by storing some kind of signature for the original file?)

I researched this topic a lot (rdiff, PatchAPI, ZDelta, XDelta, MSDelta, etc.) but I can't find any real-world working example of this.

These references talk about this but I wanted to hear if anyone can guide me and/or suggest better tools that answer the question that I asked above.

Compressing a Target Without a Source File

Windows Patch API: Compressing a Target Without a Basis (Source) File

Thanks in advance!


Solution 3

For those interested: there's rdiff, which has a Windows port and can be launched from Delphi, and librsync, which is, if I understood correctly, the engine behind rdiff.

Both require the signature of the old file (which is much smaller than the file itself) and the complete new file.

A reverse delta can also be generated, which allows getting the old file back from the new one.
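To make the signature idea concrete, here is a minimal Python sketch of the signature / delta / patch workflow. It is not rdiff's or librsync's actual algorithm or file format: the function names (`make_signature`, `make_delta`, `apply_delta`), the tiny block size, and the in-memory delta representation are all assumptions made for illustration. The real tools use a cheap rolling checksum to avoid hashing every window, which this sketch skips for brevity.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration only; real tools use far larger blocks


def make_signature(old: bytes, block: int = BLOCK) -> dict:
    """Map the hash of each fixed-size block of the old file to its block index."""
    sig = {}
    for i in range(0, len(old), block):
        digest = hashlib.sha256(old[i:i + block]).digest()
        sig.setdefault(digest, i // block)
    return sig


def make_delta(sig: dict, new: bytes, block: int = BLOCK) -> list:
    """Build a delta from the signature and the new file only (no old file needed)."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block]
        digest = hashlib.sha256(window).digest()
        if digest in sig:
            if literal:  # flush pending literal bytes before the copy instruction
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", sig[digest]))
            pos += len(window)
        else:
            literal.append(new[pos])  # no known block starts here; ship the byte itself
            pos += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta: list, block: int = BLOCK) -> bytes:
    """Rebuild the new file from the old file plus the delta (done where the old file lives)."""
    out = bytearray()
    for op, arg in delta:
        if op == "copy":
            out += old[arg * block:(arg + 1) * block]
        else:
            out += arg
    return bytes(out)
```

In the upload scenario from the question, the client would keep only the signature of the last uploaded version, compute the delta against the modified file, and send that; the server, which still has the old file, applies it. With the rdiff command-line tool the same three steps are `rdiff signature`, `rdiff delta` and `rdiff patch`.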

OTHER TIPS

No, you cannot get the difference between A and B without a way to get both A and B.

You could reconstruct A from older versions of A by applying the differences.

The signature of A won't cut it.

When you only append to a file or edit in blocks of known size (most likely not possible for text files), I guess hashing would be feasible. See eMule's AICH (eMule wiki/aMule wiki).

Essentially you split a file into blocks of size N, and calculate the hash code of each block. Then you calculate a "super hash" out of M blocks. With that approach you could track down changed blocks without having to transfer much metadata.
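The block-hash scheme described above could be sketched in Python roughly as follows. This is not eMule's actual AICH implementation; the function names, the choice of SHA-256, and the way groups are compared are assumptions for illustration. It only detects blocks that were edited in place or appended, in line with the restriction mentioned above.

```python
import hashlib


def block_hashes(data: bytes, n: int) -> list:
    """Hash every block of size n (the last block may be shorter)."""
    return [hashlib.sha256(data[i:i + n]).digest() for i in range(0, len(data), n)]


def super_hashes(hashes: list, m: int) -> list:
    """Combine every m consecutive block hashes into one 'super hash'."""
    return [hashlib.sha256(b"".join(hashes[i:i + m])).digest()
            for i in range(0, len(hashes), m)]


def changed_blocks(old_hashes: list, new_data: bytes, n: int, m: int) -> list:
    """Return indices of blocks in new_data that differ from, or are missing in, the old hashes."""
    new_hashes = block_hashes(new_data, n)
    old_super = super_hashes(old_hashes, m)
    new_super = super_hashes(new_hashes, m)
    changed = []
    for g, s in enumerate(new_super):
        if g < len(old_super) and s == old_super[g]:
            continue  # the whole group of m blocks is unchanged, skip it
        # Only groups whose super hash differs need a block-by-block comparison.
        for b in range(g * m, min((g + 1) * m, len(new_hashes))):
            if b >= len(old_hashes) or new_hashes[b] != old_hashes[b]:
                changed.append(b)
    return changed
```

In a real client/server exchange, only the super hashes would be transferred first, and the per-block hashes would be requested just for the groups that differ; here both levels are kept in memory to keep the sketch short.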

Otherwise: you cannot recreate the whole file from a diff without knowing the base the diff was taken from. Neither can you create a diff without knowing the base.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow