The main problem here is data compression.
Kamikaze offers good compression algorithms for data arrays: it uses Simple16 and PForDelta coding. Simple16 is a good and (as the name says) simple list compression option. You can also use Run-Length Encoding (RLE), or experiment with any other compression algorithm available in Java.
Whichever method you use, it will work better if you preprocess the data first.
You can reduce the data by calculating differences or, as @RichardTingle pointed out, by creating pairs of different data locations.
You can calculate C as B - A. C will have to be an int array, since the difference between two byte values does not always fit in a byte (read as unsigned values, it ranges from -255 to 255). You can then restore B as A + C.
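A minimal Java sketch of the difference step, assuming A and B are byte arrays of the same length and that bytes are read as unsigned values (0 to 255). The class and method names are hypothetical.

```java
import java.util.Arrays;

final class DeltaCoding {
    // C[i] = B[i] - A[i], stored as int because the result can range
    // from -255 to 255 when the bytes are read as unsigned (0..255).
    static int[] diff(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int[] c = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = Byte.toUnsignedInt(b[i]) - Byte.toUnsignedInt(a[i]);
        }
        return c;
    }

    // B[i] = A[i] + C[i]; the cast back to byte restores the original bit pattern.
    static byte[] restore(byte[] a, int[] c) {
        byte[] b = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            b[i] = (byte) (Byte.toUnsignedInt(a[i]) + c[i]);
        }
        return b;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6, 7};
        byte[] b = {1, 2, 3, 5, 6, 7, 7};
        int[] c = diff(a, b);
        System.out.println(Arrays.toString(c));              // [0, 0, 0, 1, 1, 1, 0]
        System.out.println(Arrays.equals(restore(a, c), b)); // true
    }
}
```

Because unsigned A[i] + C[i] always lands back in 0..255, restore reproduces B exactly even when a difference is negative.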
The advantage of combining at least two methods here is that you usually get much better results than with either one alone.
For example, if you use the difference method with A = { 1, 2, 3, 4, 5, 6, 7 } and B = { 1, 2, 3, 5, 6, 7, 7 }, the difference array C will be { 0, 0, 0, 1, 1, 1, 0 }. RLE can compress C very effectively, since it is good at compressing data that contains many repeated values in sequence.
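A plain run-length encoder for C could look like the sketch below. It emits (value, runLength) pairs; it is an illustration only, not Kamikaze's API, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RunLength {
    // Encodes the input as consecutive (value, runLength) pairs.
    static int[] encode(int[] data) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            int value = data[i];
            int run = 1;
            while (i + run < data.length && data[i + run] == value) {
                run++;
            }
            out.add(value);
            out.add(run);
            i += run;
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    // Expands (value, runLength) pairs back into the original array.
    static int[] decode(int[] pairs) {
        int total = 0;
        for (int k = 1; k < pairs.length; k += 2) {
            total += pairs[k];
        }
        int[] data = new int[total];
        int pos = 0;
        for (int k = 0; k < pairs.length; k += 2) {
            Arrays.fill(data, pos, pos + pairs[k + 1], pairs[k]);
            pos += pairs[k + 1];
        }
        return data;
    }

    public static void main(String[] args) {
        int[] c = {0, 0, 0, 1, 1, 1, 0};
        int[] encoded = encode(c);
        System.out.println(Arrays.toString(encoded));          // [0, 3, 1, 3, 0, 1]
        System.out.println(Arrays.equals(decode(encoded), c)); // true
    }
}
```

On this tiny example the gain is small (7 values become 6 numbers); RLE pays off when the runs are long, for instance large unchanged regions where C is all zeros.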
Using the difference method with Simple16 works well if your data changes in almost every position but the differences between values are small. Simple16 can pack an array of 28 single-bit values (0 or 1) or an array of 14 two-bit values into a single 32-bit integer.
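To illustrate the idea behind those two modes, here is a simplified sketch that stores a 4-bit selector in the top bits of a 32-bit int and uses the remaining 28 bits for either 28 one-bit or 14 two-bit values. It covers only these two layouts, and the selector numbers are made up for the example; real Simple16 defines 16 layouts and picks among them, so for actual compression use a library such as Kamikaze.

```java
import java.util.Arrays;

final class TwoModePacker {
    // Hypothetical selector values for this sketch; not Simple16's real selector table.
    static final int ONE_BIT = 0; // 28 values of 1 bit each
    static final int TWO_BIT = 1; // 14 values of 2 bits each

    static int widthOf(int selector) {
        if (selector == ONE_BIT) return 1;
        if (selector == TWO_BIT) return 2;
        throw new IllegalArgumentException("unknown selector: " + selector);
    }

    // Packs up to 28 / width small non-negative values into one 32-bit int:
    // the top 4 bits hold the selector, the low 28 bits hold the payload.
    static int pack(int[] values, int selector) {
        int width = widthOf(selector);
        if (values.length > 28 / width) {
            throw new IllegalArgumentException("too many values for this mode");
        }
        int word = selector << 28;
        for (int i = 0; i < values.length; i++) {
            int v = values[i];
            if (v < 0 || v >= (1 << width)) {
                throw new IllegalArgumentException("value does not fit in " + width + " bit(s): " + v);
            }
            word |= v << (i * width);
        }
        return word;
    }

    // Reads `count` values back; in this sketch the caller must know how many were packed.
    static int[] unpack(int word, int count) {
        int width = widthOf(word >>> 28);
        if (count > 28 / width) {
            throw new IllegalArgumentException("count exceeds capacity of this mode");
        }
        int mask = (1 << width) - 1;
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = (word >>> (i * width)) & mask;
        }
        return values;
    }

    public static void main(String[] args) {
        int[] deltas = {0, 1, 2, 3, 1, 0, 2, 3, 3, 2, 1, 0, 1, 2}; // 14 two-bit values
        int word = pack(deltas, TWO_BIT);
        System.out.println(Arrays.equals(unpack(word, deltas.length), deltas)); // true
    }
}
```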
Experiment: the best choice depends on how your data behaves. Compare the compression ratio of each experiment.
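One quick way to compare experiments is to run each candidate representation through the same general-purpose compressor and compare the results against the original size. The sketch below uses java.util.zip.Deflater as the yardstick (an assumption; you could equally measure the output size of Kamikaze or your own encoder) and compares raw B with the difference array C on made-up data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.Deflater;

final class RatioCheck {
    // Size in bytes of the input after DEFLATE compression.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.size();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Serializes an int array as 4 big-endian bytes per element.
    static byte[] toBytes(int[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * 4);
        for (int v : values) {
            buffer.putInt(v);
        }
        return buffer.array();
    }

    // Original size divided by the compressed size of one candidate encoding
    // (higher is better). Using the same original size for every candidate
    // keeps the comparison fair even when encodings differ in raw length.
    static double ratio(int originalSize, byte[] encoded) {
        return (double) originalSize / deflatedSize(encoded);
    }

    public static void main(String[] args) {
        // Hypothetical data: B equals A except that every 50th value is incremented.
        Random random = new Random(42);
        int n = 10_000;
        byte[] a = new byte[n];
        byte[] b = new byte[n];
        int[] c = new int[n];
        random.nextBytes(a);
        for (int i = 0; i < n; i++) {
            b[i] = (byte) (a[i] + (i % 50 == 0 ? 1 : 0));
            c[i] = Byte.toUnsignedInt(b[i]) - Byte.toUnsignedInt(a[i]);
        }
        System.out.printf("raw B:        %.2f%n", ratio(n, b));
        System.out.printf("difference C: %.2f%n", ratio(n, toBytes(c)));
    }
}
```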
EDIT: You will have to preprocess the data before converting it to JSON and zipping it.
Create two sets, old and now. The latter contains all files that exist now. For the former, the old files, you have at least two options:
1. It can contain all files that existed when you last sent them to the other PC. You will need to keep a set of what the other PC knows in order to calculate what has changed since the last synchronization, and send only the new data.
2. It can contain all files as of your last check for changes. You can keep a local history of changes and give each version an "id". Then, when you sync, you send the "version id" together with the changed data to the other PC. Next time, the other PC first sends its "version id" (or you keep the "version id" of each PC locally), and you send it all the changes made after that version.
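The second option could be sketched as a local change log in which each sync point gets an increasing version id; when a peer reports the last id it has seen, you send every change recorded after it. The ChangeLog class, the Change type and the use of plain long ids are all hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ChangeLog {
    // One recorded change: a path that was added or deleted at a given version.
    static final class Change {
        final long version;
        final String path;
        final boolean added; // true = file added, false = file deleted

        Change(long version, String path, boolean added) {
            this.version = version;
            this.path = path;
            this.added = added;
        }

        @Override
        public String toString() {
            return version + (added ? " +" : " -") + path;
        }
    }

    private final List<Change> history = new ArrayList<>();
    private long currentVersion = 0;

    // Records a batch of changes under a new version id and returns that id.
    long commit(List<String> addedPaths, List<String> deletedPaths) {
        currentVersion++;
        for (String p : addedPaths) history.add(new Change(currentVersion, p, true));
        for (String p : deletedPaths) history.add(new Change(currentVersion, p, false));
        return currentVersion;
    }

    // Everything a peer that last saw `peerVersion` still needs, in order.
    List<Change> changesSince(long peerVersion) {
        List<Change> result = new ArrayList<>();
        for (Change c : history) {
            if (c.version > peerVersion) result.add(c);
        }
        return result;
    }

    long currentVersion() {
        return currentVersion;
    }

    public static void main(String[] args) {
        ChangeLog log = new ChangeLog();
        log.commit(List.of("/data/a.txt"), List.of());
        long peerSaw = log.currentVersion();           // peer synced at version 1
        log.commit(List.of("/data/b.txt"), List.of("/data/a.txt"));
        System.out.println(log.changesSince(peerSaw)); // [2 +/data/b.txt, 2 -/data/a.txt]
    }
}
```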
The changes can be represented by two other sets: newFiles and deleted. (What about files whose content changed? Don't you need to sync these too?) The newFiles set contains the files that exist only in now (and not in old). The deleted set contains the files that exist only in old (and not in now).
If you represent each file as a String with its full path name, you will safely have a unique representation of each file. Alternatively, you can use java.io.File.
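A minimal sketch of computing the two sets with java.util collections, assuming each file is represented by its full path as a String; the class name and sample paths are hypothetical. Like the description above, it only detects added and removed paths, not files whose content changed.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

final class FileSetDiff {
    // Files that exist only in `now` (and not in `old`).
    static Set<String> newFiles(Set<String> old, Set<String> now) {
        Set<String> result = new HashSet<>(now);
        result.removeAll(old);
        return result;
    }

    // Files that exist only in `old` (and not in `now`).
    static Set<String> deleted(Set<String> old, Set<String> now) {
        Set<String> result = new HashSet<>(old);
        result.removeAll(now);
        return result;
    }

    public static void main(String[] args) {
        Set<String> old = Set.of("/docs/a.txt", "/docs/b.txt");
        Set<String> now = Set.of("/docs/b.txt", "/docs/c.txt");
        // TreeSet only gives a stable print order.
        System.out.println(new TreeSet<>(newFiles(old, now))); // [/docs/c.txt]
        System.out.println(new TreeSet<>(deleted(old, now)));  // [/docs/a.txt]
    }
}
```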
Once you have reduced your changes to the newFiles and deleted sets, you can convert them to JSON, zip them, and do anything else you need to serialize and compress the data.
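A dependency-free sketch of that last step: it writes the two sets as a small JSON object by hand (in practice a JSON library would normally do this) and gzips the result with java.util.zip.GZIPOutputStream. The field names "newFiles" and "deleted" and the class name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class SyncPayload {
    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (ch < 0x20) sb.append(String.format("\\u%04x", (int) ch));
                    else sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }

    // Writes a collection of strings as a JSON array.
    static String array(Collection<String> items) {
        StringBuilder sb = new StringBuilder("[");
        for (String item : items) {
            if (sb.length() > 1) sb.append(',');
            sb.append(quote(item));
        }
        return sb.append(']').toString();
    }

    // Builds {"newFiles":[...],"deleted":[...]} and gzips the UTF-8 bytes.
    static byte[] encode(Collection<String> newFiles, Collection<String> deleted) throws IOException {
        String json = "{\"newFiles\":" + array(newFiles) + ",\"deleted\":" + array(deleted) + "}";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Reverses the gzip step so the receiver can parse the JSON text.
    static String decode(byte[] payload) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(payload))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = encode(List.of("/docs/c.txt"), List.of("/docs/a.txt"));
        System.out.println(decode(payload)); // {"newFiles":["/docs/c.txt"],"deleted":["/docs/a.txt"]}
    }
}
```

For a handful of paths, gzip's fixed header overhead can make the payload larger than the plain JSON; the compression only pays off once the change lists grow.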