A version control system with minimal space requirements on the client side that is good with binaries [closed]

StackOverflow https://stackoverflow.com/questions/16771253

Question

(This is my first post so be gentle) I am using Subversion as version control for large binaries. I have about 2.5 GB of binaries that I update hourly, and I get about 400 MB worth of differences each day. Some of the files are PEs, but most are compressed files that are difficult to get good diffs on. The ".svn" folders on my clients are growing daily, and I do not have space on the clients to absorb this increase.

This size is caused by Subversion's pristine copy on the client (the repository is quite small). Distributed version control systems like Git or Mercurial store a repository of sorts on the client, which I don't have space for. I will never really do diffs, just updates to the head or to a given version, so the speed advantage of the pristine copy on the client side makes no difference to me.

So I am planning on using CVS because it is:

  1. mature

  2. light on the client side (no pristine copy, very important to me)

  3. a server-based architecture

  4. open source (I am poor)

Is there something completely different I should be using, such as a backup solution? Is there another version control system other than CVS that meets these requirements?

Solution

Mercurial with the largefiles extension is similar to git-annex with the assistant. http://kiln.stackexchange.com/questions/4846/how-do-i-use-the-mercurial-largefiles-extension It's appropriately labeled a "feature of last resort" because it breaks the D in DVCS (as does git-annex), but that's what you're asking for. It works fine and is supported by Fog Creek (the folks bringing you this site, to a first-order approximation).

I spent 10 years in the CVS gulag. I respect you for considering it, but you don't want to go there. The first time someone presses Ctrl-C during a commit, leaves your repo in a half-committed state, and you're picking through ,v files trying to undo the damage, you'll want to kick yourself. The first time someone wants UTF-8 content without remembering to do -kb, or puts UTF-8 in file names, or (IIRC) tries to put a space in a file name, you'll curse CVS.

OTHER TIPS

As far as I can tell, CVS doesn't do binary diffs, but stores a full copy of each binary for each version. If that (disk space) is an issue, CVS is not the proper VCS for your intended use.
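
To get a feel for how quickly that adds up, here is a rough back-of-the-envelope sketch. It assumes the 400 MB per day from the question is stored in full, with no delta compression at all, which may overstate or understate the real growth; the function name is made up for illustration.

```python
# Rough estimate of repository growth when every changed binary
# is stored in full (assumption: no delta compression at all).

DAILY_CHANGE_MB = 400  # figure taken from the question


def repo_growth_gb(days: int, daily_change_mb: float = DAILY_CHANGE_MB) -> float:
    """Return the extra storage, in GB, accumulated after `days` days."""
    return days * daily_change_mb / 1024


if __name__ == "__main__":
    for days in (30, 365):
        print(f"{days} days: about {repo_growth_gb(days):.1f} GB")
```

Under these assumptions the store grows by roughly 12 GB a month, so wherever the full copies end up needs room for that.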

You may want to have a look at git-annex. It uses Git to organize information about files using commits and branches, but the actual file contents are not stored in the Git repository, which lets you control the space used by your files. It is particularly well suited to managing a collection of large files in a distributed fashion.

The git-annex assistant provides a nice interface over git-annex.

The question is a bit unclear, as it doesn't say whether clients must be able to fetch arbitrary historic versions of the files. So I'll provide a set of options that might or might not fit your requirements; I hope they at least give you some hints…

So here we go:

  • git-annex is a solution that lets you manage a set of huge files with Git without actually keeping them on the clients.
  • SparkleShare is a "non-proprietary Dropbox".
  • Plain rsync might be used to pull the data to your clients very efficiently.
  • rdiff-backup might be used as a variation on the former: while it generally works the way rsync does, it is able to keep an arbitrary number of "deltas" representing past states of the directory being synchronized, so any such state can be extracted or rolled back to. Old deltas can be purged at will.

    This might be combined with rsync: rdiff-backup is used on the server, and the actual copy managed by it is offered to the clients via rsync. If a rollback is needed, another version is restored on the server and the clients then fetch it using rsync.

  • A BitTorrent server and clients: this protocol transfers files in chunks, so if changes to your binaries are somewhat localized, this might work just fine.
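
To illustrate why the last option depends on changes being localized, here is a minimal sketch of fixed-size chunk comparison. It is not BitTorrent's actual protocol, only the general idea of hashing fixed-size pieces and re-fetching the ones whose hash differs; the function names and the tiny chunk size are made up for the demonstration.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int) -> list:
    """Split data into fixed-size chunks and return the SHA-1 hex digest of each."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: bytes, new: bytes, chunk_size: int) -> list:
    """Return the indices of chunks in `new` that a client holding `old` must fetch."""
    old_hashes = chunk_hashes(old, chunk_size)
    new_hashes = chunk_hashes(new, chunk_size)
    return [
        index
        for index, digest in enumerate(new_hashes)
        if index >= len(old_hashes) or old_hashes[index] != digest
    ]


if __name__ == "__main__":
    old = b"AAAABBBBCCCCDDDD"
    # An in-place change touches a single chunk.
    print(changed_chunks(old, b"AAAABxBBCCCCDDDD", 4))
    # An insertion at the front shifts every later chunk boundary.
    print(changed_chunks(old, b"xAAAABBBBCCCCDDDD", 4))
```

An in-place edit invalidates only the chunk it falls in, while a single inserted byte shifts everything after it and invalidates every chunk. Compressed files often behave like the second case, so it is worth measuring on your own data first.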
Licensed under: CC-BY-SA with attribution