Question

Is it possible to use Apache Subversion (SVN) as a general-purpose backup tool? (As a kind of rsync alternative.)

Solution

I found this article to be a pretty good description of using svn to back up your home directory, and more:

I use Subversion to back up my Linux boxes. With some minor creativity, it easily covers:

  • Daily snapshots and offsite backup.
  • Easy addition and removal of files and folders.
  • Detailed tracking of file versions.

It also allows for a few bonus features:

  • Regular log emails to keep track of filesystem activity via Subversion's event hooks.
  • Users may request a checkout of their home folders from any repository revision.
  • New or replacement servers can be set up with a few svn checkout commands.

Source: http://www.mythago.net/svn_for_backup.html

I also found this article, which shows an example of versioning your home directory. This lets you bring your environment with you by checking out your home directory onto a new machine. I used to do something similar and found it very useful.
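A minimal sketch of that workflow. Every path here is an assumption, and the commands are guarded so they only run where an svn client is installed and /srv/svn is writable:

```shell
REPO=/srv/svn/home                 # hypothetical repository location
if command -v svn >/dev/null 2>&1 && [ -w /srv/svn ]; then
    svnadmin create "$REPO"                    # empty repository
    svn checkout "file://$REPO" "$HOME/wc"     # empty working copy
    cd "$HOME/wc" &&
    cp "$HOME/.bashrc" "$HOME/.vimrc" . &&     # pick the dotfiles to track
    svn add .bashrc .vimrc &&
    svn commit -m "initial import of dotfiles"
fi
# On a new machine, bring the environment with you:
#   svn checkout file:///srv/svn/home ~/wc
```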

OTHER TIPS

One thing to bear in mind when using SVN as a backup for binary files is that SVN will double the size of your data, because it keeps a pristine local copy of each file (in the .svn/text-base directory).

Apart from that, I use SVN for backup as well: simply add all files, then commit via a script.

As a "general purpose" backup, I'd say it's probably not the greatest idea, mainly for the reasons given by others (lots of extra folders and wasted disk space). If you just want to keep backups, there are probably better options, depending on your needs. For example: do you need to keep every single version of every single file, or would certain snapshots of your data be sufficient?

However, at my office, we have a small team of 6 who work with shared files (e.g. policies and procedures manuals, registration forms, etc.). A lot of the time, team members are working remotely (from home or while travelling), and often offline. Rather than using a central shared-folder setup, we use SVN to give each person an entire working copy of the folder, which they can work on and refer to and synchronise whenever possible. This kills two birds with one stone: everyone can access and edit the files even while offline, and it gives us really great redundancy in our backups. If my laptop catches on fire, it's no hassle: I can just check out another copy (obviously on another computer). If the server catches on fire, we have the backups of the repository to restore. If the server AND all the repo backups catch on fire, then all you've lost are old versions of files. The only way you'll lose any current data is if the server, your repo backups and every single computer with a checkout all mysteriously catch on fire.

As some people have said, though, SVN never removes information from the repository, meaning that if you only want to keep backups for 60 days, you can't. That isn't exactly true, however: through use of export, dump and import you can effectively wipe out older versions of files. It's not pretty, but it's possible.
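A hedged sketch of that dump-and-reload trick using svnadmin dump and load. The paths and the cutoff revision are made up, and the commands are guarded so they only run where the svn admin tools and an existing repository are actually present:

```shell
REPO=/var/svn/backup               # hypothetical existing repository
NEWREPO=/var/svn/backup-trimmed    # fresh repository receiving recent history
FIRST_KEPT=500                     # oldest revision to keep

if command -v svnadmin >/dev/null 2>&1 && [ -d "$REPO" ]; then
    LAST=$(svnlook youngest "$REPO")
    svnadmin create "$NEWREPO"
    # Dump only the revisions worth keeping, then load them into the new
    # repository; the first kept revision is loaded as a full snapshot.
    svnadmin dump "$REPO" -r "$FIRST_KEPT:$LAST" > /tmp/trimmed.dump
    svnadmin load "$NEWREPO" < /tmp/trimmed.dump
fi
```

After verifying the new repository, you would point working copies at it (svn relocate, or svn switch --relocate on older clients) and discard the old one.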

One thing that would annoy me a lot is the '.svn' folder that svn puts into every directory it tracks.

They look cluttered, you have to remember not to copy them when you copy a folder (or your working copy may get confused), and it is much harder to grep through a tree of folders, since there are often a lot of hits inside the .svn metadata folders.
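A small, self-contained illustration of the grep problem and its usual fix, GNU grep's --exclude-dir option (the tree built here is a throwaway stand-in for a working copy):

```shell
# Build a tiny fake working copy with one real file and one .svn metadata file.
tmp=$(mktemp -d)
mkdir -p "$tmp/project/.svn"
echo "TODO: fix the pattern here" > "$tmp/project/notes.txt"
echo "pattern bookkeeping noise"  > "$tmp/project/.svn/entries"

naive=$(grep -r "pattern" "$tmp" | wc -l)                     # hits .svn too: 2
clean=$(grep -r --exclude-dir=.svn "pattern" "$tmp" | wc -l)  # real hits only: 1

rm -rf "$tmp"
echo "naive=$naive clean=$clean"
```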

I like the idea of using source control to manage your environment, but I personally would not choose svn for this job; I would go for something like git. But that is probably just me...

I do use SVN to back up my computer, and also to synchronize my laptop and my desktop. But it does have the problems mentioned in earlier answers, mainly the doubling of disk usage. I also feel that the excess of files, and SVN constantly scanning my disk for changes, makes my machine slower.

I would like to highlight, however, that SVN is great for syncing different machines, and you also get the bonus of being able to check out a file anywhere if you need to; I sometimes even do it in my browser through the web interface.

In summary, I have mixed feelings about using SVN for general-purpose backup. But if you do, I recommend not storing libraries such as movies, photos and music, because they tend to be large (suffering hugely from the doubled space usage) and immutable; you don't need a versioning system for those, because on the rare occasions when you change such a file, you generally don't need the old versions (and SVN isn't good at making/storing diffs of binary files, it saves the entire new version of the file). So, unless SVN can be adapted (a long-time project intention of mine) for these cases, I suggest using an alternative method for backing up these kinds of files.

You could also consider bup, a highly efficient file backup system based on the git packfile format. Because it stores data the way git does, it is very efficient at storing files and their differences.

I've used CVS as a substitute for Ghost (disk imaging), so I don't see why not.

It's nice that you can tag a baseline: you can change-manage machines.

This works better on Unixes than on Windows, obviously.

The thing that would put me off the idea is that, for general use, any binary data would get copied over in full any time it changed, whereas the text content that SCM systems are built around can easily be updated in the form of diffs.

So you could do it; just be aware that you may not want to use it to manage things like photo collections if you do much editing.

The nice thing about more general-purpose backup solutions (say, Time Machine) is that they can roll up multiple binary changes after a while to conserve space. I'm not sure how easy that would be to do in SVN, git, or Mercurial.

Using SVN for backups can work. However, over time it can be difficult to delete old revisions that are no longer needed. Say you only wanted to keep 30 or 60 days of backups: SVN does not provide an easy way to remove any history older than X days. If you don't have a way to purge old history, you will eventually fill your backup drive.

Here is a quote from the SVN Book on the svndumpfilter command:

Since Subversion stores everything in an opaque database system, attempting manual tweaks is unwise, if not quite difficult. And once data has been stored in your repository, Subversion generally doesn't provide an easy way to remove that data. [13]

[13] That, by the way, is a feature, not a bug.

I found unison to be a better option than svn as an rsync alternative.

This statement by JoaoPSF is incorrect:

(and SVN isn't good at making/storing diffs of binary files, it saves the entire new version of the file)

See this quote from How does Subversion handle binary files:

Note that whether or not a file is binary does not affect the amount of repository space used to store changes to that file, nor does it affect the amount of traffic between client and server. For storage and transmission purposes, Subversion uses a diffing method that works equally well on binary and text files; this is completely unrelated to the diffing method used by the svn diff command.

Backing up /etc with source code control can be a big help when you want to revert a change that hosed your system, experiment with changes, or carry changes from one server to another.

But Subversion's multitude of .svn directories can get in the way, not just when searching: in some cases, such as *.d configuration folders, poorly designed software might interpret the .svn folders themselves as containing configuration data.

I now prefer using Mercurial for backing up /etc, since it puts a single .hg folder under /etc. For a real backup, and not just version control, you need to copy that .hg folder elsewhere.

To use SVN as a backup on Linux, do the following:

  1. Create an empty repository.
  2. Check out the empty repository into the folder tree you want to back up.
  3. Use the following script (call it svnauto). You have to replace "myuser" and "mypassword" with valid credentials for your repository:
    #!/bin/sh
    # Snapshot the working copy status once.
    svn status --depth=infinity --username=myuser --password=mypassword > /tmp/svnauto_tmp.list
    # '?' lines are unversioned files: rewrite them into `svn add` commands.
    # The trailing @ escapes file names containing an @ (peg-revision syntax).
    grep '^?' /tmp/svnauto_tmp.list | sed -e 's/^?       /svn add --depth=infinity --force --username=myuser --password=mypassword "/g' -e 's/$/@"/g' | sh
    # '!' lines are locally missing files: rewrite them into `svn delete` commands.
    grep '^!' /tmp/svnauto_tmp.list | sed -e 's/^!       /svn delete --username=myuser --password=mypassword "/g' -e 's/$/@"/g' | sh
    rm -f /tmp/svnauto_tmp.list
    svn update . --username=myuser --password=mypassword
    svn commit --username=myuser --password=mypassword --message "Automatic backup"
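To see what the sed rewrites in the script actually produce, here is the same transformation applied to one fabricated svn status line (the trailing @ is the escape that stops an @ inside a file name being parsed as a peg revision):

```shell
# One fabricated `svn status` line: a '?' status, seven spaces, then the path.
line='?       my notes.txt'
# The rewrite: the status prefix becomes an `svn add` command (credentials
# omitted here for brevity), and `@"` is appended to close the quoting.
cmd=$(printf '%s\n' "$line" | sed -e 's/^?       /svn add "/' -e 's/$/@"/')
echo "$cmd"    # svn add "my notes.txt@"
```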

The script above will add/remove and update any files and subdirectories within the current directory. To use it, simply cd to the folder you want to back up (which must be a working copy, of course) and run svnauto. Note that you need grep and sed installed on your system, and that the script creates a temporary file in /tmp. It can be run from a cron job for nightly commits, using the following cron script:

#!/bin/sh
# The LANG export is needed so svn handles file names in the proper locale.
export LANG=en_US.UTF-8
cd /my/directory &&
echo "Starting backup $(date)" > /root/backup_log.txt &&
/root/svnauto >> /root/backup_log.txt 2>&1 &&
echo "Finished backup." >> /root/backup_log.txt &&
cat /root/backup_log.txt

This cron script assumes that /my/directory is the folder you want to back up (replace as needed) and that you put the svnauto script in /root. It creates a log and displays it at the end. One more detail: the export of LANG is needed for svn to use the proper locale; you may have to adjust it to your own locale for it to work.
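If you save that wrapper as, say, /root/svnauto_cron.sh (the file name and the schedule below are my assumptions), the nightly run is a single crontab entry:

```shell
# crontab -e entry: run the backup wrapper every night at 02:30.
30 2 * * * /root/svnauto_cron.sh
```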

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow