Question

We have a rather large SVN repository. SVN updates are taking longer and longer the more code we add. We added svn:externals for folders that were duplicated across projects, such as FCKeditor on various websites. This helped, but not by much.

What is the best way to reduce update time and boost SVN speed?

Solution

If it's an older SVN repository (or even a fairly new one that wasn't set up optimally), it may be using the older BDB style of repository database. http://svn.apache.org/repos/asf/subversion/trunk/notes/fsfs has notes on the newer FSFS format. Changing from one to the other isn't too hard: dump the entire history, re-initialise the repository with the new filesystem format, and re-import. While you're at it, it can be useful to filter the dump to remove entire check-ins of useless data (I, for example, have removed 20MB+ tarball files that someone had checked in).
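A rough sketch of that dump-and-reload migration (the repository paths and the filtered directory name are placeholders for your own layout):

    # Dump the full history of the old (BDB) repository
    svnadmin dump /var/svn/old-repo > repo.dump

    # Optionally strip junk check-ins, e.g. a directory of committed tarballs
    svndumpfilter exclude vendor/tarballs < repo.dump > repo-filtered.dump

    # Create a fresh repository on the FSFS backend and reload the history
    svnadmin create --fs-type fsfs /var/svn/new-repo
    svnadmin load /var/svn/new-repo < repo-filtered.dump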

As far as general speed goes, a fast hard drive and extra memory for OS-level caching are hard to beat for making SVN quicker.

On the client side, if you have TortoiseSVN set up with PuTTY's Pageant for SSH access to an external repository machine, you can also enable SSH compression, which can help as well.
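For command-line clients going over svn+ssh, the rough equivalent is to pass the compression flag to ssh in the [tunnels] section of the Subversion config file (~/.subversion/config on Unix, %APPDATA%\Subversion\config on Windows):

    [tunnels]
    # -C enables plain OpenSSH compression for all svn+ssh:// traffic
    ssh = $SVN_SSH ssh -C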

Edit: SVN v1.5 also ships the fsfs-reshard.py tool, which can split an FSFS-based repository into a number of directories, and those directories can in turn be placed on different drive spindles. If you have thousands of revisions, that can also help, if for no other reason than that finding one file among thousands takes time (you can tell whether that's a problem by looking at the I/O wait times).
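Its usage is simply the repository path plus the maximum number of files per shard (the path and shard size below are placeholders), and the repository should be offline while it runs:

    # Group the FSFS revision files into shard directories of 1000 files each
    fsfs-reshard.py /var/svn/repo 1000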

OTHER TIPS

Disable virus checking on folders that contain working-copy code. This made my updates twice as fast.

Not really an answer, but it may be interesting to know that one of the reasons SVN is so I/O-heavy is that it stores an extra pristine copy of each file in the .svn/text-base directory. This makes local diff operations fast, but it eats lots of hard disk space and I/O.

http://subversion.tigris.org/issues/show_bug.cgi?id=525 has the details.

Sounds like you've got multiple projects in one repository. Splitting them up where appropriate will give you a big boost.

Supposedly Git is much faster than Subversion due to the way it stores/processes changes, but I have no first-hand experience with it.

Make sure your connection to the server is as fast as it can be (gigabit Ethernet). Make sure the server has fast disks in an array. And, of course, only check out what you need.

There are some common performance tweaks. SVN is very I/O-heavy, so faster hard disks help (on both ends). Add more memory to your server. Make sure your clients' hard disks are defragmented (on Windows).

The access method you use also matters. Repositories accessed over a remote filesystem (using file:/// URLs) are going to be much slower than either svnserve or Apache with mod_dav_svn. Consider one of the latter if you currently have the repository on a simple file share.
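For example, a minimal svnserve setup (the paths and hostname here are placeholders) looks like this:

    # On the server: run svnserve as a daemon, rooted at the repositories' parent dir
    svnserve -d -r /var/svn

    # On clients: access over the svn:// protocol instead of a file share
    svn checkout svn://svn.example.com/repo/trunk repo-trunk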

TortoiseSVN by default watches for file changes in the background, and I have seen that slow down my machine. I changed the configuration to exclude everything and then include only the directories where I have checkouts. You can also turn off the background checks entirely. Both of these settings are under the Icon Overlays settings node.

Sometimes slow SVN operation, especially with many externals, is DNS-related. It looks like SVN performs a DNS lookup for every svn:external, even for relative ones. Adding your SVN server's hostname to /etc/hosts, or fixing resolv.conf, can help.
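An example of such a hosts entry (the address and hostname are placeholders for your own server):

    # Pin the SVN server's name so each external doesn't trigger a slow lookup
    192.0.2.10    svn.example.com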

I've found in my own experience (i.e., not through any actual tests) that using externals seems to slow things down, especially when the SVN repository server is remote. If you've got duplicated code (like your FCKeditor) in multiple places, I would still stick with externals, since keeping those files synchronised and manageable matters more than update speed. That said, you could instead use symbolic links to bring in the duplicated code (on Windows XP, you can use junction points).
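As a sketch, assuming the Sysinternals junction tool is available (all paths here are hypothetical), a single shared checkout can be linked into each site:

    rem Point the site's fckeditor folder at one shared working copy
    junction C:\sites\siteA\fckeditor C:\shared\fckeditor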

We've split our code base into several sibling modules and written Ant scripts so that one developer can work on one module at a time without worrying too much about what's happening in the other modules:

  • A top-level build script triggers all module build scripts.
  • External libraries are not stored in Subversion but are pulled from a network drive using Apache Ivy (think of it as an in-house Maven repository); see the sketch after this list.
  • Dependencies between modules are also managed with Ivy.
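For illustration, a module's ivy.xml might look roughly like this (the organisation, module, and revision names are invented):

    <ivy-module version="2.0">
        <info organisation="com.example" module="web-frontend"/>
        <dependencies>
            <!-- external library resolved from the shared network repository -->
            <dependency org="commons-lang" name="commons-lang" rev="2.4"/>
            <!-- sibling module, so inter-module dependencies go through Ivy too -->
            <dependency org="com.example" name="core" rev="latest.integration"/>
        </dependencies>
    </ivy-module>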

Typically, developers need to update their entire tree only a couple of times a week, and that can easily be done before going to lunch or taking a coffee break.

Using read-access restrictions (i.e., limiting read access to certain persons or groups) will slow down the repository a lot, especially when authentication is done in some special way, e.g. against a Windows domain. The same holds true for write-access restrictions, of course, but writing is less frequent than reading, and restricting write access can be more important than restricting read access.
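For reference, this is the kind of path-based rule set being described, in Subversion's authz format (the group and user names are hypothetical); every read the server authorizes has to be checked against rules like these:

    [groups]
    devs = alice, bob

    [/]
    * =
    @devs = rw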

If you have many folders in the root of the repository and your local copy mirrors it, try splitting the monolithic working copy into several separately checked-out folders and updating those folders individually. That will be noticeably faster than updating one big folder.
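One way to get that effect on SVN 1.5+ is a sparse checkout (the URL and folder names are placeholders):

    # Check out only the top level of the tree
    svn checkout --depth immediates http://svn.example.com/repo/trunk work

    # Then pull full content just for the folders you actually work on
    cd work
    svn update --set-depth infinity projectA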

Licensed under: CC-BY-SA with attribution