Question

My project is currently using a svn repository which gains several hundred new revisions per day. The repository resides on a Win2k3-server and is served through Apache/mod_dav_svn.

I now fear that over time the performance will degrade due to too many revisions.
Is this fear reasonable?
We are already planning to upgrade to 1.5, so having thousands of files in one directory will not be a problem in the long term.

Subversion only stores the delta (differences) between 2 revisions, so this helps save a LOT of space, especially if you only commit code (text) and no binaries (images and docs).

Does that mean that in order to check out revision 10 of the file foo.baz, svn will take revision 1 and then apply the deltas 2-10?


Solution

What type of repo do you have? FSFS or BDB?

(Let's assume FSFS for now, since that's the default.)
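(If you are not sure which backend a repository uses: repositories created with Subversion 1.1 or later record it in the db/fs-type file inside the repository. A minimal sketch, assuming you have filesystem access to the repository on the server; the path below is a hypothetical placeholder:)

    from pathlib import Path

    def repo_backend(repo_path: str) -> str:
        """Return the backend ("fsfs" or "bdb") recorded in db/fs-type."""
        fs_type = Path(repo_path) / "db" / "fs-type"
        try:
            return fs_type.read_text().strip()
        except FileNotFoundError:
            # Very old (pre-1.1) BDB repositories may not have this file.
            return "unknown (no db/fs-type file)"

    print(repo_backend(r"D:\svn\myrepo"))  # hypothetical repository path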

In the case of FSFS, each revision is stored as a diff against the previous. So, you would think that yes, after many revisions, it would be very slow.

However, this isn't the case. FSFS uses what are called "skip deltas" to avoid having to do too many lookups on previous revs.

(So, if you are using an FSFS repo, Brad Wilson's answer is wrong.)

In the case of a BDB repo, the HEAD (latest) revision is full-text, but the earlier revisions are built as a series of diffs against the head. This means the previous revs have to be re-calculated after each commit.

For more info: http://svn.apache.org/repos/asf/subversion/trunk/notes/skip-deltas
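To make the skip-delta idea concrete (and to answer the rev-10 question above): as I read the linked notes, each new representation is delta'd not against the immediately preceding one but against the rep number with its lowest set bit cleared, so rebuilding any rep takes at most about log2(N) delta applications instead of N. The real FSFS format has more wrinkles (per-node chains, occasional full-text reps), so treat this Python sketch as an illustration of the growth rate only:

    def skip_delta_chain(n):
        """Representation numbers whose deltas must be applied (oldest first)
        to rebuild representation n, if rep n deltas against n with its
        lowest set bit cleared (the skip-delta idea from the linked notes)."""
        chain = []
        while n > 0:
            chain.append(n)
            n &= n - 1  # clear the lowest set bit
        return chain[::-1]

    # Naive "delta against the previous revision" needs n delta applications;
    # skip deltas need only popcount(n), i.e. O(log n).
    for n in (10, 1_000, 100_000):
        print(f"rep {n:>6}: naive chain = {n} deltas, "
              f"skip-delta chain = {len(skip_delta_chain(n))} deltas "
              f"-> {skip_delta_chain(n)}")

Under this scheme, reconstructing rep 10 would apply only the deltas for reps 8 and 10 on top of the base at rep 0, rather than all of the deltas 2-10.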

P.S. Our repo is about 20GB, with about 35,000 revisions, and we have not noticed any performance degradation.

OTHER TIPS

Subversion stores the most current version as full text, with backward-looking diffs. This means that updates to head are always fast, and what you incrementally pay for is looking farther and farther back in history.

I personally haven't dealt with Subversion repositories with codebases bigger than 80K LOC for the actual project. The biggest repository I've actually had was about 1.2 gigs, but this included all of the libraries and utilities that the project uses.

I don't think the day to day usage will be affected that much, but anything that needs to look through the different revisions might slow down a tad. It may not even be noticeable.

Now, from a sys admin point of view, there are a few things that can help you minimize performance bottlenecks. Since Subversion is mostly a file-based system, you can do this:

  • Put the actual repositories on a separate drive
  • Make sure that no file-locking apps other than svn are working on that drive
  • Make the drives at least 7,500 RPM. You could try getting 10,000 RPM, but it may be overkill
  • Update the LAN to gigabit, if everybody is in the same office.

This may be overkill for your situation, but that's what I've usually done for other file-intensive applications.

If you ever "outgrow" Subversion, then Perforce will be your next step up. It's hands down the fastest source control app for very large projects.

We're running a subversion server with gigabytes worth of code and binaries, and it's up to over twenty thousand revisions. No slowdowns yet.

Subversion only stores the delta (differences) between 2 revisions, so this helps save a LOT of space, especially if you only commit code (text) and no binaries (images and docs).

Additionally, I've seen a lot of very big projects using svn and never heard complaints about performance.

Maybe you are worried about checkout times? Then I guess this would really be a networking problem.

Oh, and I've worked on CVS repositories with 2 GB+ of stuff (code, imgs, docs) and never had a performance problem. Since svn is a great improvement on cvs, I don't think you should worry about it.

Hope it helps ease your mind a little ;)

I do not think that our Subversion has slowed down with age. We currently have several terabytes of data, mostly binary, and we check out/commit up to 50 gigabytes of data daily. In total we currently have 50,000 revisions. We are using FSFS as the storage type and access it either directly via the svn:// protocol (Windows server) or via Apache mod_dav_svn (Gentoo Linux server).

I cannot confirm that svn slows down over time: we set up a clean server for performance comparison, and we could NOT measure any significant degradation.

However, I have to say that our Subversion is uncommonly slow out of the box, and it is obviously Subversion itself, as we got the same behavior on another computer system.

For some unknown reason, Subversion seems to be completely limited by server CPU. Our checkout/commit rates are limited to between 15-30 megabytes/s per client, because at that point one server CPU core is completely used up. This is the same for an almost empty repository (1 gigabyte, 5 revisions) as for our full server (~5 terabytes, 50,000 revisions). Tuning, such as setting compression to 0 (off), did not improve this.
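If you want to check whether you are hitting the same CPU-bound ceiling, a crude way to measure per-client throughput is to time a full checkout and divide the resulting working-copy size by the elapsed time, while watching the server's CPU. A rough sketch, where the URL and target directory are placeholders and only wall-clock throughput is measured:

    import os
    import subprocess
    import time

    URL = "http://svnserver/svn/myrepo/trunk"   # placeholder URL
    TARGET = "wc-benchmark"                     # placeholder checkout dir

    start = time.time()
    subprocess.run(["svn", "checkout", "--quiet", URL, TARGET], check=True)
    elapsed = time.time() - start

    # Sum up the size of the checked-out files (skipping .svn metadata).
    total = 0
    for root, dirs, files in os.walk(TARGET):
        dirs[:] = [d for d in dirs if d != ".svn"]
        total += sum(os.path.getsize(os.path.join(root, f)) for f in files)

    print(f"{total / 1e6:.1f} MB in {elapsed:.1f} s = {total / 1e6 / elapsed:.1f} MB/s")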

Our high-bandwidth FC array (delivers ~1 gigabyte/s) idles, the other cores idle, and the network (currently 1 gigabit/s for clients, 10 gigabits/s for the server) idles as well. Okay, not really idling, but if only 2-3% of the available capacity is used, I call it idling.

It is no real fun to see all components idling while we wait for our working copies to get checked out or committed. Basically, I have no idea what the server process is doing while fully consuming one CPU core the whole time during checkout/commit.

However, I am still trying to find a way to tune Subversion. If this is not possible, we might need to switch to another system.

Therefore, my answer: no, SVN does not degrade in performance over time; it is just slow to begin with.

Of course, if you do not need (high) performance, you won't have a problem. Btw, all of the above applies to Subversion 1.7, the latest stable version.

The only operations which are likely to slow down are things which read information from multiple revisions (e.g. SVN Blame).

I am not sure... I am using SVN with Apache on CentOS 5.2. Works OK. The revision number was 8230, something like that... And on all client machines, commit was so slow that we had to wait at least 2 minutes for a file that is 1 KB. I am talking about 1 file with no big file size.

Then I made a new repository, started from rev. 1. Now it works OK. Fast. I used svnadmin create xxxxxx. I did not check whether it is FSFS or BDB...

Maybe you should consider improving your workflow.

I don't know if a repo will have perf issues under these conditions, but your ability to go back to a sane revision will.

In your case, you may want to include a validation process: each team commits to a team leader's repo, each team leader commits to the team manager's repo, and the team manager commits to the read-only, clean company repo. You make a clean selection at each stage of which commits must go to the top.

This way, anybody can go back to a clean copy with an easy-to-browse history. Merges are much easier, and devs can still commit their mess as much as they want.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow