Question

I'm researching Subversion's internal data structures for a term paper. I already found http://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/structure, but it doesn't cover everything I'm looking for.

In a new repository I committed a simple text file containing a few test lines:

DELTA
SVN▒▒   ▒▒▒Dies ist die erste Datei im SVN-Repository. Ebenfalls ist dies die erste Zeile der Ersten Datei.
Dies ist die zweite Zeile der ersten Datei.
//empty line
Diese Zeile wird in der nächsten Revision gelöscht werden.
//empty line
Dies ist die letzte Zeile der Datei.
ENDREP
id: 0-1.0.r1/293
type: file
count: 0
text: 1 0 280 263 f76e56eefcb558ac6682682c05c16eb8 785a12924cf4a78d97cb10ba9903086bf3683d2d 0-0/_2
cpath: /Erste Datei.txt
copyroot: 0 /

PLAIN
K 15
Erste Datei.txt
V 17
file 0-1.0.r1/293
END
ENDREP
id: 0.0.r1/529
type: dir
pred: 0.0.r0/17
count: 1
text: 1 468 48 48 6d3f404edb0eca280b1f748b565436c9
cpath: /
copyroot: 0 /

_0.0.t0-0 add-file true false /Erste Datei.txt


529 654

The first block (from DELTA up to the first empty line) is explained in the structure file, but the part below it is not.

Can anyone explain it to me? I can see that the line after "K 15" is the file name and the line after "V 17" refers to the node described in the DELTA block above, but where is that information used?

Thanks in advance for your help.


Solution

That is the directory representation for the root of your repository. Think of directories in Subversion as being just files where the contents of the file are a hash dump (as described in the page you linked) of the directory.
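To tie this to the dump in the question: the node-rev for / carries a "text:" field that says where its representation lives. As I read the structure file for this format, the fields are revision, byte offset, on-disk length, expanded size and MD5, optionally followed by a SHA-1 and a uniquifier. Here is a minimal Python sketch that decodes such a line. `RepPointer` and `parse_text_field` are names I made up for illustration; they are not Subversion APIs.

```python
from typing import NamedTuple, Optional


class RepPointer(NamedTuple):
    """Where a representation lives, as read from a node-rev 'text:' line."""
    revision: int
    offset: int          # byte offset of the PLAIN/DELTA header in the rev file
    length: int          # bytes stored on disk between the header and ENDREP
    size: int            # size of the expanded (full-text) contents
    md5: str
    sha1: Optional[str] = None
    uniquifier: Optional[str] = None


def parse_text_field(line: str) -> RepPointer:
    """Decode a 'text: ...' header line (field layout is my assumption, see above)."""
    key, _, value = line.partition(": ")
    if key != "text":
        raise ValueError(f"not a text field: {line!r}")
    parts = value.split()
    if len(parts) not in (5, 7):
        raise ValueError(f"unexpected number of fields: {len(parts)}")
    rev, offset, length, size = (int(p) for p in parts[:4])
    return RepPointer(rev, offset, length, size, *parts[4:])


root = parse_text_field("text: 1 468 48 48 6d3f404edb0eca280b1f748b565436c9")
print(root.offset, root.length)
```

For the root directory this gives offset 468 and length 48. If the dump in the question is byte-exact, 468 + 6 bytes for "PLAIN\n" + 48 bytes of content + 7 bytes for "ENDREP\n" lands on 529, which is the offset in the node-rev ID 0.0.r1/529. The same arithmetic works for the file: 0 + 6 + 280 + 7 = 293.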

If the representation is for the text contents of a directory node, the expanded contents are in hash dump format mapping entry names to "<type> <id>" pairs, where <type> is "file" or "dir" and <id> gives the ID of the child node-rev.
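To make the hash dump format concrete, here is a small Python sketch that reads the K/V records from the PLAIN block in the question and splits each value into its type and node-rev ID. The function names are illustrative only, and I'm assuming the entry names are UTF-8 and the lengths are byte counts.

```python
from typing import Dict, Tuple


def _read_item(data: bytes, pos: int, tag: bytes) -> Tuple[str, int]:
    """Read one 'K <n>' or 'V <n>' record at pos; return (text, next position)."""
    eol = data.index(b"\n", pos)
    kind, _, length = data[pos:eol].partition(b" ")
    if kind != tag:
        raise ValueError(f"expected {tag!r} header at byte {pos}, got {kind!r}")
    start = eol + 1
    end = start + int(length)  # assumption: lengths count bytes, not characters
    if data[end:end + 1] != b"\n":
        raise ValueError(f"record at byte {pos} is not newline-terminated")
    return data[start:end].decode("utf-8"), end + 1


def parse_hash_dump(data: bytes) -> Dict[str, str]:
    """Parse K/V pairs until the END marker."""
    entries: Dict[str, str] = {}
    pos = 0
    while not data.startswith(b"END\n", pos):
        key, pos = _read_item(data, pos, b"K")
        value, pos = _read_item(data, pos, b"V")
        entries[key] = value
    return entries


def parse_directory(data: bytes) -> Dict[str, Tuple[str, str]]:
    """Map each entry name to (type, node-rev ID)."""
    listing: Dict[str, Tuple[str, str]] = {}
    for name, value in parse_hash_dump(data).items():
        kind, node_rev_id = value.split(" ", 1)
        listing[name] = (kind, node_rev_id)
    return listing


# The contents of the PLAIN block from the question (between PLAIN and ENDREP).
root = b"K 15\nErste Datei.txt\nV 17\nfile 0-1.0.r1/293\nEND\n"
print(parse_directory(root))  # {'Erste Datei.txt': ('file', '0-1.0.r1/293')}
```

So the root directory has a single entry, and its ID 0-1.0.r1/293 is exactly the "id:" line of the file node-rev further up in the dump. That is how the tree is walked from a directory to its children.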

Until recently, the directory representation was always written as PLAIN (i.e. full text), but as of 1.8.0 it can be written as a DELTA, which greatly reduces the storage space required when a repository has a very deep tree.

We have directory representations because the storage of an individual file is abstracted away from its location in the tree. This was first used to implement cheap copies: when you branch (via the copy command), Subversion doesn't write out new file content representations for the files in the tree, but simply writes out new directory representations that point to the existing file representations. It was later also used for representation sharing, which uses a database to avoid storing the same content twice when it is independently added or created via merges.
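A toy in-memory model may help picture both ideas. This is only a sketch of the concept, not how FSFS lays things out on disk: `ToyStore`, `add_file` and `copy_directory` are invented names, a Python dict stands in for the representation-sharing database, and I'm keying it by SHA-1 purely for illustration.

```python
import hashlib
from typing import Dict


class ToyStore:
    """Holds file contents separately from any directory listing."""

    def __init__(self) -> None:
        self.reps: Dict[str, bytes] = {}    # representation key -> content
        self.by_sha1: Dict[str, str] = {}   # content digest -> representation key

    def add_file(self, content: bytes) -> str:
        """Store content once; identical content reuses the existing key."""
        digest = hashlib.sha1(content).hexdigest()
        if digest in self.by_sha1:
            return self.by_sha1[digest]
        key = f"rep-{len(self.reps) + 1}"
        self.reps[key] = content
        self.by_sha1[digest] = key
        return key


def copy_directory(listing: Dict[str, str]) -> Dict[str, str]:
    """A 'cheap copy': a new listing that points at the same representations."""
    return dict(listing)


store = ToyStore()
trunk = {"a.txt": store.add_file(b"hello\n")}

branch = copy_directory(trunk)                # branching writes no file content
branch["b.txt"] = store.add_file(b"hello\n")  # same bytes, added independently

print(len(store.reps))  # 1
```

After the copy and the independent add, the store still holds a single representation. Only the directory listings differ.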

You may also want to read about the directory bubble-up method used for storage, which is described in the Subversion Design Document. Note that this document is very old and not entirely up to date, but the bubble-up information is still accurate and informative.
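The gist of bubble-up, as I understand it from the design document, is that changing a file produces a new node for that file and for every directory on the path up to the root, while everything else is shared with the previous revision. Here is a minimal Python sketch using nested dicts as a stand-in for directory nodes. `bubble_up` is a hypothetical helper, not anything from the Subversion code base.

```python
from typing import Tuple


def bubble_up(directory: dict, path: Tuple[str, ...], new_content: bytes) -> dict:
    """Return a new directory tree with the file at `path` replaced.

    Only the directories along `path` are copied; untouched children are
    shared by reference with the old tree, which stays unchanged.
    """
    name, rest = path[0], path[1:]
    new_dir = dict(directory)  # shallow copy of this directory only
    if rest:
        new_dir[name] = bubble_up(directory[name], rest, new_content)
    else:
        new_dir[name] = new_content
    return new_dir


r1 = {"trunk": {"a.txt": b"one\n", "b.txt": b"two\n"}, "tags": {}}
r2 = bubble_up(r1, ("trunk", "a.txt"), b"changed\n")

print(r2["tags"] is r1["tags"])    # True: untouched subtree is shared
print(r2["trunk"] is r1["trunk"])  # False: parent directory was rewritten
```

You can see the same thing in the dump in the question: committing one file also wrote a new node-rev for the root directory (0.0.r1/529), whose "pred:" field points back at the root of revision 0.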

I'd point you at Stefan Fuhrmann's talk from Subversion & Git Live 2013, but I don't think it's been posted to the web yet. It has some tidbits about the work being done on the file system format that you might find interesting.

Feel free to swing by #svn-dev on irc.freenode.net if you have further questions.

Licensed under: CC-BY-SA with attribution