Question

I am in charge of 100+ documents (Word documents, not source code) that need revision by different people in my department. Currently all the documents are in a shared folder, where people retrieve them, revise them, and save them back.

What I am doing now is looking at the "date modified" column in the shared folder, opening the recently modified documents, and using the "Track Changes" function in MS Word to apply the changes. I find this a bit tedious.

So would it be better and easier to commit these documents to a version control database?

Basically I want to keep different versions of a file.


What I have learned from the answers:

  • Use Time Machine to save different versions (or Shadow Copy in Vista)

  • There is a difference between text and binary documents when you use version control app. (I didn't know that)

  • Diff won't work on binary files

  • A notification system (e.g. email) for revisions is great

  • Google Docs revision feature.

Update:

I played around with the Google Docs revision feature and feel that it is almost right for me. I am just a bit annoyed by the too-frequent versioning (autosaving).

But what feels right for me doesn't mean it feels right for my dept. Will they be okay with saving all these documents with Google?


Solution

I guess one thing that nobody seems to have asked is whether you have a legal requirement to store the history of changes to the docs.

Whether you do or don't is going to have an impact on what solutions you can consider.

A notification mechanism for out-of-date copies is also a bundle of fun. If engineer A has a copy of a document and engineer B then edits it and commits the changes, you want engineer A to be notified that his copy is out of date.

Document control can become a real can of worms quite easily.

Maybe keep the docs under CVS or SVN and set it up so that, when updates to a doc are checked in to the repository, emails are generated to whoever has checked out a copy of it?

Edit: I forgot to add: don't forget to use the binary switch, e.g. -kb for CVS, when adding a new doc. Otherwise, any byte sequences that happen to match the ASCII for keyword strings will have the relevant config management data appended, corrupting your doc data.
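
To illustrate why that matters, here is a rough Python simulation of keyword expansion applied to binary data. It is only a sketch: the regular expression is a simplified stand-in for what CVS does with `$Revision$`, and the payload bytes are made up for the example.

```python
import re

# Simplified stand-in for CVS keyword expansion: matches "$Revision$" as well
# as an already expanded "$Revision: 1.1 $" marker.
KEYWORD = re.compile(rb"\$Revision(?::[^$\n]*)?\$")


def expand_keywords(data: bytes, revision: str) -> bytes:
    """Rewrite every $Revision$ marker, as a text-mode checkout would."""
    replacement = b"$Revision: " + revision.encode("ascii") + b" $"
    return KEYWORD.sub(lambda match: replacement, data)


# Made-up binary payload that happens to contain the marker bytes.
original = b"\x00\x10\x7f\xfe" + b"$Revision$" + b"\x00\x01\x02"
expanded = expand_keywords(original, "1.2")

# The file has grown, so every offset stored after the marker is now wrong.
print(len(original), len(expanded))
```

A binary format that records lengths or offsets internally will no longer parse after a change like this, which is what the -kb switch avoids by turning keyword expansion off for that file.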

OTHER TIPS

I've worked with Word documents in SVN. With TortoiseSVN, you can easily diff Word documents (between working copy and repository, or between two repository revisions). It's really slick and definitely recommended.

The other thing to do if you're using Word documents in SVN is to add the svn:needs-lock property to the Word documents. This will prevent two people from trying to edit the same document at the same time, since unfortunately there's no good way to merge Word documents.
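
If you have a lot of documents, setting the property by hand gets old. Something along these lines might do it in bulk. Treat it as a sketch: it assumes the `svn` command-line client is on the PATH and that the folder is already a working copy, and the function names are just ones I made up.

```python
import subprocess
from pathlib import Path

WORD_SUFFIXES = {".doc", ".docx"}


def needs_lock_command(doc: Path) -> list[str]:
    """Build the svn command that marks one file as requiring a lock."""
    # Subversion only cares that the property exists; "*" is the usual value.
    return ["svn", "propset", "svn:needs-lock", "*", str(doc)]


def mark_word_documents(working_copy: Path, dry_run: bool = True) -> list[list[str]]:
    """Collect (and, unless dry_run, execute) the command for each Word file."""
    commands = [
        needs_lock_command(doc)
        for doc in sorted(working_copy.rglob("*"))
        if doc.is_file() and doc.suffix.lower() in WORD_SUFFIXES
    ]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

Run it with `dry_run=True` first to see what would be touched. The property change still has to be committed before other people's working copies pick it up.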

With the above two things, handling revision controlled Word documents is at least tolerable. It certainly beats the alternative of using a shared folder and track-changes.

What on Earth are you all Word-is-binary-so-no-diff people talking about? TortoiseSVN, for example, integrates right out of the box with Word and enables you to use Word's built-in diff and merge functionality. It works just fine.

I have worked on projects that store documents in version control. It has worked out pretty well, although if people are unfamiliar with version control, they are probably going to have conceptual difficulties with things like "working copy" and "merge" and "conflict". Don't overestimate the users' capabilities when you plan your document management system.

I believe there exist big and powerful commercial solutions for all of this, as well. I'm sure if you have enough kilodollars, you can get something that fits your needs perfectly. Document management systems are a big business for big enterprise.

Thinking out of the box, would migrating to a Wiki be out of the question?

Since you consider it feasible to force your users into Subversion (or something similar), a larger change seems acceptable.

Another migration target could be to use some kind of structured XML document format (DocBook comes to mind). This would enable you to indeed use diffs and source control, while getting all sorts of document formats for free.

SharePoint also does a good (OK, decent) job of versioning MS-specific documents.

How about trying Git? It seems Git can support Word .doc and OpenDocument .odf files if you configure it in a .gitattributes file.

Here is a reference; scroll down to "Diffing Binary Files".
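
The mechanism behind this is Git's textconv filter: a line such as `*.docx diff=word` in .gitattributes, plus `git config diff.word.textconv "<some command>"`, tells Git to run that command on each version and diff the text it prints. Below is a minimal sketch of such a converter in Python. It assumes the newer .docx format (a ZIP archive with the body in word/document.xml), so it will not read legacy .doc files, and the script name used below is hypothetical.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path) -> str:
    """Return the body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # A paragraph's text is split over several <w:t> runs; join them.
        paragraphs.append("".join(node.text or "" for node in paragraph.iter(W + "t")))
    return "\n".join(paragraphs) + "\n"


# When used as a textconv command, Git passes the file path as the argument.
if __name__ == "__main__" and len(sys.argv) == 2 and zipfile.is_zipfile(sys.argv[1]):
    sys.stdout.write(docx_to_text(sys.argv[1]))
```

Saved as, say, docx_to_text.py, it could be wired up with `git config diff.word.textconv "python docx_to_text.py"`. It drops formatting, tables layout, and images, so the diff only shows changes to the wording.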

For what it's worth, there is also Google Docs. I guess it's not a perfect fit, but its versioning is very convenient.

ClearCase integrates with Word for revision tracking. I believe Telelogic DOORS does as well.

I use Mercurial with the TortoiseHg overlay. I can right-click a changeset, choose "Visual Diff", then choose the "docdiff" tool (comes bundled), which launches the document in Word with Track Changes.

You can, but you will always compare the document versions with Word itself.

I haven't heard of a version control database which can track changes in Word documents.

However there are some tools which can compare Word documents, so if you set up your version control client to use these tools for comparison, you can have some fun.

Not necessarily. It depends on how often the new files are committed to the repo. If the files are edited several times before a commit, then you're precisely where you are now. The biggest benefit is if the file becomes corrupted.

You can version any file; this is how Time Machine in Mac OS X Leopard works, for example, and there is an interesting article by someone who committed his entire computing environment into CVS and then just maintained working copies on his home and work machines.
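
For the "keep different versions of a file" part alone, even a crude timestamped copy goes a long way. The sketch below is not how Time Machine or CVS work internally; it just copies a document into an archive folder under a name derived from its modification time. The function name and naming scheme are my own assumptions.

```python
import shutil
import time
from pathlib import Path


def snapshot(document: Path, archive_dir: Path) -> Path:
    """Copy `document` into `archive_dir` under a timestamped name."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    modified = time.localtime(document.stat().st_mtime)
    stamp = time.strftime("%Y%m%d-%H%M%S", modified)
    target = archive_dir / f"{document.stem}.{stamp}{document.suffix}"
    if not target.exists():
        # An existing target means this exact save was already archived.
        shutil.copy2(document, target)
    return target
```

Calling `snapshot(Path("spec.docx"), Path("archive"))` after each save leaves one copy per saved version, which you can then compare in Word. It gives you none of the locking, history comments, or notifications a real version control system would.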

But "better" and "easier" are specific to your situation, and I'm not sure I completely understand your problem as things stand.

Subversion, CVS and all other source control systems are not good for Word documents and other office files (such as Excel spreadsheets), since the files themselves are stored in a binary format. That means that you can never go back and annotate (or blame, or whatever you want to call it), or do diffs between documents.

There are revision control systems for Word documents out there; unfortunately I do not know any good ones. We use such control systems for Excel at my work, and unfortunately they all cost money.

The good thing is that they make life a lot easier, especially if you ever have to do an audit or due diligence.

If you use WinMerge it has added support for merging Word and Excel binary files.

Have a look at SharePoint. If cost is an issue, SharePoint portal services can also work for you. Read this for more info.

You could use something like the Revisionator, which is like Google Docs but with built-in revision control including diffs, forks, and 3-way merges. http://revisionator.com

UPDATE: It also fixes the problem of too frequent autosaving that you mention with Google Docs. It'll still autosave to prevent data loss, but it will only create a new version in the revision history and share with other users when you explicitly "release" your changes.

Just wanted to clarify an answer someone gave but I don't have enough points yet.

diff will work on binary files but it is only going to say something not really useful like "toto1 and toto2 binary files differ".
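
The same yes-or-no answer is all you get from a byte-for-byte comparison in code. A small Python sketch, with throwaway files standing in for the two documents:

```python
import filecmp
import tempfile
from pathlib import Path


def binary_files_differ(first: Path, second: Path) -> bool:
    """Compare contents byte for byte; the answer is only yes or no."""
    return not filecmp.cmp(first, second, shallow=False)


with tempfile.TemporaryDirectory() as folder:
    toto1 = Path(folder, "toto1")
    toto2 = Path(folder, "toto2")
    toto1.write_bytes(b"\x00\x01report v1\x00")
    toto2.write_bytes(b"\x00\x01report v2\x00")
    if binary_files_differ(toto1, toto2):
        print("toto1 and toto2 differ")
```

It tells you that something changed, not what, which is why the Word-aware comparison tools mentioned in the other answers are needed.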

You could do that, but if the files are binary you should always put a lock on a file before editing it. That way you won't get a conflict (which would be unresolvable).

Many of the new version control projects are better suited to entire directories, and not so much for single files.

Convincing someone that they need to get an entire project, when they only want to update an individual file can be a "fun" way to spend an afternoon.

Another option you have is a piece of software and cloud computing magic called Dropbox. Or, you could ditch the Word documents and set up a locally shared MediaWiki instead.

Dropbox: getdropbox DOT com

MediaWiki: mediawiki DOT org

YES, it's applicable! I totally agree that the SVN + TortoiseSVN combo is well suited to tracking MS Office documents. You can lock a document for editing, write-protect all unlocked files to avoid conflicts (i.e. parallel modifications), diff two versions of the same file, see the history of all modifications, and of course roll back to an older revision.
I tried to describe all of those tips in a dedicated blog post. (disclaimer: I'm the blog owner)

All of this could even be accessible from the web with a SVN web client! (might need some software development)

But if you're not accustomed to version control systems in another context, this may not be the obvious choice. The work needed for good integration with documents gives dedicated tools an advantage: "electronic document management" systems are made just for that. A VCS like SVN may remain a good alternative for cost reasons :-)

Have you tried the online service Simul? It looks promising, and I personally like the GitHub-like orientation. Note that I'm not affiliated with Simul!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow