Question

When working with very large documents, would it basically overwhelm the connection and grind to a halt, or does it cope by sending only diffs?

Solution

In short: it uses diffs.

Each time someone hits a key in an Etherpad document, all connected participants receive a short message (roughly 100 bytes of payload, plus about a kilobyte of HTTP headers and other overhead).
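To make the size argument concrete, here is a minimal sketch of why a per-keystroke message stays tiny no matter how large the pad is. The message fields (`type`, `pos`, `text`, `baseRev`) and the helper names are made up for illustration; this is not Etherpad's actual wire format.

```python
import json


def insert_op(position: int, text: str, base_rev: int) -> dict:
    """Build a hypothetical 'insert' message (illustrative format only)."""
    return {"type": "insert", "pos": position, "text": text, "baseRev": base_rev}


def apply_op(document: str, op: dict) -> str:
    """Apply an insert message to a local copy of the document."""
    return document[: op["pos"]] + op["text"] + document[op["pos"] :]


# A large document of about 1.2 MB that every participant already holds.
document = "lorem ipsum " * 100_000

# One keystroke: only the change travels over the wire, not the document.
op = insert_op(position=42, text="x", base_rev=7)
payload = json.dumps(op).encode("utf-8")

updated = apply_op(document, op)
print(f"document: {len(document)} bytes, message: {len(payload)} bytes")
```

The message size depends only on what was typed, so a pad ten times larger costs the same per keystroke.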

Bandwidth will not be the first bottleneck, so don't worry about an Etherpad Lite server saturating ("filling up") your connection. The underlying framework (node.js) could perhaps do so on its own (serving static files, for example), but the Etherpad Lite code will almost certainly be limited by CPU speed, and possibly disk space, first. (The classic Etherpad can generate gigabytes of log files per day. I don't know whether "Lite" defaults to more limited logging, but you can of course change that or simply delete old log files.)

I've poked around the old/original Etherpad, and Etherpad Lite uses the same methods for handling text documents. No document is stored "in full"; it is always stored as a set of changes, which are played back to recreate the document. To avoid playing back thousands of tiny changes, aggregate changes are also stored in the database, so the number of changes you play back grows roughly with log10 of the revision count.
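As a rough illustration of the log10 remark, the sketch below assumes aggregates are kept for aligned blocks of 10, 100, 1000, ... revisions and works out which records would be read to rebuild a given revision. The block layout and the function name `playback_plan` are assumptions for this example, not Etherpad's actual database schema.

```python
def playback_plan(target_rev: int, base: int = 10) -> list[tuple[int, int]]:
    """Return (start_rev, span) blocks that together cover revisions [0, target_rev).

    Each block stands for one stored record: a single change (span 1) or an
    aggregate of `span` consecutive changes (assumed layout, for illustration).
    """
    plan = []
    pos = 0

    # Start with the largest power-of-`base` block that fits in the target.
    span = 1
    while span * base <= target_rev:
        span *= base

    while pos < target_rev:
        # Shrink the block size until the next block no longer overshoots.
        while pos + span > target_rev:
            span //= base
        plan.append((pos, span))
        pos += span
    return plan


plan = playback_plan(1234)
print(len(plan), "records instead of 1234 individual changes")
```

For revision 1234 this reads one 1000-block, two 100-blocks, three 10-blocks and four single changes, so the work is bounded by about nine records per decimal digit rather than by the raw number of keystrokes.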

OTHER TIPS

I would put this question to the author (petermartischka - googlemail - com?) instead, and maybe post the answer here.

You should look at this: http://en.wikipedia.org/wiki/Operational_transformation.

While I don't know about Etherpad, http://codecollab.gamooga.com/ and http://collabedit.com/ use it. Google Docs uses a variant of it.
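For a feel of what operational transformation does, here is a toy sketch that handles only concurrent inserts: each side applies its own edit first, then shifts the other side's edit so both end up with the same text. The `Insert` class, the `site` tie-breaker and the function names are invented for this example; real systems also handle deletes and use more elaborate rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # character offset where the text goes
    text: str  # text being inserted
    site: int  # id of the editing client, used only to break ties


def apply(doc: str, op: Insert) -> str:
    """Apply an insert to a document."""
    return doc[: op.pos] + op.text + doc[op.pos :]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after the concurrent insert `against`."""
    lands_before = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if lands_before:
        # The other insert pushed our target position to the right.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "abc"
a = Insert(pos=1, text="X", site=1)  # typed by client 1
b = Insert(pos=2, text="Y", site=2)  # typed by client 2 at the same time

# Each client applies its own edit, then the transformed remote edit.
at_client_1 = apply(apply(doc, a), transform(b, a))
at_client_2 = apply(apply(doc, b), transform(a, b))
print(at_client_1, at_client_2)
```

Both clients converge on the same text even though they applied the two edits in a different order, and each edit that crosses the network is still just a small insert message.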

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow