Question

We have a couple of production CouchDB databases that have grown to 30GB and need to be compacted. These are used by a 24/7 operations website and are replicated with another server using continuous replication.

From the tests I've done, compacting these databases will take about 3 minutes.

Is it safe to compact one side of the replication while the production site and replication are still running?

Solution

Yes, this is perfectly safe.

Compaction works by copying the live data (the latest revision of each document) into a new database file, then swapping that file in for the old one once it has caught up. Reads and writes keep going against the old file in the meantime. This is because CouchDB has a very firm rule that the internals of a database file never get updated in place; the file is only appended to, followed by an fsync. This is also why you can rudely kill CouchDB's processes and it doesn't have to recover or rebuild the database like you would in other solutions.
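For reference, here is a minimal sketch of kicking off compaction over HTTP and waiting for it to finish. `COUCH_URL`, `POLL_SECONDS` and the database name `db` are placeholders of mine, and depending on your configuration you may need to pass admin credentials to curl.

```shell
#!/bin/sh
# Assumption: COUCH_URL points at the node you want to compact.
COUCH_URL="${COUCH_URL:-http://localhost:5984}"

# Ask CouchDB to start compacting database $1.
# The request returns right away; compaction runs in the background.
start_compaction() {
  curl -s -X POST -H "Content-Type: application/json" "$COUCH_URL/$1/_compact"
}

# Exit status 0 while the database info document reports a running compaction.
compaction_running() {
  curl -s "$COUCH_URL/$1" |
    grep -Eq '"compact_running"[[:space:]]*:[[:space:]]*true'
}

# Poll until compaction of database $1 has finished.
wait_for_compaction() {
  while compaction_running "$1"; do
    sleep "${POLL_SECONDS:-5}"
  done
}
```

Usage would be something like `start_compaction db && wait_for_compaction db`. The grep on the JSON is deliberately crude; a proper JSON parser is the safer choice if you have one installed.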

This means that you need enough free disk space to hold the new file alongside the old one until the swap happens. So, trying to compact a CouchDB database only once the disk is nearly full is usually a non-starter.
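A rough pre-flight check for that could look like the sketch below. It assumes the database lives in a single file whose path you know (the path in the usage line is hypothetical; newer CouchDB versions split a database into several shard files), and it takes the worst case where the compacted copy ends up as large as the original.

```shell
#!/bin/sh
# Succeeds when free space (in KB, $2) exceeds the database file size (in KB, $1).
has_room() {
  [ "$2" -gt "$1" ]
}

# Compare the size of the database file $1 with the free space on its filesystem.
check_compaction_space() {
  file_kb=$(du -k "$1" | awk '{print $1}')
  # df -P guarantees one line per filesystem; column 4 is the available space.
  free_kb=$(df -Pk "$(dirname "$1")" | awk 'NR==2 {print $4}')
  has_room "$file_kb" "$free_kb"
}
```

For example: `check_compaction_space /var/lib/couchdb/db.couch || echo "not enough room to compact"`.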

Also, replication works from the database's by-sequence index (a B+tree), which is carried over into the compacted file. The replicator is not streaming the entire database file from disk onto the network pipe, so swapping the file underneath it does no harm.

Lastly, there will of course be an increase in system resource utilization. However, your tests should have shown you roughly how much this costs on your system vs an idle CouchDB, which you can use to determine how closely you're pushing your system to the breaking point.
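If you want to keep an eye on things while compaction runs, one option is to poll `_active_tasks`, which lists running compactions and replications. This is only a sketch: `COUCH_URL` is a placeholder, the endpoint normally requires admin rights, and the grep is a crude stand-in for real JSON parsing.

```shell
#!/bin/sh
# Assumption: COUCH_URL points at the node being compacted.
COUCH_URL="${COUCH_URL:-http://localhost:5984}"

# Succeeds if _active_tasks lists at least one task of type $1,
# e.g. "database_compaction" or "replication".
task_active() {
  curl -s "$COUCH_URL/_active_tasks" |
    grep -Eq "\"type\"[[:space:]]*:[[:space:]]*\"$1\""
}
```

For example, `task_active replication || echo "replication is not running"` run during compaction would flag a replication that has stopped.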

OTHER TIPS

I have been working with CouchDB for a while, replicating databases and writing views to fetch data.

From observing its replication behavior, I can offer the following, which should answer your question:

  1. During replication, previous revisions of a document are not copied to the destination; only the current revision is replicated.
  2. Compacting a database only removes those previous revisions, so it will not cause any problems.
  3. Compaction runs only on the database you trigger it on. It should not affect the replica, which continuously listens for changes, because the replica listens for current-revision changes, not previous revisions. You can verify this as follows:

This request lists the changes for every sequence in the database. It reports only each document's latest revision, not the previous ones (so I don't think compaction will do any harm):

curl -X GET $HOST/db/_changes

The result is simple (shown here for an empty database):

{"results":[

],
"last_seq":0}

More info can be found here: CouchDB Replication Basics

This might help you understand it. In short, the answer to your question is yes: it is safe to compact a database under continuous replication.
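As a follow-up to the `_changes` request above, here is a small sketch for comparing the feed before and after compaction. `COUCH_URL` and `db` are placeholders; `since`, `limit` and `descending` are standard `_changes` query parameters. On newer CouchDB versions the sequence value is an opaque string rather than an integer, so pass it back exactly as you received it.

```shell
#!/bin/sh
# Assumption: COUCH_URL points at the node you are inspecting.
COUCH_URL="${COUCH_URL:-http://localhost:5984}"

# List the changes in database $1 that happened after sequence $2.
changes_since() {
  curl -s "$COUCH_URL/$1/_changes?since=$2"
}

# Show only the most recent change in database $1.
latest_change() {
  curl -s "$COUCH_URL/$1/_changes?limit=1&descending=true"
}
```

Running `latest_change db` on both servers before and after compaction should give you a quick sanity check that the replica is still keeping up.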

Licensed under: CC-BY-SA with attribution