Question

How can I save changes in CouchDB / Cloudant in order to later do point-in-time restores of my databases, or even specific documents?

Solution

We’re working on making this a first-class feature, but until we roll it out, this is how one of our customers did it:

You have collections, and within those collections, resources. You keep a logging database where every document has an ID of the form collection-resource. For a collection named "cars" and a resource named "Ford", the logging database would hold a document named cars-ford. That document looks like this:

{
  versions: [...]
}

Any time that resource is touched or modified, your application updates the logging document by appending the new version to the end of the versions field. That version might look like this:

{
  timestamp: '...', // some integer timestamp, for sorting
  doc: {...} // attributes of the document as of the save
}
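
The append step can be sketched as a small pure function. This is only an illustration, not the customer's actual code: the name appendVersion and the use of Date.now() as the integer timestamp are assumptions, and writing the result back to the logging database is left out.

```javascript
// Hypothetical helper: returns a new logging document with one more version.
// logDoc is the existing logging document, or null if none exists yet.
function appendVersion(logDoc, resourceDoc, timestamp = Date.now()) {
  const existing = logDoc && Array.isArray(logDoc.versions) ? logDoc.versions : [];
  // Deep-copy the resource so later mutations do not alter the stored snapshot.
  const snapshot = JSON.parse(JSON.stringify(resourceDoc));
  return { ...(logDoc || {}), versions: [...existing, { timestamp, doc: snapshot }] };
}

// Usage: two saves of the same resource produce two versions, oldest first.
const log1 = appendVersion(null, { make: 'Ford', color: 'red' }, 1000);
const log2 = appendVersion(log1, { make: 'Ford', color: 'blue' }, 2000);
console.log(log2.versions.length);
```

Returning a new object instead of mutating logDoc keeps the sketch easy to test; a real application would then save the returned document back to the logging database.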

Views over the logging database can then return these versions, for example all versions of all documents sorted by when each change occurred.

Then, here's how you use that to do restores and the like:

Getting the most recent version of a resource

Get the document in its entirety, and grab the last element in the versions field. That's the most recent version.

See all versions relative to a timestamp

We'll create a view to sort by timestamp. The view looks like this:

{
  map: "function(doc) {
      for(var i in doc.versions){
        emit(doc.versions[i].timestamp, doc.versions[i].doc);
      }
    }"
}

Say our database is named loggy, the design doc where our views live is named restore, and the view itself is named time. Then we'll make a GET request to this URL:

{CLOUDANT_HOST}/loggy/_design/restore/_view/time?startkey='...'

...where the value for startkey is some timestamp, written as a JSON value (an integer timestamp needs no quotes). This, unmodified, will return every version at or after the indicated timestamp. Add limit=X and you'll get the first X of those versions. Add descending=true and you'll get versions at or before the timestamp, newest first, instead of after.
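
As a rough sketch, the query URL could be assembled like this. The helper name timeViewUrl and the host https://account.example.com are placeholders invented for illustration; the database, design doc, and view names are the ones used above, and the timestamp is assumed to be an integer.

```javascript
// Hypothetical helper that builds the query URL for the "time" view.
// host is a placeholder such as "https://account.example.com".
function timeViewUrl(host, timestamp, { limit, descending } = {}) {
  const url = new URL('/loggy/_design/restore/_view/time', host);
  // View keys are JSON values, so an integer timestamp is sent as a bare number.
  url.searchParams.set('startkey', JSON.stringify(timestamp));
  if (limit !== undefined) url.searchParams.set('limit', String(limit));
  if (descending) url.searchParams.set('descending', 'true');
  return url.toString();
}

// The five versions at or after the timestamp.
const afterUrl = timeViewUrl('https://account.example.com', 1700000000, { limit: 5 });
// Versions at or before the timestamp, newest first.
const beforeUrl = timeViewUrl('https://account.example.com', 1700000000, { descending: true });
console.log(afterUrl);
console.log(beforeUrl);
```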

See the Nth revision for a resource

Much like above, but we'll tweak our view a little:

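{
  map: "function(doc){
      // skip documents that have no versions array
      if (!doc.versions) return;
      // numeric loop so keys are numbers; for..in would emit strings, and '10' sorts before '2'
      for(var i = 0; i < doc.versions.length; i++){
        emit(i, doc.versions[i].doc);
      }
    }"
}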

Now our view results are keyed by index rather than timestamp. So, instead of passing a timestamp to startkey, we just pass N to get versions around the Nth revision.

Getting the number of revisions for a collection or resource

We'll use another view to group by collection and resource:

{
  map: "function(doc){
    // split the ID into collection and resource
    var parts = doc._id.split('-');
    // emit one row per version, keyed so we can group by collection and resource
    for (var i = 0; i < (doc.versions || []).length; i++) {
      emit([parts[0], parts[1]], null);
    }
  }",
  reduce: "_count"
}

Use the query parameters group and group_level to group results by their keys. So, if we want the number of events that have touched resources in the cars collection, we would use a querystring like this:

?group_level=1&startkey=["cars"]&endkey=["cars",{}]

group=true groups results whose full keys are the same, while group_level=1 says "only group on the first element of the key", which in our case is the collection. Because the keys are arrays, startkey and endkey select every key whose first element is "cars"; a plain key="cars" would not match an array key.
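
To make the grouping concrete, here is a local simulation of what a _count reduce returns at different group levels. It runs entirely in memory and does not talk to Cloudant; the function name countByGroupLevel and the sample rows are made up for illustration, and each row stands in for one key emitted by the map function.

```javascript
// Local simulation of a "_count" reduce with group_level, for illustration only.
// Each row mimics one emitted key of the form [collection, resource].
function countByGroupLevel(rows, groupLevel) {
  const counts = new Map();
  for (const row of rows) {
    // group_level=N keeps only the first N elements of each array key.
    const groupKey = JSON.stringify(row.key.slice(0, groupLevel));
    counts.set(groupKey, (counts.get(groupKey) || 0) + 1);
  }
  return [...counts].map(([key, value]) => ({ key: JSON.parse(key), value }));
}

const rows = [
  { key: ['cars', 'ford'] },
  { key: ['cars', 'ford'] },
  { key: ['cars', 'audi'] },
  { key: ['bikes', 'trek'] },
];
const perCollection = countByGroupLevel(rows, 1); // one row per collection
const perResource = countByGroupLevel(rows, 2);   // one row per collection and resource
console.log(JSON.stringify(perCollection));
console.log(JSON.stringify(perResource));
```

The simulation returns groups in the order they were first seen, whereas the database returns them in key order; only the counts are meant to match.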

Getting all resources for a given collection

Using the _all_docs view, we'll use a querystring like this:

?reduce=false&startkey="{collection}-"&endkey="{collection}0"

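Remember the reduce part of our function? That _count value means "return the number of records emitted by map". reduce=false means "Don't do that." Instead, only the map function is run. Note that _all_docs itself has no reduce function, so reduce=false should only be needed when you query a view that defines one, such as the counting view above.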

That startkey and endkey pair relies on how Cloudant sorts IDs: "0" sorts after "-", so the range from "{collection}-" to "{collection}0" includes every ID that starts with the collection name and a hyphen, and excludes IDs from other collections.
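
The range trick can be checked locally with plain string comparison, which agrees with the database's ordering for these ASCII characters. This is a sketch under that assumption; collectionRange and inRange are hypothetical helper names, and the sample IDs are invented.

```javascript
// Hypothetical helper: key range covering every ID that starts with "<collection>-".
function collectionRange(collection) {
  // "0" sorts after "-", so "<collection>0" lies past every "<collection>-..." ID.
  return { startkey: `${collection}-`, endkey: `${collection}0` };
}

// Local check of the same range using plain string comparison.
function inRange(id, { startkey, endkey }) {
  return id >= startkey && id <= endkey;
}

const range = collectionRange('cars');
const ids = ['bikes-trek', 'cars-audi', 'cars-ford', 'carsharing-zip', 'trucks-volvo'];
const matches = ids.filter((id) => inRange(id, range));
console.log(matches);
```

Note that carsharing-zip is excluded even though it starts with "cars", because the range requires the hyphen right after the collection name.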

Updating docs

Once you've got the versions you'd like to restore, GET the current version of the resource, GET the past version from the loggy database, and PUT the past version to the resource using the current version's _rev value. Bam, restored. Rinse and repeat for point-in-time restore.
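
The body of that PUT can be sketched as below. The helper name buildRestoreBody and the sample documents are assumptions for illustration; the two GET requests and the PUT itself are left out, so this only shows how the past attributes and the current _rev are combined.

```javascript
// Hypothetical helper: builds the body to PUT when restoring a past version.
// currentDoc is the resource as it exists now; pastVersion is one entry from
// the logging document's versions array, shaped like { timestamp, doc }.
function buildRestoreBody(currentDoc, pastVersion) {
  return {
    ...pastVersion.doc,
    // Keep the target's _id and use the current _rev so the PUT is treated
    // as an update of the latest revision instead of being rejected as a conflict.
    _id: currentDoc._id,
    _rev: currentDoc._rev,
  };
}

const current = { _id: 'ford', _rev: '3-abc', color: 'blue' };
const past = { timestamp: 1000, doc: { _id: 'ford', _rev: '1-xyz', color: 'red' } };
const body = buildRestoreBody(current, past);
console.log(body);
```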

Licensed under: CC-BY-SA with attribution