Backing up Riak database data

Question

I'll answer your immediate question (how to use riak-admin backup) first, but see the comments on preferred methods of backing up, at the end.

The command is:

riak-admin backup <node name> <erlang cookie> <file name with path> all

The node name you can find in your riak vm.args file (look for the line that looks like -name riak@127.0.0.1). It'll be of the form riak@xx.xx.xx.xx with the IP address. So, on my local machine, a single node is named riak@127.0.0.1.

The Erlang cookie is also found in the vm.args file (look for the -setcookie line). It will most likely be riak.
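If you'd rather not read vm.args by eye, here is a minimal sketch that pulls both values out of it. The function names are my own, and the default path /etc/riak/vm.args is an assumption about a typical package install, so pass the real path if yours differs.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Riak): read values from a vm.args file.
# The default path below is an assumption; pass your own path as $1.

riak_node_name() {
    # Matches a line such as: -name riak@127.0.0.1
    awk '$1 == "-name" { print $2; exit }' "${1:-/etc/riak/vm.args}"
}

riak_cookie() {
    # Matches a line such as: -setcookie riak
    awk '$1 == "-setcookie" { print $2; exit }' "${1:-/etc/riak/vm.args}"
}
```

After sourcing this, riak_node_name /etc/riak/vm.args prints the node name and riak_cookie /etc/riak/vm.args prints the cookie.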

The file name parameter should be a fully-qualified path to the actual file (meaning, you can't give it just a directory name). The filename and extension are arbitrary; I would use something like cluster_backup.riak. Putting it all together, your backup command should look like:

riak-admin backup riak@<your node ip> riak /var/local/temp/cluster_backup.riak all
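Since passing a directory instead of a file is an easy mistake, here is a small sketch that assembles the command and refuses a directory path. The helper name riak_backup_cmd is hypothetical; it only prints the command so you can review it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper: build the riak-admin backup command line.
# It prints the command instead of running it, so nothing is touched.

riak_backup_cmd() {
    node=$1      # e.g. riak@127.0.0.1
    cookie=$2    # e.g. riak
    out=$3       # full path to the backup FILE, not a directory

    if [ -d "$out" ]; then
        echo "error: $out is a directory; give a full file path" >&2
        return 1
    fi

    printf 'riak-admin backup %s %s %s all\n' "$node" "$cookie" "$out"
}
```

For example, riak_backup_cmd riak@127.0.0.1 riak /var/local/temp/cluster_backup.riak prints the same command as above with the node IP filled in.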

Now, having said all that, I don't recommend using the riak-admin backup and restore commands to back up your whole cluster, for two reasons. One, the backup stores every replica of every object: if you're running with the default replica value of n=3, your backup file will contain 3 copies of each object. Two, the code invoked by that command is single-threaded and does not use connection pooling. So, all in all, both backing up and restoring are going to be SLOW.

Instead, I recommend one of the following approaches:

1. Take filesystem-level snapshots of the data directories of each node. This is the approach currently recommended by Basho, and detailed here: http://docs.basho.com/riak/latest/ops/running/backups/
2. If you definitely want a "logical" backup (meaning, an export of the objects contained in the cluster), you can use an experimental standalone tool such as the Riak Data Migrator (but see the limitations in its README).
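To give a rough idea of the first approach, here is a sketch of a per-node file-level backup. It uses a plain tar archive as a stand-in for a true volume snapshot (LVM, ZFS and so on), and it stops the node first, which is the conservative choice. The paths /var/lib/riak and /etc/riak are assumptions about a typical Linux package install, and backup_riak_node is a name I made up; check the Basho page for what your storage backend actually needs.

```shell
#!/bin/sh
# Sketch of a per-node, file-level backup. Assumed defaults (not from Riak
# itself): data lives in /var/lib/riak and config in /etc/riak.
RIAK_DATA_DIR=${RIAK_DATA_DIR:-/var/lib/riak}
RIAK_CONF_DIR=${RIAK_CONF_DIR:-/etc/riak}

backup_riak_node() {
    dest=$1                                  # existing directory for archives
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/riak-$(uname -n)-$stamp.tar.gz"

    # Stop the node so files on disk are not changing while they are copied.
    riak stop || return 1

    # Archive the data and config directories; remember tar's exit status.
    tar -czf "$archive" "$RIAK_DATA_DIR" "$RIAK_CONF_DIR"
    status=$?

    # Always try to bring the node back, even if tar failed.
    riak start || return 1

    if [ "$status" -ne 0 ]; then
        return "$status"
    fi
    echo "$archive"                          # print where the backup landed
}
```

Run it on one node at a time (for example backup_riak_node /mnt/backups, where the destination is whatever directory you use) so the rest of the cluster keeps serving requests while that node is down.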

I recommend testing and timing each of these approaches to see which one is faster for your situation.