Truncate a Riak Database

Question 1

I think every test-oriented developer faces this dilemma when working with Riak. As Christian has mentioned, there is no concept of rollbacks in Riak. And there is no single "truncate database" command that you can issue.

You have 3 approaches available to you:

1. Clear all the data on your test cluster. This essentially means issuing shell commands (assuming your test server is running on the same machine as your test suite). If you're using a Memory backend, this means issuing riak restart between each test. For other backends, you'd have to stop the node and delete the whole data directory and start it again: riak stop && rm -rf <...>/data/* && riak start. PROS: Wipes the cluster data clean between each test. CONS: This is slow (when you take into account shutdown and restart times), and issuing shell commands from your test suite is often awkward. (Sidenote: while it may be slow to do between each test, you can certainly feel free to clear the data directory before each run of the whole test suite.)
2. Loop through all the buckets and keys and delete them, on your test cluster, as you've suggested above. PROS: Simple to understand and implement. CONS: Also slow (to run between each test).
3. Have each test clean up after itself. So, if your test creates a User object, make sure to issue a DELETE command for that object at the end of the test. Optionally, test that a user doesn't exist initially, before creating one. (To make doubly sure that the previous test cleaned up). PROS: Simple to understand and implement. Fast (definitely faster than looping through all the buckets and keys between each test). CONS: Easy for developers to forget to clean up after each insert.
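
One way to make the "each test cleans up after itself" approach harder to forget is to route every write through a small tracker that deletes whatever it stored. The sketch below is only an illustration: FakeBucket is a hypothetical in-memory stand-in (not the Riak client API), and CleanupTracker and with_cleanup are made-up names. In a real suite the store, delete and exists? calls would be forwarded to whatever client you use.

```ruby
# Hypothetical in-memory stand-in for a Riak bucket. A real suite would
# forward these calls to the client's store and delete operations.
class FakeBucket
  def initialize
    @objects = {}
  end

  def store(key, value)
    @objects[key] = value
  end

  def delete(key)
    @objects.delete(key)
  end

  def exists?(key)
    @objects.key?(key)
  end

  def keys
    @objects.keys
  end
end

# Records every key a test writes so they can all be deleted afterwards.
class CleanupTracker
  def initialize(bucket)
    @bucket = bucket
    @created = []
  end

  def store(key, value)
    # Fail loudly if a previous test left this object behind.
    raise "stale object left behind: #{key}" if @bucket.exists?(key)
    @bucket.store(key, value)
    @created << key
  end

  def cleanup
    @created.each { |key| @bucket.delete(key) }
    @created.clear
  end
end

# Yields a tracker and deletes everything it stored, even if the
# test body raises.
def with_cleanup(bucket)
  tracker = CleanupTracker.new(bucket)
  yield tracker
ensure
  tracker.cleanup if tracker
end
```

A test body would then look like with_cleanup(bucket) { |t| t.store("user-1", data) }, and the bucket is back to its previous state when the block exits.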

After having debated these approaches, I've settled on using #3 (combined, frequently, with wiping the test server data directory before each test suite run).

Some thoughts on mitigating the CONS of the 'each test cleans up after itself, manually' approach:

Use a testing framework that runs tests in random order. Many frameworks, like Ruby's Minitest, do this out of the box. This often helps catch tests that only pass because some other test conveniently forgot to clean up.

Periodically examine your test cluster (via a list buckets) after the tests run, to make sure there's nothing left. In fact, you can do this programmatically at the end of each test suite (something as simple as doing a bucket list and making sure it's empty).
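
As a rough sketch of that end-of-suite check: the helper below takes a callable that returns the current bucket names, so it makes no assumption about how your client lists buckets. The name assert_cluster_empty is hypothetical.

```ruby
# Raises if any buckets are still present after the suite has run.
# `list_buckets` is any callable returning an array of bucket names;
# how that list is obtained is left to the caller.
def assert_cluster_empty(list_buckets)
  leftover = list_buckets.call
  return true if leftover.empty?

  raise "test data left behind in buckets: #{leftover.sort.join(', ')}"
end
```

With Minitest, something like this could be registered in a Minitest.after_run block, passing in a lambda that wraps your client's bucket listing call.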

Write fewer tests that hit the database. (This is good testing practice in general, but especially relevant with Riak.) Maintain a strict division between Unit Tests (that test object state and behavior without hitting the db) and Integration or Functional Tests (that do hit the db). Make sure there are a lot more of the former than the latter. To put it another way: you don't have to test that the database works with each unit test. Trust it (though obviously, verify, during the integration tests).

For example, if you're using Riak with Ruby on Rails, and you're testing your models, don't call test_user.save! to verify that a user instance is valid (like I once did, when first getting started). You can simply test for test_user.valid?, and understand that the call to save will work (or fail) accordingly, during actual use. Consider using Mockist-style testing, which verifies whether or not a save! function was actually invoked, instead of actually saving to the db and then reading back. And so on.
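
To illustrate the mockist idea without any mocking library, here is a minimal hand-rolled spy. Registration and UserSpy are hypothetical names invented for this sketch; the point is only that the test asserts that save! was called, rather than saving to Riak and reading the object back.

```ruby
# Hypothetical service object: saves the user only if it is valid.
class Registration
  def initialize(user)
    @user = user
  end

  def call
    return false unless @user.valid?

    @user.save!
    true
  end
end

# Hand-rolled spy standing in for a model. It counts calls to save!
# instead of touching the database.
class UserSpy
  attr_reader :save_calls

  def initialize(valid:)
    @valid = valid
    @save_calls = 0
  end

  def valid?
    @valid
  end

  def save!
    @save_calls += 1
    true
  end
end
```

A unit test can then build UserSpy.new(valid: true), run Registration.new(spy).call, and check that spy.save_calls is 1, all without a running Riak node.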

Question 2

There are a few possible answers here.

Are you testing that data is persisted by querying Riak using its key? If so, you can set up a test server. Documentation, such as it is, is here, http://rubydoc.info/github/basho/riak-ruby-client/Riak/TestServer
Are you testing access by secondary index? If so, why? Do you not trust Riak or the Ruby driver?
In all probability, your tests shouldn't be coupled to the data store in any case. It slows things down.
If you do insist and the TestServer isn't working for you, set up a new bucket for every test run. Each bucket is its own namespace, so it's pretty much clean slate. Periodically, stop the nodes and clear out data directories as per Christian's answer above.
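
A bucket-per-run scheme can be as simple as appending a random suffix to the bucket name. The helper names below (new_run_id, run_scoped_bucket) are made up for this sketch, and the naming format is just one possible choice.

```ruby
require "securerandom"

# Generates a short random identifier for one test run
# (8 hexadecimal characters).
def new_run_id
  SecureRandom.hex(4)
end

# Builds a bucket name scoped to a single test run, so runs do not
# see each other's data.
def run_scoped_bucket(base, run_id)
  "#{base}_#{run_id}"
end
```

Generate the run id once when the suite starts, then use run_scoped_bucket("users", run_id) wherever the tests would otherwise use the plain bucket name. Old buckets still accumulate on disk, which is why the periodic wipe of the data directories remains useful.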

Question 3

As there is no concept of transactions or rollbacks in Riak, that is not possible. The memory backend, however, is commonly used for testing, as it supports features of both Bitcask (auto-expiry) and LevelDB (secondary indexes). Whenever the database needs to be cleared, the nodes just need to be restarted.

If using Bitcask or LevelDB when testing, the most efficient method to clear the database is to shut down the node and simply remove the data directories.