I am studying Couchbase. Can anyone explain what exactly a bucket and a vBucket are?

Question 1

A bucket is like a database in an RDBMS. It contains documents, views, and some configuration. A vBucket is like a shard in an RDBMS. Every key in Couchbase is hashed to a vBucket number, and every vBucket number is mapped to a server. This two-step mapping distributes documents evenly across the nodes and makes fetching a document by its ID fast.

Question 2

Short answer

A bucket is a logical keyspace of uniquely keyed documents, evenly distributed across all nodes in a cluster.

A vBucket is a subset of a bucket that is located on a single node. The union of all vBuckets is the bucket.

Slightly longer answer

Imagine you have three nodes:

+----------+         +----------+        +----------+
|          |         |          |        |          |
|          |         |          |        |          |
|          |         |          |        |          |
|          |         |          |        |          |
|          |         |          |        |          |
|          |         |          |        |          |
|          |         |          |        |          |
+----------+         +----------+        +----------+
   node1                node2               node3

A bucket is a set of documents (which can differ in structure and attributes) that is distributed over all three nodes but shares a single key space.

   +----------+         +----------+        +----------+
+---------------------------------------------------------------+
|  |          |         |          |        |          |        |
|  |          |         |          |        |          |      Bucket
|  |          |         |          |        |          |        |
+---------------------------------------------------------------+
   |          |         |          |        |          |
   |          |         |          |        |          |
   +----------+         +----------+        +----------+
      node1                node2               node3

Note that a key must be unique within a bucket. This differs from an RDBMS, where a key only has to be unique within a table.

The bucket is divided into 1024 segments, which are evenly distributed across all the nodes in the cluster. These segments are virtual buckets, or vBuckets. So, in this case, each node holds roughly 1024/3 vBuckets, that is, 341 or 342.

   +----------+         +----------+        +----------+
+---------------------------------------------------------------+
|  |          |         |          |        |          |        |
|  |  341 vBs |         |  341 vBs |        |  342 vBs |      Bucket
|  |          |         |          |        |          |        |
+---------------------------------------------------------------+
   |          |         |          |        |          |
   |          |         |          |        |          |
   +----------+         +----------+        +----------+
      node1                node2               node3
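The arithmetic behind the diagram can be sketched in a few lines of Python. This is only an illustration of the even split: the function name `vbuckets_per_node` is made up for this example, and which node receives the leftover vBucket is an arbitrary choice here, not something Couchbase guarantees.

```python
def vbuckets_per_node(num_vbuckets: int, num_nodes: int) -> list[int]:
    """Split num_vbuckets as evenly as possible across num_nodes."""
    base, extra = divmod(num_vbuckets, num_nodes)
    # Hand one additional vBucket to each of the last `extra` nodes
    # (arbitrary choice, made only to mirror the diagram).
    return [base + (1 if i >= num_nodes - extra else 0) for i in range(num_nodes)]


print(vbuckets_per_node(1024, 3))  # [341, 341, 342]
```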

Each vBucket has its own set of documents. When a lookup is performed, the client hashes the document's key to find its vBucket and then uses the cluster map to identify the node that holds that vBucket.
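As a rough illustration of that lookup, here is a simplified Python sketch. It assumes a plain CRC32 modulo 1024 as the hash and a made-up round-robin vBucket map; the real SDKs use a CRC32-based hash whose details differ, and the real map comes from the cluster. The names `NODES`, `VBUCKET_MAP`, `vbucket_for_key` and `node_for_key` are hypothetical.

```python
import zlib

NUM_VBUCKETS = 1024
NODES = ["node1", "node2", "node3"]  # hypothetical node names

# Hypothetical vBucket map: index = vBucket id, value = node that owns it.
# A real cluster map is supplied by the server, not computed like this.
VBUCKET_MAP = [NODES[vb % len(NODES)] for vb in range(NUM_VBUCKETS)]


def vbucket_for_key(key: str) -> int:
    """Step 1: hash the key to a vBucket id (simplified assumption)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def node_for_key(key: str) -> str:
    """Step 2: look up which node owns that vBucket."""
    return VBUCKET_MAP[vbucket_for_key(key)]


key = "user::42"
print(key, "-> vBucket", vbucket_for_key(key), "->", node_for_key(key))
```

The point of the indirection is that when nodes are added or removed, only the vBucket-to-node map changes; the key-to-vBucket hash stays the same.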

references: http://training.couchbase.com/online

Question 3

You can start with the Couchbase documentation, section "Architecture and Concepts": http://docs.couchbase.com/admin/admin/Concepts/concept-intro.html

For more information about buckets, see http://docs.couchbase.com/admin/admin/Concepts/concept-dataStorage.html.

For more information about vBuckets, see http://docs.couchbase.com/admin/admin/Concepts/concept-vBucket.html.

In short, a bucket is an abstraction that describes certain resources on the cluster (such as RAM and disk space). From the API standpoint, it is also the namespace for the documents stored in the system, similar to a database in the SQL world.

Question 4

In addition to the answers above, I would like to share the deeper reason why vBuckets exist. If you are coming from the RDBMS world, think of a 'bucket' as a 'table' and its 'documents' as 'records'. Since documents do not need to have the same number of key-value pairs, we call them 'schema-less'.

As for 'vBuckets', you may think of them as something similar to DB blocks.

In a Bucket

If replicas are not enabled, then a Bucket has 1024 active vBuckets.
If replicas is set to '1' then a Bucket has 1024 active vBuckets + 1024 replica vBuckets.
Similarly, if replicas is set to '2', then a Bucket has 1024 active vBuckets + (1024 x 2) replica vBuckets.
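The three cases above follow one formula: total vBuckets = 1024 x (1 + replicas). A tiny Python check, with a function name invented for this example:

```python
def total_vbuckets(num_replicas: int, active: int = 1024) -> int:
    """Active vBuckets plus one full set of replica vBuckets per replica."""
    return active * (1 + num_replicas)


for replicas in (0, 1, 2):
    print(replicas, "replica(s):", total_vbuckets(replicas), "vBuckets")
```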

This ratio of '1 Bucket : 1024 vBuckets' is fixed. We cannot change it. The idea is to distribute the data evenly across the nodes of the CB Cluster.

Example: When we create a 'Bucket' in a 3-node CB Cluster, its 1024 vBuckets are spread evenly across those 3 nodes. If we have replica vBuckets as well, they too are spread evenly. However, Couchbase Server makes sure that for the active vBuckets on node 1, the corresponding replica vBuckets are on the other 2 nodes of the cluster. The same applies to node 2 and node 3. This ensures that a node failure does not cause data loss. If 1 node of the 3-node cluster fails, the matching replica vBuckets on the 2 surviving nodes are automatically promoted to active vBuckets.
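To make the replica placement and promotion concrete, here is a toy Python model with one replica. It is not Couchbase's actual placement algorithm: the round-robin layout and the names `build_layout` and `fail_over` are assumptions made only to show the two rules (a replica never shares a node with its active copy, and replicas are promoted when that node fails).

```python
NUM_VBUCKETS = 1024
NODES = ["node1", "node2", "node3"]  # hypothetical node names


def build_layout(nodes, num_vbuckets=NUM_VBUCKETS):
    """Place each vBucket's active copy and its single replica on different nodes.

    Needs at least two nodes, otherwise active and replica would coincide.
    """
    n = len(nodes)
    return {
        vb: {"active": nodes[vb % n], "replica": nodes[(vb + 1) % n]}
        for vb in range(num_vbuckets)
    }


def fail_over(layout, failed_node):
    """Promote replicas whose active copy lived on the failed node."""
    result = {}
    for vb, copies in layout.items():
        if copies["active"] == failed_node:
            # The replica on a surviving node becomes the new active copy.
            result[vb] = {"active": copies["replica"], "replica": None}
        elif copies["replica"] == failed_node:
            # Active copy survives; its replica is lost until a rebalance.
            result[vb] = {"active": copies["active"], "replica": None}
        else:
            result[vb] = dict(copies)
    return result


layout = build_layout(NODES)
after = fail_over(layout, "node1")
# Every vBucket still has an active copy, none of them on the failed node.
print(all(c["active"] in ("node2", "node3") for c in after.values()))
```

In this model all 1024 vBuckets stay available after node1 fails, but some run without a replica until the cluster is rebalanced.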