Question

This question is not necessarily Couchbase 2.0 develop preview specific however I think it may help people with investigation into the new Couchbase product.

I am looking for advice on data modelling. We are investigating Couchbase with a view to possibly using it for Realtime Analytics.

However I cannot find any documentation on how best to model real world data.

I shall propose a scenario and if the community could help me or discuss some ideas for how this could be modelled it would be very useful?

Note this is not representative of our product and I am not asking people to solve our modelling for us the question is more intended for discussion

Lets assume that customers make purchases of products at a particular date/time, products have information with them such as id, name, description and price, a purchase is performed on a date.

The initial requirement is to be able to count all purchases between two dates. For any 1 day there might be over 100,000 purchases - this is a pretty big business ;)

If any of the syntax is incorrect please let me know - all advice/help is welcome.

If we modelled the data something like so (which maybe completely incorrect):

Purchases with products

{
    "_id" : "purchase_1",
    "_rev" : "1-1212afdd126126128ae",
    "products" :  [
        "prod_1" : {
            "name" : "Milk",
            "desc" : "Semi-skimmed 1ltr",
            "price" : "0.89"
        },
        "prod_7568" : {
            "name" : "Crisps", 
            "desc" : "Salt and Vinegar",
            "price: "0.85"
        }
    ]
    "date" : "2012-01-14 14:24:33"
}

{
    "_id" : "purchase_2",
    "_rev" : "1-1212afdd126126128ae",
    "products" :  [
        "prod_89001" : {
            "name" : "Bread", 
            "desc" : "White thick sliced",
            "price: "1.20"
        }
    ]
    "date" : "2012-01-14 15:35:59"
}

So given that document layout we can see each purchase and we can see the products that were in that purchase - however how could we go about counting all the purchases between two dates? Also how could you see a log of all the purchases between two dates in date descending order?

Is this something Couchbase is suited for?

There might be hundreds of thousands of purchases between two dates and the customer doesn't like to wait for reports….as I'm sure everyone has experienced ;)

Would it be best to use the incr functions and if so how would you go about modelling the data?

Many thanks to anyone that reads this - I hope to expland on this further giving more examples of real world modelling problems if possible.

James

Was it helpful?

Solution

In the simplest case you could write a Map function that would create a view using the date field as a key.

So with a slightly modified document design:

{
   "_id": "purchase_1",
   "_rev": "2-c09e24efaffd446c6ee8ed6a6e2b4a22",
   "products": [
       {
           "id": "prod_3",
           "name": "Bread",
           "desc": "Whole wheat high fiber",
           "price": 2.99
       }
   ],
   "date": "2012-01-15 12:34:56"
}

{
   "_id": "purchase_2",
   "_rev": "2-3a7f4e4e5907d2163d6684f97c45a715",
   "products": [
       {
           "id": "prod_1",
           "name": "Milk",
           "desc": "Semi-skimmed 1ltr",
           "price": 0.89
       },
       {
           "id": "prod_7568",
           "name": "Crisps",
           "desc": "Salt and Vinegar",
           "price": 0.85
       }
   ],
   "date": "2012-01-14 14:24:33"
}

Your map function would look like:

function(doc) {
  for (var product in doc.products) {
    emit(doc.date, doc.products[product].price);
  }  
}

You could optionally add a reduce function that would sum up purchases by date.

function(keys, values) {
    return sum(values);
}

You could then query the view using the startkey and endkey parameters.

http://localhost:5984/couchbase/_design/Products/_view/total_price_by_date?startkey="2012-01-01"&endkey="2012-01-31"&group=true

The output from querying the view would be:

{"rows":[
{"key":"2012-01-14 14:24:33","value":4.94},
{"key":"2012-01-15 12:34:56","value":2.99}
]}

Or remove the group parameter to get the sum for the entire date range:

{"rows":[
{"key":null,"value":7.930000000000001}
]}

Hope that helps.

-- John

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top