Is CouchDB or Couchbase suitable as a persistent NoSQL-based solution for storing users' chat history and statistics? Since chat history would probably involve far more writes than reads, what should the document structure be for a single user's history with some statistics: a single entity representing the user with embedded documents, or separate documents for history data (lots of small docs) and stats (a small number of docs)?

Solution

Yes, CouchDB or Couchbase is suitable.

Since chat history requires many writes, I am thinking of something that makes writing easy: just drop a document and let CouchDB worry about aggregating it. In one quick POST you could describe the chat message, who sent it, timestamp, which chat room, etc.
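As a rough sketch of what such a write-once message document might look like: the field names below are assumptions (chosen so that a view can later emit the date parts directly), and `buildMessageDoc` is a hypothetical helper, not part of any CouchDB API.

```javascript
// Hypothetical helper: one small, write-once document per chat message.
// Field names are assumptions, not something CouchDB requires.
function buildMessageDoc(username, room, text, when = new Date()) {
  return {
    type: "message",
    username,
    room,
    text,
    timestamp: when.toISOString(),
    // Pre-split UTC date parts so a map function can emit them directly.
    year: when.getUTCFullYear(),
    month: when.getUTCMonth() + 1, // 1-12
    day: when.getUTCDate(),
    hour: when.getUTCHours(),
    minute: when.getUTCMinutes(),
  };
}

// Usage sketch (assumes a database named "chat" on a local CouchDB;
// authentication is omitted here):
// await fetch("http://localhost:5984/chat", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(buildMessageDoc("somebody", "lobby", "hello")),
// });
```

Because each message gets its own document with a server-generated ID, no two writers ever touch the same document.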

CouchDB view collation will assemble the single entity representing a user from their historical data at query time. For example, if you want to know user message volume, your map function will emit a key like this:

emit([doc.username, doc.year, doc.month, doc.day, doc.hour, doc.minute], 1);

And the reduce function adds up all the values (CouchDB's built-in _sum reduce does exactly this). Now you can query a user's annual volume,

group_level=2&startkey=["somebody",2011,null]&endkey=["somebody",2011,{}]

or (by increasing the group level) monthly volume, daily volume, hourly volume, etc.
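To make the effect of the group level concrete, here is a small local simulation. It is only a sketch: `queryGrouped` is a hypothetical stand-in for a view query with a sum reduce, it models nothing but grouping by key prefix, and the sample documents are made up.

```javascript
// The map function from the answer, written so it can run outside CouchDB:
// `emit` is passed in instead of being a global (an assumption for this sketch).
function mapMessage(doc, emit) {
  emit([doc.username, doc.year, doc.month, doc.day, doc.hour, doc.minute], 1);
}

// Simplified local stand-in for a view query with a sum reduce and group_level=N.
// Rows come back in first-seen order, not in CouchDB collation order.
function queryGrouped(docs, groupLevel) {
  const totals = new Map();
  for (const doc of docs) {
    mapMessage(doc, (key, value) => {
      // Rows sharing the first `groupLevel` key elements are summed together.
      const groupKey = JSON.stringify(key.slice(0, groupLevel));
      totals.set(groupKey, (totals.get(groupKey) || 0) + value);
    });
  }
  return [...totals].map(([key, value]) => ({ key: JSON.parse(key), value }));
}

// Made-up sample data: three messages in 2011, one in 2012.
const docs = [
  { username: "somebody", year: 2011, month: 1, day: 3, hour: 9, minute: 15 },
  { username: "somebody", year: 2011, month: 1, day: 3, hour: 9, minute: 40 },
  { username: "somebody", year: 2011, month: 2, day: 7, hour: 18, minute: 5 },
  { username: "somebody", year: 2012, month: 1, day: 1, hour: 0, minute: 1 },
];

const annual = queryGrouped(docs, 2);  // grouped by [username, year]
const monthly = queryGrouped(docs, 3); // grouped by [username, year, month]
console.log(annual);
console.log(monthly);
```

With this key layout, the first two key elements are the username and year, so level 2 yields one row per user per year, and each extra level splits those rows further (month, day, hour, minute).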

Considerations

This technique has costs and benefits. The basic trade-off is that updates should be easy and reports should be reasonable. In your example of 10,000 updates per day, I get nervous thinking about 409 Conflict rejections, or maintaining conflict-resolution code, or making the client recover gracefully from an error while more messages are piling up!

The suggested technique helps. Each update is isolated from the others, updates can occur out of order, and error recovery is not too bad: just retry a few times in the background. (Note: I am personally an advocate that updates should be easy, so maybe I am biased.)
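The "retry a few times" idea could look roughly like this. It is a minimal sketch, not a CouchDB feature: `retryWrite` and its options are hypothetical names, and `writeFn` stands for whatever function performs the actual POST.

```javascript
// Retry an async write a few times with exponential backoff.
// `writeFn` is a hypothetical function that resolves on success and rejects on failure.
async function retryWrite(writeFn, { attempts = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await writeFn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // With the defaults this waits 200 ms, then 400 ms, before retrying.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

Since each message is its own document, a retry that arrives late or out of order does not disturb any other write.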

The cost is "wasting" disk space, and retrieving data is (relatively) more work. CouchDB is slow and wasteful like lorries are slow and wasteful. In reality, lorries are common in wealthy places and uncommon in poor places because they are a better long-term deal. Emotionally, we see lorries lumber about and vomit black smoke, but rationally, we know they are more efficient.

Most stats can be direct map/reduce views. However, you can also maintain "summary" documents with aggregated or independent results, or whatever else you need. Frequent updates are not a problem (on this scale: 86,400 updates per day is still just 1/sec). But you might want a dedicated "updater" client for those documents. With only one client updating the special documents, you won't get 409 Conflicts, since nobody else is fighting to update the same document.
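The single-updater idea can be illustrated with a toy model. Everything here is an assumption made for the sketch: `FakeStore` is an in-memory stand-in that mimics only CouchDB's rule that a write must carry the document's current revision (real revisions are strings, not integers), and `runUpdater`, the document ID, and the `messageCount` field are hypothetical.

```javascript
// Minimal in-memory stand-in for CouchDB's revision check (illustration only):
// a write must carry the document's current _rev or it is rejected with 409.
class FakeStore {
  constructor() {
    this.docs = new Map();
  }
  get(id) {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : null; // return a copy, like a fresh read
  }
  put(doc) {
    const current = this.docs.get(doc._id);
    if (current && current._rev !== doc._rev) {
      const err = new Error("conflict");
      err.status = 409;
      throw err;
    }
    const nextRev = (current ? current._rev : 0) + 1;
    this.docs.set(doc._id, { ...doc, _rev: nextRev });
    return nextRev;
  }
}

// Dedicated updater: the only writer of the summary document, applying
// queued increments one at a time (read, modify, write).
function runUpdater(store, summaryId, increments) {
  for (const n of increments) {
    const doc = store.get(summaryId) || { _id: summaryId, messageCount: 0 };
    doc.messageCount += n;
    store.put(doc); // never conflicts: nobody else writes this document
  }
  return store.get(summaryId);
}

const store = new FakeStore();
const summary = runUpdater(store, "summary:somebody", [1, 1, 5]);
console.log(summary.messageCount); // 7
```

In the same model, two clients that both read the summary and then both write it would see the second write rejected with a 409, which is the situation the single updater avoids.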

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow