Question

I need to create a database for dealing with click stream (from ~240 subdomains). I use a Java Script for grabbing information like (Host, Page, Date, userID, Referer, HostName, RefererPath, uniqueUserID) for each click and than insert the data to the database through a java web dynamic application. There are about 9 milion new records each day and I have to insert new records every minute. Another application needs to be able to retrieve information about pageviews/unique visitors/ect for a certain article/subdomain in the last (10min, 20min, 30min, 1hour...24 hours). I only need to keep records for the last 3 months.

Initially I thought about using MySQL as I'm only interested in open-source. But I'm thinking about NoSQL solutions. The problem is that I've had experience only with relational databases and am not really able to tell if NoSQL would be a better solution here or not. Also which database should I use if I choose to go wiht NoSQL? and would Key-value store be the best way to go?

Was it helpful?

Solution

I'm guessing this data consistency isn't critical (statistics ?) so you could indeed spare a bit of consistency. NoSQL seems a good choice and a key value store would also be my pick. Now the real question is : what is the most suited one ?

I'd give a consideration to Redis and Riak (which are basically the most well-known ones) :

Riak (AP system) :

  • Fault-tolerant (masterless with partitioning and replication)
  • Map reduce
  • Full text search
  • BASE

Redis (CP system) :

  • Really fast
  • In-memory : You need RAM ! That also means you want replication so you don't lose everything on a crash. Redis also uses disk snapshot I believe.
  • Master/Slave with reelection
  • BASE

Both have a lot more features, you should go read the documentation for gotchas. Redis is primarly used as a cache since it's fast, whereas Riak focuses on fault-tolerance. Given your scalability requirements, both can satisfy your need. Therefore you must chose according to what's above.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top