Question

I'm working on a web app that collects traffic information for websites that use my service. Think Google Analytics, but far more visual. I'm using SQL Server 2012 as the backbone of my app and am considering using MongoDB for the data-gathering, analytics side of the site.

If I have 100 users with an average of 20,000 hits a month on their site, that's 2,000,000 records in a single collection that will be getting queried.

  • Should I use MongoDB to store this information (I'm new to it and new things are intimidating)?
  • Should I dynamically create new collections/tables for every new user?

Thanks!


Solution

With MongoDB the collection (a rough analogue of a SQL table) can get quite big without much issue; that is largely what it was designed for. The "Mongo" in MongoDB comes from "huMONGOus" (pretty clever, eh?). This is a great use case for MongoDB, which is very good at storing point-in-time information.

Options :

1. New Collection for each Client

This is very easy to do. I use a GetCollectionSafe method for this:

using MongoDB.Driver;

public class MongoStuff
{
    private static MongoDatabase GetDatabase()
    {
        var databaseName = "dbName";
        var connectionString = "connStr";
        var client = new MongoClient(connectionString);
        var server = client.GetServer();
        return server.GetDatabase(databaseName);
    }

    public static MongoCollection<T> GetCollection<T>(string collectionName)
    {
        return GetDatabase().GetCollection<T>(collectionName);
    }

    // Creates the collection on first use, then returns it.
    public static MongoCollection<T> GetCollectionSafe<T>(string collectionName)
    {
        var db = GetDatabase();
        if (!db.CollectionExists(collectionName))
        {
            db.CreateCollection(collectionName);
        }
        return db.GetCollection<T>(collectionName);
    }
}

Then you can call it with:

var collection = MongoStuff.GetCollectionSafe<Record>("ClientName");
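
As a quick illustration, a minimal sketch of logging one hit into a client's collection (the Record shape here is hypothetical; use whatever fields you actually track):

using System;
using MongoDB.Bson;

// Hypothetical shape of a single page hit.
public class Record
{
    public ObjectId Id { get; set; }   // maps to Mongo's _id
    public string Page { get; set; }
    public long LoadTimeMs { get; set; }
    public DateTime Timestamp { get; set; }
}

var collection = MongoStuff.GetCollectionSafe<Record>("ClientName");
collection.Insert(new Record
{
    Page = "Default.html",
    LoadTimeMs = 103,
    Timestamp = DateTime.UtcNow
});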

Running this test program

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static void Main(string[] args)
{
    var times = new List<long>();
    for (int i = 0; i < 1000; i++)
    {
        // Time how long it takes to create (or fetch) each collection.
        var watch = Stopwatch.StartNew();
        MongoStuff.GetCollectionSafe<Person>(String.Format("Mark{0:000}", i));
        watch.Stop();
        Console.WriteLine(watch.ElapsedMilliseconds);
        times.Add(watch.ElapsedMilliseconds);
    }
    Console.WriteLine(String.Format("Max : {0} \nMin : {1} \nAvg : {2}",
        times.Max(), times.Min(), times.Average()));

    Console.ReadKey();
}

Gave me (on my laptop)

Max : 180 
Min : 1 
Avg : 6.635

Benefits :

  • Ease of splitting data if one client needs to go on their own
  • Might match your mental map of the problem

Cons :

  • Almost impossible to aggregate data across all collections
  • Hard to browse collections in management tools (like Robomongo)

2. One Large Collection

Use one collection for everything, and access it this way:

var coll = MongoStuff.GetCollection<Record>("Records");

Put an index on the collection (the index will make reads orders of magnitude quicker):

coll.EnsureIndex(new IndexKeysBuilder().Ascending("ClientId"));

This only needs to be run once (per collection, per index).
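
With the index in place, pulling one client's records back out is a cheap filtered query. A minimal sketch using the same legacy driver's Query builder (the client id value is made up, and it assumes your Record type carries a ClientId field in this layout):

using System;
using MongoDB.Driver.Builders;

var coll = MongoStuff.GetCollection<Record>("Records");

// The ClientId index keeps this fast even with millions of records.
foreach (var record in coll.Find(Query.EQ("ClientId", "client-123")))
{
    Console.WriteLine(record.Page);
}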

Benefits :

  • One Simple place to find data
  • Aggregate over all clients possible
  • More traditional MongoDB setup

Cons :

  • All clients' data is intermingled
  • May not mentally map as well

Just as a reference, the MongoDB size limits are documented here: http://docs.mongodb.org/manual/reference/limits/

3. Store only aggregated data

If you never intend to break things down to an individual record, just save the aggregates themselves:

Page Loads : 
#    Page            Total Time      Average Time 
15   Default.html    1545            103
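
A minimal sketch of maintaining those running totals with an atomic upsert, using the same legacy driver as above (the PageStats collection and field names are assumptions; the average is derived as TotalTime / Hits when you read it back):

using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Builders;

// One aggregate document per client/page pair; each hit bumps the counters.
var stats = MongoStuff.GetCollection<BsonDocument>("PageStats");
stats.Update(
    Query.And(Query.EQ("ClientId", "client-123"), Query.EQ("Page", "Default.html")),
    Update.Inc("Hits", 1).Inc("TotalTime", 103L),
    UpdateFlags.Upsert);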

OTHER TIPS

I will let someone else tackle the MongoDB side of your question as I don't feel I'm the best person to comment on it, but I would point out that MongoDB is a very different animal and you'll lose a lot of the referential integrity you enjoy in SQL.

In terms of SQL design, I would not use a different schema for each customer. Your database schema and backups could grow uncontrollably, and maintaining a dynamically growing schema will be a nightmare.

I would suggest one of two approaches:

Either you can create a new database for each customer:

  • This is more secure as users cannot access each other's data (just use different credentials) and users are easier to manage/migrate and separate.
  • However, many hosting providers charge per database, it will cost more to run and maintain, and should you wish to compare data across users it becomes much more challenging.

The second approach is to simply host all users in a single DB; your tables will grow large (although 2 million rows is not over the top for a well-maintained SQL DB). You would simply use a UserID column to discriminate.

  • The emphasis will be on you to get the performance you need through proper indexing
  • Users' data will exist in the same system and there's no SQL defense against users accessing each other's data - your code will have to be good (see the sketch below)!
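
As a minimal sketch of that discipline (the dbo.Hits table and its columns are hypothetical): always filter by the authenticated user's ID with a parameterized query, and never build the predicate from raw client input.

using System.Data.SqlClient;

public static class HitQueries
{
    // Count one user's hits in the shared table.
    // authenticatedUserId comes from the server-side session, never the request.
    public static long CountHitsForUser(string connectionString, int authenticatedUserId)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "SELECT COUNT_BIG(*) FROM dbo.Hits WHERE UserID = @UserID", conn))
        {
            cmd.Parameters.AddWithValue("@UserID", authenticatedUserId);
            conn.Open();
            return (long)cmd.ExecuteScalar();
        }
    }
}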