Question

So I'm planning to use MongoDB (I'm new to it) to track impressions and traffic in general for my porn website. A single banner sometimes gets more than 1 million impressions, and I have various banners as well, so on a daily basis I could potentially have 1 billion banner impressions. I want to store them in a database so I can see which banner converts best in a certain time frame, which banner converts best in a certain country, etc.

Object in collection for example looks like this:

{ "_id" : ObjectId("5124d03d512c175714000000"), "bid" : ObjectId("5124a9ec512c178710000000"), "city" : "Rome", "country" : "Italy", "client_id" : "127.0.0.1", "referer" : "youporn.com", "user_agent" : "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0", "visit_datetime" : "2013-Feb-20 02:31:41", "visit_year" : "2013", "visit_month" : "Feb", "visit_day" : "20" }

So I need information and advice: is this a good way to store impressions, or should my organisation be totally different (maybe separate collections for each country, but that again will be problematic at some point)?

I really appreciate all the ideas, suggestions, questions and comments.

Solution

As @Joachim Isaksson commented on your question above, the amount of data you generate is huge. Based on this, you must decide whether you have the capacity to handle this amount or not.

If you do, I guess you will need some map-reduce approach afterwards to get something out of the data (with the current design of the data).
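To make the map-reduce idea concrete, here is a minimal sketch in plain Python rather than MongoDB's own tooling. The sample documents and the function names `map_impression` and `reduce_counts` are made up for illustration; only the field names `bid` and `country` come from the question.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical sample of raw impression documents, trimmed to the relevant fields.
impressions = [
    {"bid": "b1", "country": "Italy"},
    {"bid": "b1", "country": "Italy"},
    {"bid": "b2", "country": "Italy"},
    {"bid": "b1", "country": "Germany"},
]

def map_impression(doc):
    """Map step: emit a ((banner, country), 1) pair for each impression."""
    return (doc["bid"], doc["country"]), 1

def reduce_counts(totals, pair):
    """Reduce step: add each emitted value to the running total for its key."""
    key, value = pair
    totals[key] += value
    return totals

# Run the map step over every document, then fold the pairs into totals.
counts = reduce(reduce_counts, map(map_impression, impressions), defaultdict(int))
```

With a billion raw documents per day, every such report has to scan all of them, which is why the design of the collection matters so much.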

The main issue I see here is that you should have some specific questions that you would like to have answered. If you do, you can model the collection accordingly, especially with regard to the dimensions in which you need the data. Otherwise you will most probably just collect a bunch of data that you never use in the end or, even worse, collect the wrong data.

If you are interested only in country and dates, why not just increment a counter on an entry for each banner / day / country combination? So instead of saving the whole date and country every time, create an entry like:

{ "bannerId" : "b1", "country" : "IT", "date" : "20130220", "count" : 0 }

And then just increment the count part of the object. This would save you a lot of data. If you need more detailed information (e.g. on hourly time periods), you can also save an entry per hour (date + hour).
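As a rough sketch of how that increment could look, the helper below builds the filter and update documents for one impression. The function name `counter_upsert`, its `hourly` flag, and the `counters` collection mentioned in the comment are assumptions for illustration; `$inc` is MongoDB's increment operator, and combined with an upsert it creates the counter document on the first impression.

```python
from datetime import datetime

def counter_upsert(banner_id, country, when, hourly=False):
    """Build the filter and update documents for counting one impression."""
    # One counter per day by default, or per hour when finer detail is needed.
    fmt = "%Y%m%d%H" if hourly else "%Y%m%d"
    selector = {
        "bannerId": banner_id,
        "country": country,
        "date": when.strftime(fmt),
    }
    # $inc adds 1 to "count"; with an upsert, a missing document is created.
    update = {"$inc": {"count": 1}}
    return selector, update

seen_at = datetime(2013, 2, 20, 2, 31, 41)
selector, update = counter_upsert("b1", "IT", seen_at)
hourly_selector, _ = counter_upsert("b1", "IT", seen_at, hourly=True)

# With pymongo and a hypothetical `counters` collection this would be applied as:
#   counters.update_one(selector, update, upsert=True)
```

This way the collection grows with the number of banner / country / day combinations instead of with the number of impressions.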

Otherwise, why not look at an existing data warehousing application, which also provides the tools to interpret the data? That would also be an option.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow