Optimal Data Storage for Web Analytics

https://stackoverflow.com/questions/9340887

30-04-2021
|

Domanda

I would like to write a custom web analytics and action tracking solution and use it as a feedback mechanism eg. for searches or content suggestion.

If it were only for short lived data, I would use some NOSQL engine with a limited data retention. But ideally I would like to keep a long history.

One nice aproach I have seen in the past, was to use mysql for storage, one table per month, with older tables being converted to MySQL ARCHIVE format. For browsing the archives and aggregate data, MySQL views were implemented.

My question: How does something like Google Analytics store its data? In a structured database or something else. Which way would you suggest to avoid a long-term memory hog, while keeping query possibilities flexible?

(I am not concerned about writing speeds into the DB, this will happen in async batches and not in realtime)

Soluzione

Google uses its own Big Table implementation to store its data. If you are interested in big data solutions and utilizing big data you should take a look at this. For an open source implementation built off of google's Big Table check out Hbase/Hadoop. I will post back some links in a minute.

The analytics themselves done on this type of data utilize map/reduce operations.

Altri suggerimenti

I think that Urchin was originally using it's own custom built multi-dimensional database, but I'm not sure if Google Analytics is still using it. Anyway, analytics systems often use Cube based schemas for fast OLAP browsing.

I know Microsoft and Oracle have this type of functionality in their database tools, but they're very expensive. I'm not aware of any open-source cube databases; however if you're using Java then I think Mondrian works in the same way, but using a relational database for storage.

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow