Question

We have a database that stores clicks, views and goals reached. As you can guess, the clicks in the database run into the millions, so we started aggregating the data for faster statistics. At the moment we delete all records and write new ones into an aggregation table (you can guess correctly that our MySQL auto-increment keys are climbing rapidly), but this is the easiest way to aggregate our statistics without any errors.
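Concretely, the approach described above looks something like this. This is just a sketch using Python's built-in sqlite3 as a stand-in for MySQL; the table and column names (`clicks`, `clicks_daily`, `ts`, `campaign`) are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Raw event table (millions of rows in practice) and the summary table.
cur.execute("CREATE TABLE clicks (ts TEXT, campaign TEXT)")
cur.execute("CREATE TABLE clicks_daily "
            "(day TEXT, campaign TEXT, clicks INTEGER, PRIMARY KEY (day, campaign))")

cur.executemany("INSERT INTO clicks VALUES (?, ?)",
                [("2023-01-01", "a"), ("2023-01-01", "a"), ("2023-01-02", "b")])

# The "delete everything and rewrite" aggregation step: simple and always
# correct, but it rewrites every summary row on each run.
cur.execute("DELETE FROM clicks_daily")
cur.execute("INSERT INTO clicks_daily (day, campaign, clicks) "
            "SELECT ts, campaign, COUNT(*) FROM clicks GROUP BY ts, campaign")
conn.commit()
```

The full rewrite is why the keys climb: every run re-inserts every row instead of updating the rows that already exist.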

I've searched the internet for information on database aggregation, such as how to store/denormalize your data so you can select the correct data quickly without having to reference multiple tables. But I found no answer to this.

I'm also guessing that MySQL is not the right database for storing aggregated data, because the aggregated data will grow fast and may be selected from beginning to end (like selecting the statistics for a whole year). Is my conclusion right? If so, what database would you recommend?

I also thought of splitting the data over multiple tables for different use cases, but I'm unsure whether this would be smart.

My questions are:

  • what aggregation techniques are you using to keep your aggregation table correct?
  • What database would be the best for storing aggregation in?
  • Should I split aggregation over multiple tables? Or should I make one general table that can handle multiple types of data requests?
  • How do you handle speed issues?
  • Is there a different name for data aggregation?

I'm sorry for the lengthy question :). I've searched SO and the internet and have not found any good answers to these questions.

Solution

What database would be the best for storing aggregation in?

If I understand your definition of "aggregation" correctly, you are removing all the relational aspects, so you are probably aiming for some NoSQL solution.

Should I split aggregation over multiple tables? Or should I make one general table that can handle multiple types of data requests?

Impossible to say; it depends on what you want. What you are doing is denormalizing so you can get your data more quickly. But if you denormalize too much, you can't find the right data anymore. So it really is different for each situation.
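To make that trade-off concrete: if one general summary table keeps more than one dimension, it can still answer several kinds of requests with a GROUP BY at query time, whereas pre-aggregating down to a single dimension throws the others away for good. A small sketch, again with sqlite3 standing in for MySQL and invented column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One general summary table keyed on two dimensions (day and campaign).
cur.execute("CREATE TABLE stats (day TEXT, campaign TEXT, clicks INTEGER)")
cur.executemany("INSERT INTO stats VALUES (?, ?, ?)", [
    ("2023-01-01", "a", 10),
    ("2023-01-01", "b", 5),
    ("2023-01-02", "a", 7),
])

# Per-day totals: roll the campaign dimension up at query time.
per_day = cur.execute(
    "SELECT day, SUM(clicks) FROM stats GROUP BY day ORDER BY day").fetchall()

# Per-campaign totals from the same table: no second summary table needed.
per_campaign = cur.execute(
    "SELECT campaign, SUM(clicks) FROM stats GROUP BY campaign "
    "ORDER BY campaign").fetchall()
```

Had the table stored only per-day totals, the per-campaign question would be unanswerable; that is the "denormalize too much and you can't find the right data" failure mode.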

How do you handle speed issues?

Again, impossible to say. Roughly: find out what causes them, and fix the problem.

Is there a different name for data aggregation?

It looks like you are building something of a "Data Warehouse". See the random internets ( http://en.wikipedia.org/wiki/Data_warehouse for instance) for more on that.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow