Question

I have a server application that retrieves information from some slave devices; the data retrieved amounts to about 200 floats per second per device. The application needs to be able to produce reports whose timeframe can vary from minutes to months, so I implemented a form of data warehousing.

The application works well, but now that the client is off my back I want to improve it. That is why I want to ask whether the data model I'm using is good or whether it would be better to use another one. Here is what I'm using:

Let's suppose I have devices 1 & 2. I create the following tables:

- data_s_1 & data_s_2: in which I store the data as it enters.
- data_m_1 & data_m_2: in which I average the data for the last 60 seconds.
- data_h_1 & data_h_2: in which I average the data for the last 60 minutes.
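
A minimal sketch of what one of these per-device tables might look like (MySQL-style syntax; the column names and float columns are assumptions for illustration, not the real schema):

```sql
-- Hypothetical layout of the raw-data table for device 1.
CREATE TABLE data_s_1 (
    id      BIGINT AUTO_INCREMENT PRIMARY KEY,
    ts      DATETIME NOT NULL,   -- time the sample was received
    value_1 FLOAT,               -- one of the sampled floats
    value_2 FLOAT                -- ... further float columns as needed
);

-- data_m_1 and data_h_1 have the same shape, but each row holds the
-- 60-second / 60-minute average of the corresponding raw rows.
```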

This data model lets me perform quick searches for the reports I'm asked for: the upper tables store indexes that let me quickly find the corresponding data in the lower ones. For example, each entry in data_h_1 holds the indexes of the first and last rows of data_m_1 used for its average, so if I need that underlying data I can fetch it with an index search, which is much faster.
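
A sketch of that linking idea, again under assumed names: each hourly row records the ids of the first and last minute-level rows that went into its average, so the underlying detail can be pulled back with a primary-key range lookup.

```sql
-- Hypothetical layout: each hourly row remembers which minute rows it covers.
CREATE TABLE data_h_1 (
    id         BIGINT AUTO_INCREMENT PRIMARY KEY,
    ts         DATETIME NOT NULL,
    avg_value  FLOAT,
    first_m_id BIGINT NOT NULL,  -- id of the first data_m_1 row in the average
    last_m_id  BIGINT NOT NULL   -- id of the last data_m_1 row in the average
);

-- Drilling down from one hourly entry to its minute-level detail:
SELECT m.*
FROM data_h_1 h
JOIN data_m_1 m ON m.id BETWEEN h.first_m_id AND h.last_m_id
WHERE h.id = 42;  -- 42 is just an example hourly row id
```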

What I want to ask is whether it would be better to have single tables shared by all the devices (the client has more than 20), e.g. one data_s table with a device_id field. That would make documentation easier, but I don't know whether there is a right way to do this. Any advice would be greatly appreciated.
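
Sketched with the same assumed columns as above, that single-table option would look roughly like this:

```sql
-- One shared raw-data table for every device, distinguished by device_id
-- (data_m and data_h would be merged across devices the same way).
CREATE TABLE data_s (
    id        BIGINT AUTO_INCREMENT PRIMARY KEY,
    device_id INT NOT NULL,
    ts        DATETIME NOT NULL,
    value_1   FLOAT,
    value_2   FLOAT
);
```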

Solution

In general it is better to have one table for each type of data (I assume that the structure of each of these data_s_x tables is the same). This makes it easy to add a new device without changing the structure of the database.

It does increase the data volume in that single table, though, and makes it important that the right indexes are in place - which is probably already true in your case anyway!
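
For instance, assuming the combined data_s table sketched in the question, most report queries filter on one device and a time range, so a composite index along these lines is the kind of thing to check for (the index name is illustrative):

```sql
-- Composite index so that per-device, per-time-range scans stay fast
-- even with all 20+ devices in the same table.
CREATE INDEX idx_data_s_device_ts ON data_s (device_id, ts);
```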

Your data_m and data_h tables are, strictly speaking, denormalised: they contain derived data and are not actually needed, since their contents can be calculated when required. However, this sort of denormalisation for performance reasons doesn't sound unreasonable. I don't know whether you added them because performance was bad without them; if you didn't, one possible improvement would be to check whether the reports that use them still run acceptably if you perform the AVG calculations in the SQL that generates the report. If performance is OK, you can remove the tables and the processing that maintains them.
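
As a rough illustration of that check, assuming the combined data_s table and MySQL-style date functions, the minute-level averages could be computed directly in the report query instead of being read from a data_m table:

```sql
-- Minute-by-minute averages computed on the fly from the raw data,
-- instead of reading a pre-aggregated data_m table.
SELECT device_id,
       DATE_FORMAT(ts, '%Y-%m-%d %H:%i:00') AS minute_start,
       AVG(value_1) AS avg_value_1
FROM data_s
WHERE device_id = 1                      -- example device
  AND ts >= '2024-01-01 00:00:00'        -- example report window
  AND ts <  '2024-01-02 00:00:00'
GROUP BY device_id, minute_start;
```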

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow