Question

This is a question about best practice really.
I am developing a system, where I will be collecting some measurements (call them HR and RR) and calculating average values of these measurements. Now for the user interface we are only intrested in these average values, but for the in-depth data analysis later on, we need all individual measurements (to export to matlab) as well as all the average calculations (don't ask - user requirement, I would just save individual measurements and calculate average later if it is needed).

Here are the details about average calculations etc:
- HR: we get readings every 500 - 1500ms (variable). We calculate the average based on 4-12 readings (depending on time between the readings).
- RR: we get readings every 3-17sec (variable). We calculate average based on 2-3 readings (depending on time between the readings).

For both we save:
- Average value (decimal) together with the timestamp of first reading from the readings used for the average calculation.
- Each individual reading (decimal) together with timestamps of when the reading was taken.

As you can see the data is the same for average calculations and individual readings. The same with HR/RR - the data is the same and could be represented as:
- - - - - - - - - -
| Reading      |
- - - - - - - - - -
| Timestamp |
| Value          |
- - - - - - - - - -

Since we compute data at different time intervals etc, we cannot store HR+RR as a single row in the database, we need separate rows or tables.
The questions are:
1. Is it better practice to create seperate tables for HR and RR? Or is it better to store them in the same table as seperate rows, with a column indicating if a given row is HR or RR?
2. Is it better to create seperate tables for each individual readings? Or is it better to create self-referencing table, where each individual reading would reference a row in the same table, with the average calculation it was used in?

I am not that great with DB design and I am not sure what are the best practices used in that situation.
I was also considering using MongoDB (rather than SQL database - probably MSSQL since the project is C# based), that would probably make life easier since I could have an array of individual measurements embeded in a document with average calculation etc. As far as I know writes to Mongo are very fast...

Any pointers? Thanks.

Was it helpful?

Solution

As wishy washy as it sounds, it depends. To your first question, one could very legitimately look at this as a table of readings or as two more specific tables. That said, years ago I would’ve said a single table, but over the years have gravitated toward the two tables. For one, your key values become more specific--(Reading) vs. (Reading + Type). And otherwise you’ll find yourself adding “AND ReadType =…” in your sleep. It also leaves you more flexibility when someone decides one reading needs to be to a different precision or also store the color of shirt the technician was wearing.

On the second question, again, opinions will vary but I’d lean toward a parent table of reading sets and a detail of individual readings. The self-referencing table feels like it wins some style points but joining back to oneself can get tricky depending on the answers you’re trying to get. Also, your final DB platform choice may or may not include some of the specialized options like MSSQL’s CTEs that address some of these complexities.

Overall, you could probably have:

  • ReadingSet (ReadingSetID [, other info as needed])
  • ReadingR (ReadingRID,ReadingSetID, Value, TimeStamp)
  • ReadingH (ReadingHID, ReadingSetID, Value, TimeStamp)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top