Question

My OLTP database holds a list of tools with this schema:

[id] int PRIMARY
[name] varchar

Every few seconds each tool emits a measurement. I will save it in an OLAP store with this schema:

[toolID] int
[time] timestamp
[measurement] int

(We have not chosen the OLAP store yet, but assume we need one because of the data volume, the semantics, and the types of queries we will run.)

How do I query the list of tool names with measurements greater than 100? The challenge is that I need to join data from both OLAP and OLTP stores.

Option 1 - also save the tool name in OLAP with each measurement (denormalization). The problem is that the tool name might have changed since the measurement was taken, and I need the latest one. There may also be many more details per tool, and I am not sure it makes sense to save them all with every measurement.

Option 2 - OLAP returns just a list of IDs, then I issue a query to OLTP to get the names. This would require SQL queries with many embedded IDs, which does not seem right.

Option 3 - synchronize all OLTP data into OLAP every few minutes. But OLAP stores (e.g. Vertica) are not optimized for updates, so this does not seem efficient.

Solution

Generally, in OLAP/DW systems, option 3 is preferred. The list of tools and their details would be stored in a Tool dimension table, and the measurements would be stored in a Measurement fact table.
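
As a rough illustration of that layout, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the OLAP store, since no store has been chosen yet. The table names dim_tool and fact_measurement, the helper tools_over, and the sample rows are all made up for the example. Once the tool list lives next to the measurements, the original query becomes a single join:

```python
import sqlite3

# In-memory SQLite stands in for the (not yet chosen) OLAP store.
# Table names, column names, and sample rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_tool (
        tool_id INTEGER PRIMARY KEY,   -- same value as [id] in the OLTP table
        name    TEXT NOT NULL
    );
    CREATE TABLE fact_measurement (
        tool_id     INTEGER NOT NULL,  -- joins to dim_tool.tool_id
        time        TEXT NOT NULL,
        measurement INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO dim_tool VALUES (?, ?)",
    [(1, "drill"), (2, "lathe"), (3, "press")],
)
conn.executemany(
    "INSERT INTO fact_measurement VALUES (?, ?, ?)",
    [
        (1, "2024-01-01T00:00:00", 90),
        (1, "2024-01-01T00:00:05", 120),
        (2, "2024-01-01T00:00:00", 50),
        (3, "2024-01-01T00:00:00", 101),
    ],
)
conn.commit()

def tools_over(conn, threshold):
    """Return the distinct names of tools with any measurement above threshold."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.name
        FROM fact_measurement AS f
        JOIN dim_tool AS d ON d.tool_id = f.tool_id
        WHERE f.measurement > ?
        ORDER BY d.name
        """,
        (threshold,),
    )
    return [name for (name,) in rows]

names = tools_over(conn, 100)
```

Because the name is read from the dimension at query time, a renamed tool shows its latest name for all of its past measurements, which is the behaviour asked for in the question.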

If, as you mentioned in your comment, you do not need to keep the history of a tool's details when they change, and updates to the tool details are both infrequent and few in number, then I would simply update the records in the Tool dimension in place, since the number of updates will be relatively small.
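
A minimal sketch of such an in-place update, again with sqlite3 standing in for the OLAP store. The dim_tool table and the apply_tool_changes helper are hypothetical, and the sketch assumes the sync job already knows which tools changed (for example from a last-modified column in OLTP, which the question does not mention):

```python
import sqlite3

def apply_tool_changes(olap, changed_tools):
    """Apply a small batch of (tool_id, name) changes to the Tool dimension."""
    with olap:  # one transaction: commit on success, roll back on error
        # Add tools that are not in the dimension yet.
        olap.executemany(
            "INSERT OR IGNORE INTO dim_tool (tool_id, name) VALUES (?, ?)",
            changed_tools,
        )
        # Overwrite the name of tools that already exist; no history is kept.
        olap.executemany(
            "UPDATE dim_tool SET name = ? WHERE tool_id = ?",
            [(name, tool_id) for tool_id, name in changed_tools],
        )

# Hypothetical starting state of the dimension.
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE dim_tool (tool_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
olap.executemany("INSERT INTO dim_tool VALUES (?, ?)", [(1, "drill"), (2, "lathe")])
olap.commit()

# Tool 2 was renamed and tool 3 is new since the last sync.
apply_tool_changes(olap, [(2, "lathe mk2"), (3, "press")])
dim_rows = olap.execute("SELECT tool_id, name FROM dim_tool ORDER BY tool_id").fetchall()
```

A real OLAP store may offer a single MERGE or upsert statement for this. The two-statement form is used here only because it relies on nothing beyond basic SQL.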

If updates are infrequent but each one touches a large number of records, it may be easier and faster to truncate the Tool dimension and re-insert all Tool records from the OLTP system. In that case, you need a way to preserve the dimension keys so that the fact measurements already stored still join back to the right tools. This can be difficult if the dimension uses a surrogate key based on an auto-generated sequence.
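
One way to sidestep the key-preservation problem, sketched below, is to use the OLTP [id] itself as the dimension key, so a full reload always reproduces the same keys. This is an assumption about the design rather than something the question specifies. The names dim_tool, fact_measurement, and reload_tool_dimension are hypothetical, and sqlite3 (which has no TRUNCATE, hence the unqualified DELETE) again stands in for the OLAP store:

```python
import sqlite3

def reload_tool_dimension(olap, oltp_tools):
    """Replace the whole Tool dimension with a fresh (id, name) snapshot from OLTP."""
    with olap:  # empty and refill in one transaction so readers never see a gap
        olap.execute("DELETE FROM dim_tool")
        # The OLTP id is reused as the dimension key, so existing facts still join.
        olap.executemany(
            "INSERT INTO dim_tool (tool_id, name) VALUES (?, ?)",
            oltp_tools,
        )

# Hypothetical existing state: two tools and one stored measurement.
olap = sqlite3.connect(":memory:")
olap.executescript("""
    CREATE TABLE dim_tool (tool_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE fact_measurement (tool_id INTEGER, time TEXT, measurement INTEGER);
    INSERT INTO dim_tool VALUES (1, 'drill'), (2, 'lathe');
    INSERT INTO fact_measurement VALUES (2, '2024-01-01T00:00:00', 150);
""")

# Fresh snapshot from OLTP: tool 2 renamed, tool 3 added.
reload_tool_dimension(olap, [(1, "drill"), (2, "lathe mk2"), (3, "press")])

joined = olap.execute(
    """
    SELECT d.name, f.measurement
    FROM fact_measurement AS f
    JOIN dim_tool AS d ON d.tool_id = f.tool_id
    WHERE f.measurement > 100
    """
).fetchall()
```

The measurement stored before the reload still resolves to tool 2, now under its new name. With an auto-generated surrogate key, the reload would instead need a mapping from OLTP id to the previously assigned key.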

The real problem arises when both the frequency and the number of updates to the tool details are large. In that case, you would have to step back, look at the overall model, and determine whether the tool details actually belong in a dimension or deserve their own fact table.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow