Question

I'm looking into using Table Storage for storing some transactional data; however, I also need to support some very high-level reporting over it, basically totals per day and per month.

A couple of options I have thought of:

  • Use a partition / row key structure and dynamically perform the sum,
    e.g. 20101101_ITEMID_XXXXXXXX (X = GUID or time, to make the key unique). I would then query a month's data using a portion of the row key (ITEMID_201011) and total the "Cost" property on the type (see the sketch after this list).

    How would the 1,000-record query limit be managed with this approach, though? (i.e. if there are more than 1,000 transactions in a day, totalling them would be hard)

  • Use another record to store the total for the day, and update it as new records are added,
    e.g. row key "20101101_ITEMID_TOTAL". I would then query off this row for the day's totals, or the month's, or the year's.

What is the best way to do this? Is there a 'best practice' for this type of requirement using Table Storage?

Solution

I'm not sure what the best practice is, but I can say that we have a similar situation with AzureWatch and definitely use pre-aggregated values in tables.

This is mostly for performance reasons: table storage is not instantaneous, even if you query by a single partition key and a range of row keys. The time it takes to download the records is significant, and depending on the records it can spike the CPU, because the data has to be deserialized into objects. If you have to make multiple trips to table storage because of the 1,000-record limit, you'll pay more in transaction costs as well.
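
As a concrete illustration of the pre-aggregated approach (the question's second option), here is a sketch of the write path under the same assumptions as the earlier sketch: azure-data-tables Python SDK, item id as partition key, and a hypothetical "<day>_TOTAL" row key for the rollup row. The ETag check is there so two concurrent writers cannot silently overwrite each other's running total:

```python
# Sketch of option 2's write path: insert the raw transaction, then fold its
# cost into a per-day rollup row with optimistic concurrency. Assumptions as
# in the earlier sketch; the "<day>_TOTAL" row key echoes the question.
import uuid

from azure.core import MatchConditions
from azure.core.exceptions import (
    ResourceExistsError,
    ResourceModifiedError,
    ResourceNotFoundError,
)
from azure.data.tables import TableClient, UpdateMode

client = TableClient.from_connection_string(
    "<connection-string>",  # placeholder
    table_name="Transactions",
)

def record_transaction(item_id: str, day: str, cost: float) -> None:
    """Insert a raw row and maintain the day's pre-aggregated total."""
    client.create_entity({
        "PartitionKey": item_id,
        "RowKey": f"{day}_{uuid.uuid4().hex}",  # day is "yyyyMMdd"
        "Cost": cost,
    })
    total_key = f"{day}_TOTAL"
    while True:  # retry loop for concurrent writers
        try:
            total = client.get_entity(item_id, total_key)
            total["Cost"] += cost
            client.update_entity(
                total,
                mode=UpdateMode.REPLACE,
                etag=total.metadata["etag"],
                match_condition=MatchConditions.IfNotModified,
            )
            return
        except ResourceNotFoundError:
            try:  # first transaction of the day: create the rollup row
                client.create_entity({
                    "PartitionKey": item_id,
                    "RowKey": total_key,
                    "Cost": cost,
                })
                return
            except ResourceExistsError:
                continue  # lost the creation race; re-read and retry
        except ResourceModifiedError:
            continue  # another writer won the ETag check; retry
```

Reading back a day's total is then a single point read (client.get_entity(item_id, "20101101_TOTAL")), and a month is at most 31 such reads, regardless of transaction volume. In this layout both writes share a partition key, so they could in principle also be combined into an entity group (batch) transaction.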

Some other thoughts to consider:

  • Will your aggregated totals ever change? If not, this is another nudge toward pre-aggregation.

  • Will you need to keep the aggregated values after the raw data is gone, or will you ever need to purge the raw data? If yes, that is another nudge toward pre-aggregation.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow