Question

We have a Windows Azure Table Storage system in which various entity types report values during the day, so we've ended up with the following partition and row key scheme:

There are about 4,000-5,000 entities across 6 entity types, roughly evenly distributed, so around 800 of each type.

PartitionKey: entityType-Date

RowKey: entityId

Each row records the values for an entity for that particular day; the values are currently JSON-serialized.
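For concreteness, writing a day's row under this scheme could look roughly like the following. This is a minimal sketch assuming the azure-data-tables Python SDK; the table name, connection string, and the `Values` property are illustrative, not part of the actual setup.

```python
import json
from datetime import date

from azure.data.tables import TableClient, UpdateMode

# Assumed table name and connection string (placeholders).
table = TableClient.from_connection_string(
    conn_str="<connection-string>", table_name="EntityValues"
)

def save_daily_values(entity_type: str, entity_id: str, day: date, values: dict) -> None:
    """Store one entity's values for one day under the entityType-Date / entityId scheme."""
    entity = {
        "PartitionKey": f"{entity_type}-{day.isoformat()}",  # e.g. "Sensor-2024-05-01"
        "RowKey": entity_id,
        "Values": json.dumps(values),  # the verbose payload, JSON-serialized
    }
    table.upsert_entity(entity, mode=UpdateMode.REPLACE)
```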

The data is quite verbose.

We will periodically want to look back at the values in these partitions over one or two months, depending on what our website users want to see.

The problem is that to query a month of data for one entity, we have to query 31 partition keys by entityId.

This is very slow initially but after the first call the result is cached.

Unfortunately, the nature of the site means there will be a wide variety of different queries, so the data is unlikely to benefit much from caching.

We could obviously make the partitions bigger, e.g. a whole week of data per partition, and expand the row keys to entityId-date.
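For illustration, week-sized partitions might construct their keys along these lines (purely a sketch; grouping by ISO week is an assumption):

```python
from datetime import date

def weekly_keys(entity_type: str, entity_id: str, day: date) -> tuple[str, str]:
    """Illustrative keys for week-sized partitions: one partition per entity type per ISO week."""
    iso_year, iso_week, _ = day.isocalendar()
    partition_key = f"{entity_type}-{iso_year}-W{iso_week:02d}"
    row_key = f"{entity_id}-{day.isoformat()}"  # row key now carries the date as well
    return partition_key, row_key
```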

What other options are open to me, or is it simply the case that Windows Azure tables suffer fairly high latency?


Solution

Some options include

  1. Make the 31 queries in parallel

  2. Make a single query on a partition key range, that is:

    PartitionKey >= entityType-StartDate and PartitionKey <= entityType-EndDate and RowKey = entityId.

Depending on your data, the range query may have less latency than your current per-day queries. Both options are sketched below.
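Using the same assumed Python SDK and schema as the sketch in the question, the two options could look roughly like this; the worker count, property names, and helper names are illustrative:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

from azure.core.exceptions import ResourceNotFoundError
from azure.data.tables import TableClient

table = TableClient.from_connection_string(
    conn_str="<connection-string>", table_name="EntityValues"
)

# Option 1: issue the per-day point queries in parallel, one per day in the range.
def fetch_month_parallel(entity_type: str, entity_id: str, start: date, days: int = 31):
    def fetch_day(day: date):
        try:
            row = table.get_entity(
                partition_key=f"{entity_type}-{day.isoformat()}", row_key=entity_id
            )
            return day, json.loads(row["Values"])
        except ResourceNotFoundError:
            return day, None  # no values reported for that day
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fetch_day, (start + timedelta(days=i) for i in range(days))))

# Option 2: a single query over a partition key range, fixed to one RowKey.
def fetch_month_range(entity_type: str, entity_id: str, start: date, end: date):
    query = "PartitionKey ge @low and PartitionKey le @high and RowKey eq @rid"
    params = {
        "low": f"{entity_type}-{start.isoformat()}",
        "high": f"{entity_type}-{end.isoformat()}",
        "rid": entity_id,
    }
    return [json.loads(row["Values"]) for row in table.query_entities(query, parameters=params)]
```

Note that the range query relies on the date part of the partition key sorting lexicographically, which ISO yyyy-MM-dd dates do; the service may still scan the intervening partitions server-side, so it is worth measuring both approaches against real data.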

Licensed under: CC-BY-SA with attribution