Question

I am in the middle of an architectural decision that will be important down the line.

I have a system where I use ATS (Azure Table Storage) as the store for a small amount of simple data. It is neither written to nor read from very often.

Now I need to read and write data based on this ATS data. A lot of data. But I will not read the data in ATS before I write the "other" data.

My concern is that I cannot read the data from ATS fast enough for my needs. For example, I need to count rows fairly fast to give users feedback, and ATS has no count function. I could gain something from the CQRS "pattern", but I would still need to count rows! And I am not looking for a solution that adds complexity just to overcome something that is very simple on another platform (SQL).

So my thought is to save this data in a SQL database, where I also have all the data-manipulation features I need. But I would miss out on the scalability and end up maintaining that database instead of an easily scalable datastore like ATS. The data I am writing here is very simple, but there is a lot of it.

I would like to stay with ATS only, but I would need a pattern to get around these limitations. Any ideas?

Thanks!

Solution

Is consistency important? If you are considering saving data in SQL, why not go with SQL all the way?

Anyway, aggregate functions like count are not a feature of Table Storage, so you either need to fetch all rows and count them client side (which can be very slow), or find a way to cache the count in another entity. It really depends on your data and architecture, and sometimes it is not possible to choose a PartitionKey strategy that makes this work.
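To make the two options concrete, here is a minimal in-memory sketch. `FakeTable`, `count_client_side` and `insert_with_counter` are hypothetical names, not Azure SDK calls; the only Table Storage behaviour modelled is that a query response holds at most 1,000 entities, so counting client side means reading every page. The counter option keeps a running count in a separate entity, turning the count into one point lookup. In real Table Storage the data write and the counter write are separate requests, so concurrent writers would need ETag-based optimistic concurrency (or an entity group transaction, which requires the counter to live in the same partition) to keep the count accurate.

```python
# Minimal in-memory sketch; FakeTable, count_client_side and insert_with_counter
# are hypothetical names, not part of any Azure SDK.
PAGE_SIZE = 1000  # a Table Storage query response holds at most 1,000 entities


class FakeTable:
    """Stand-in for a table: maps (PartitionKey, RowKey) to an entity dict."""

    def __init__(self):
        self.rows = {}

    def upsert(self, entity):
        self.rows[(entity["PartitionKey"], entity["RowKey"])] = entity

    def get(self, pk, rk):
        return self.rows.get((pk, rk))

    def query_pages(self, pk):
        """Yield one partition's entities in pages, mimicking continuation tokens."""
        matches = [e for (p, _), e in sorted(self.rows.items()) if p == pk]
        for start in range(0, len(matches), PAGE_SIZE):
            yield matches[start:start + PAGE_SIZE]


def count_client_side(table, pk):
    # Option 1: read every page and count locally; cost grows with the row count.
    return sum(len(page) for page in table.query_pages(pk))


def insert_with_counter(data, counters, entity):
    # Option 2: maintain a running count in a separate counter entity.
    pk, rk = entity["PartitionKey"], entity["RowKey"]
    is_new = data.get(pk, rk) is None  # only count genuinely new rows
    data.upsert(entity)
    if is_new:
        current = counters.get("counters", pk)
        count = current["Count"] if current else 0
        counters.upsert({"PartitionKey": "counters", "RowKey": pk, "Count": count + 1})


data, counters = FakeTable(), FakeTable()
for i in range(2500):
    insert_with_counter(data, counters, {"PartitionKey": "orders", "RowKey": f"{i:06d}"})

print(count_client_side(data, "orders"))            # 2500, after reading 3 pages
print(counters.get("counters", "orders")["Count"])  # 2500, from a single point lookup
```

The trade-off is the one described above: the counter makes reads cheap but every write now touches two entities, and keeping them in step is your responsibility.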

You could also look at Lucene, Azure Search, or Hadoop, which are optimized for reading. However, consistency is then a problem.

We built a rather large architecture using Table Storage as a read/write/update solution, and to be honest, I would not recommend it. Table Storage is great for write-once scenarios and for solutions where performance is not an issue. But building a relational structure where performance is important is tough. A single query response returns at most 1,000 entities plus a continuation token, so if you need queries that return more than 1,000 rows, you should design your data so that you can query in parallel. Even then, expect queries to take 1+ second, even if you query by partition key.
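A rough sketch of what "design so you can query in parallel" can mean, under stated assumptions: the data is split across several partitions (here a hypothetical one-partition-per-day scheme), pages inside one partition are fetched sequentially because each request needs the previous continuation token, and different partitions are queried concurrently. `query_partition` is a hypothetical stand-in that sleeps to simulate a round trip per page; it does not call Azure.

```python
import time
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 1000  # max entities per Table Storage query response
LATENCY = 0.05    # hypothetical round-trip time per page, in seconds

# Hypothetical data: one partition per day, 1,500 rows each.
PARTITIONS = {f"day-{d:02d}": [f"row{i}" for i in range(1500)] for d in range(1, 8)}


def query_partition(pk):
    """Stand-in for a paged query on one PartitionKey.

    Pages within a partition are fetched one after another, because each
    request needs the previous response's continuation token.
    """
    rows = PARTITIONS[pk]
    result = []
    for start in range(0, len(rows), PAGE_SIZE):
        time.sleep(LATENCY)  # simulate one round trip per page
        result.extend(rows[start:start + PAGE_SIZE])
    return result


def query_all_parallel(pks, workers=8):
    # Partitions are independent, so they can be queried concurrently;
    # pool.map preserves the order of pks in the combined result.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [row for rows in pool.map(query_partition, pks) for row in rows]


start = time.perf_counter()
rows = query_all_parallel(sorted(PARTITIONS))
elapsed = time.perf_counter() - start
print(len(rows), f"rows in {elapsed:.2f}s")  # sequential would take about 7 * 2 * LATENCY
```

The parallelism only helps if the PartitionKey scheme actually splits large result sets into many partitions; a single huge partition still has to be paged through one request at a time.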

Table Storage is only really fast for single-entity lookups by PartitionKey and RowKey.
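The difference between a point lookup and a property filter can be illustrated with a dictionary keyed the way Table Storage addresses entities. This is a hypothetical in-memory model, not the service: when both keys are known the entity is found directly, whereas filtering on a non-key property such as the made-up `City` column has to examine every entity, which in Table Storage corresponds to a partition or table scan.

```python
# Hypothetical in-memory table keyed like Table Storage: (PartitionKey, RowKey) -> entity.
table = {
    ("customers", f"c{i:05d}"): {
        "PartitionKey": "customers",
        "RowKey": f"c{i:05d}",
        "City": "Oslo" if i % 2 else "Bergen",  # made-up non-key property
    }
    for i in range(10_000)
}


def point_lookup(pk, rk):
    # Both keys known: one direct lookup, no scan.
    return table.get((pk, rk))


def filter_by_property(city):
    # Filtering on a non-key property touches every entity.
    examined = 0
    hits = []
    for entity in table.values():
        examined += 1
        if entity["City"] == city:
            hits.append(entity)
    return hits, examined


print(point_lookup("customers", "c00042")["City"])  # Bergen
hits, examined = filter_by_property("Oslo")
print(len(hits), examined)  # 5000 10000
```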

But yes, Table Storage scales well, even under load. It just comes at a price.

Licensed under: CC-BY-SA with attribution