Question

Is it a correct approach to store needed Azure Storage Table relations in Azure SQL database? For example, I may have a system that keeps track of Users and Books that they own. I would keep Users entities and Books entities in Azure Storage Table and the relations (idUser, idBook) in supporting Azure SQL table.

Is it a good approach? What will be the drawbacks of this solution?

EDIT: The motivation is simply to lower cost. I'll need to store a lot of data, so I plan to use Azure Storage Tables, because a SQL database would simply be too expensive. But in some scenarios I'll need to store relations between objects.


Solution 3

Advantages

  • Getting the relationships in SQL Azure would be faster than getting the relationships from table storage

Disadvantages

  • As @Ic. stated, you have no easy way of maintaining referential integrity
  • Slow performance due to having to pull the relations into memory from SQL Azure, then enumerate through them to get the correct table storage entries
  • Table storage itself is dramatically slower than SQL Azure (See this question)
  • There is still a cost to maintaining a SQL Azure database, even if it is a small one

I have heard of users using Azure Table Storage to store relationships as well; for example:

  • Table1: Users (PartitionKey: UserID)
  • Table2: Books (PartitionKey: BookID)
  • Table3: UserBooks (PartitionKey: UserID, RowKey: BookID)
  • Table4: BooksUsers (PartitionKey: BookID, RowKey: UserID)

UserBooks and BooksUsers act like explicitly defined indexes and allow faster lookups, because the PartitionKey and RowKey are the fields you use for the association.

However the clear disadvantage is having to maintain 2 extra tables alongside your data.
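To make the two-index idea concrete, here is a minimal sketch. It uses plain in-memory dictionaries as a stand-in for the UserBooks and BooksUsers tables rather than the real Azure SDK, and the helper names (`add_ownership`, `remove_ownership`, `books_of`, `owners_of`) are hypothetical. It shows why both tables have to be written on every change.

```python
from collections import defaultdict

# In-memory stand-ins for two tables: PartitionKey -> {RowKey -> entity}.
# These are NOT Azure SDK objects, only an illustration of the key layout.
user_books = defaultdict(dict)   # PartitionKey: UserID, RowKey: BookID
books_users = defaultdict(dict)  # PartitionKey: BookID, RowKey: UserID


def add_ownership(user_id, book_id):
    """Record the relation in both index tables (two separate writes)."""
    user_books[user_id][book_id] = {"PartitionKey": user_id, "RowKey": book_id}
    books_users[book_id][user_id] = {"PartitionKey": book_id, "RowKey": user_id}


def remove_ownership(user_id, book_id):
    """Remove the relation from both index tables."""
    user_books[user_id].pop(book_id, None)
    books_users[book_id].pop(user_id, None)


def books_of(user_id):
    """All book ids in the user's partition of UserBooks."""
    return sorted(user_books.get(user_id, {}))


def owners_of(book_id):
    """All user ids in the book's partition of BooksUsers."""
    return sorted(books_users.get(book_id, {}))


add_ownership("u1", "b1")
add_ownership("u1", "b2")
add_ownership("u2", "b1")
```

Each direction of the relation is answered from a single partition, at the price of keeping the two tables in step yourself.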

Really, it boils down to whether the performance drop (and it will be a severe drop) from using Table Storage instead of SQL Azure is worth the cost savings.

OTHER TIPS

I might be missing something, but frankly I can't see a single good reason to do this.

One major reason one would use a relational database is to store relational data, maintain referential integrity, and rely on the query optimizer to make efficient joins. However since you are not storing the related user or book data in the same database, you cannot create a foreign key constraint on either, nor join data across tables since it does not exist. In fact it's worse because first you have to fetch data from the SQL database, then you have to go out to table storage to get the rest of the data, so you would be making a connection to two different services just to retrieve one list of data.

I would like to add a few things to @Click-Rex's answer:

  1. As David mentioned, Table Storage is slower if you query it improperly, i.e. if your queries end up doing a full table scan. If you design your partitions really well, you should get better performance than SQL Azure.
  2. Table storage is dirt cheap compared to SQL Azure.
  3. Do realize that SQL Azure is a kind of high-density SQL Server hosting, so you may be impacted by noisy-neighbor behavior. Table storage is somewhat safe from that, as the isolation boundary is first your storage account, then the table, and then the PartitionKey.
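The difference between a partition query and a full table scan from point 1 can be illustrated with a small in-memory model. This is only a sketch: the dictionary layout and the function names (`build_table`, `query_partition`, `scan_by_property`) are assumptions for illustration, and it counts entities examined rather than measuring real latency.

```python
def build_table(num_users, books_per_user):
    """PartitionKey -> {RowKey -> entity}, mimicking a UserBooks table."""
    return {
        f"user{u}": {f"book{b}": {"owner": f"user{u}"} for b in range(books_per_user)}
        for u in range(num_users)
    }


def query_partition(table, partition_key):
    """Key-based query: touches only the entities in one partition."""
    rows = table.get(partition_key, {})
    return list(rows), len(rows)  # (row keys, entities examined)


def scan_by_property(table, owner):
    """Filter on a non-key property: every entity has to be examined."""
    examined, hits = 0, []
    for rows in table.values():
        for row_key, entity in rows.items():
            examined += 1
            if entity["owner"] == owner:
                hits.append(row_key)
    return hits, examined


table = build_table(num_users=100, books_per_user=10)
keys_a, examined_a = query_partition(table, "user7")
keys_b, examined_b = scan_by_property(table, "user7")
```

Both calls return the same ten row keys, but the partition query examines 10 entities while the scan examines all 1000.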

The approach presented by @Click-Rex is the way to go, though I would like to add one more thing:

In your additional tables, duplicate the book and user information as well, not just BookId and UserId. That way you read from just one table instead of doing multiple reads. The downside is that whenever book or user information changes, you need to update these tables as well; the plus side is that you save a lot on read operations.

For example, let's say you want to find the books owned by a user. If you don't store the book information in this secondary table, you first fetch all row keys (book ids) from the secondary table, and then for each book id you fetch the book's information from the books table. Assuming a user has 500 books, you are performing 500+1 read transactions. If you store the book information in the secondary table itself, you are doing just 1 read operation.
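The 500+1 versus 1 arithmetic can be checked with a toy model. `CountingTable` below is a hypothetical in-memory class, not part of any Azure library, that counts one read per call. The Books table is assumed to use the BookID as both PartitionKey and RowKey, which the original layout does not specify.

```python
class CountingTable:
    """In-memory table that counts read requests (one per call)."""

    def __init__(self):
        self.partitions = {}
        self.reads = 0

    def upsert(self, partition_key, row_key, entity):
        self.partitions.setdefault(partition_key, {})[row_key] = entity

    def get(self, partition_key, row_key):
        self.reads += 1
        return self.partitions[partition_key][row_key]

    def query_partition(self, partition_key):
        self.reads += 1
        return dict(self.partitions.get(partition_key, {}))


books = CountingTable()            # Books: PartitionKey = BookID (RowKey assumed equal)
user_books_ids = CountingTable()   # UserBooks holding only the ids
user_books_full = CountingTable()  # UserBooks with book info duplicated

for i in range(500):
    book_id = f"book{i}"
    info = {"title": f"Title {i}"}
    books.upsert(book_id, book_id, info)
    user_books_ids.upsert("user1", book_id, {})
    user_books_full.upsert("user1", book_id, dict(info))

# Ids only: one partition query, then one lookup per book.
ids = user_books_ids.query_partition("user1")
titles_a = [books.get(b, b)["title"] for b in ids]
reads_a = user_books_ids.reads + books.reads

# Denormalized: one partition query returns everything needed.
titles_b = [e["title"] for e in user_books_full.query_partition("user1").values()]
reads_b = user_books_full.reads
```

Here `reads_a` comes out as 501 and `reads_b` as 1. The real service pages large result sets, so a very large partition could take more than one request, but the gap between the two approaches stays of the same order.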

Obviously this approach makes sense if the application performs more reads than writes. Another thing to keep in mind is that you won't get transaction support, since you would be writing across many tables and partitions, so you need to ensure that entities get persisted no matter what. In an application that I am building right now, we follow this approach and actually have a worker role responsible for ensuring that the data gets persisted.
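One way such a worker could look is sketched below. This is not the implementation from the application mentioned above; it is a hypothetical retry loop over a queue of pending writes, with `FlakyTable` as an in-memory stand-in that simulates transient failures. It relies on the writes being upserts, so repeating one is harmless.

```python
from collections import deque


class FlakyTable:
    """In-memory table whose first `failures` writes raise an error."""

    def __init__(self, failures=0):
        self.rows = {}
        self.failures = failures

    def upsert(self, partition_key, row_key, entity):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated transient failure")
        self.rows[(partition_key, row_key)] = entity


def drain(pending, max_attempts=5):
    """Apply queued writes, re-queueing failures up to max_attempts.

    Returns the (PartitionKey, RowKey) pairs that never succeeded.
    """
    dead = []
    while pending:
        table, pk, rk, entity, attempts = pending.popleft()
        try:
            table.upsert(pk, rk, entity)
        except ConnectionError:
            if attempts + 1 < max_attempts:
                pending.append((table, pk, rk, entity, attempts + 1))
            else:
                dead.append((pk, rk))
    return dead


user_books = FlakyTable(failures=2)  # fails twice, then recovers
books_users = FlakyTable()
pending = deque([
    (user_books, "u1", "b1", {"title": "Dune"}, 0),
    (books_users, "b1", "u1", {"name": "Ann"}, 0),
])
dead = drain(pending)
```

In a real deployment the pending writes would need to live in durable storage (for example a queue service) so that they survive a worker restart; the in-memory deque here is only for illustration.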

Licensed under: CC-BY-SA with attribution