Question

I have an SQL Azure database, and one of the tables contains over 400k objects. One of the columns in this table is a count of the number of times that the object has been downloaded.

I have several queries that sort on this particular column (call it timesdownloaded) in descending order to rank the results.

Here's an example query in LINQ to SQL (I'm writing all this in C# .NET):

var query = from t in db.tablename
    where t.textcolumn.StartsWith(searchfield)
    orderby t.timesdownloaded descending
    select t.textcolumn;

// grab the first 5
var items = query.Take(5);

This query is called perhaps 90 times per minute on average.

Objects are downloaded perhaps 10 times per minute on average, so this timesdownloaded column is updated that frequently.

As you can imagine, any index involving the timesdownloaded column gets over 30% fragmented in a matter of hours. I have implemented an index maintenance plan that checks these indexes every few hours and rebuilds them when necessary. This helps, but of course it adds spikes in query response times whenever the indexes are rebuilt, which I would like to avoid or minimize.

I have tried a variety of indexing schemes.

The best performing indexes are covering indexes that include both the textcolumn and timesdownloaded columns. When these indexes are rebuilt, the queries are amazingly quick of course.

However, these indexes fragment badly and I end up with pretty frequent delay spikes due to rebuilding indexes and other factors that I don't understand.

I have also tried simply not indexing the timesdownloaded column. This seems to perform more consistently overall, though slower of course. And when I check the SQL query execution plan, SQL seems to be pretty inconsistent in how it tries to optimize this query. Of course it ends up with a lot of logical reads, as it has to fetch the timesdownloaded column from the table and not from an organized index. So this isn't optimal.

What I'm trying to figure out is if I am fundamentally missing something in how I have configured or manage this database.

I'm no SQL expert, and I've yet to find a good answer for how to do this.

I've seen some suggestions that Stored Procedures could help, but I don't understand why and haven't tried to get those going with LINQ just yet.

As commented below, I have considered caching but haven't taken that step yet either.

For some context, this query is a part of a search suggestion feature. So it is called frequently with many different search terms.

Any suggestions would be appreciated!


Solution

Based on the comments to my question and further testing, I ended up using an Azure Table to cache my results. This is working really well and I get a lot of hits off of my cache and many fewer SQL queries. The overall performance of my API is much better now.
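
For readers who want to see the shape of this approach, here is a minimal cache-aside sketch in C#. It is not the code from this answer: the class name SuggestionCache, the five-minute lifetime used in the note below, and the in-memory dictionary are all assumptions made for illustration. The dictionary merely stands in for whatever store is used (an Azure Table in my case), so no Azure API calls appear here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper (not from the original post): serves search suggestions
// from a cache and only runs the SQL query on a miss or after expiry.
public sealed class SuggestionCache
{
    // One cached result set plus the time after which it is considered stale.
    private sealed class Entry
    {
        public IReadOnlyList<string> Items;
        public DateTime ExpiresUtc;
    }

    // In-memory stand-in for the real cache store (assumption for this sketch).
    private readonly ConcurrentDictionary<string, Entry> _store =
        new ConcurrentDictionary<string, Entry>();

    private readonly Func<string, IReadOnlyList<string>> _queryDatabase;
    private readonly TimeSpan _ttl;
    private readonly Func<DateTime> _utcNow;

    // queryDatabase wraps the expensive query; utcNow is injectable for testing.
    public SuggestionCache(
        Func<string, IReadOnlyList<string>> queryDatabase,
        TimeSpan ttl,
        Func<DateTime> utcNow = null)
    {
        _queryDatabase = queryDatabase ?? throw new ArgumentNullException(nameof(queryDatabase));
        _ttl = ttl;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public IReadOnlyList<string> GetSuggestions(string searchField)
    {
        DateTime now = _utcNow();

        // Cache hit: return the stored list without touching the database.
        if (_store.TryGetValue(searchField, out Entry cached) && cached.ExpiresUtc > now)
        {
            return cached.Items;
        }

        // Cache miss or expired entry: run the query and remember the result.
        IReadOnlyList<string> fresh = _queryDatabase(searchField);
        _store[searchField] = new Entry { Items = fresh, ExpiresUtc = now + _ttl };
        return fresh;
    }
}
```

The delegate passed to the constructor would wrap the LINQ query from the question, materialized with ToList(), and the lifetime could be something like TimeSpan.FromMinutes(5). The trade-off is that a cached suggestion list can lag behind the latest timesdownloaded counts by up to that lifetime, which is probably acceptable for a search suggestion feature.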

I did try using Azure In-Role Caching, but that method doesn't appear to work well for my needs. It ended up using too much memory (no matter how I configured it, which I don't understand), swapped to disk like crazy, and brought my little Small instances to their knees. I don't want to pay more at the moment, so Tables it is.

Thanks for the suggestions!

Licensed under: CC-BY-SA with attribution