Question

What I'm looking for is to return some estimate of the row count, instead of the actual count, which can be an expensive call. Similar to what you see in Google search ("... of about 1.000 rows").

Are there some out-of-the-box solutions for this? If not, what's the general approach?

I'm querying Sql Server 2008 database.

EDIT: To clarify, the result count relates to certain user queries. For example, a user searches for "John" and the result should be "There are about 1.280.000 rows that match John".


Solution

It's hard to tell what you're asking. If you're talking about returning a number from a search algorithm, you could compute a hash from the inputs and use that hash to look up a count that you maintain periodically. That might give you "about" the right results, depending on how good the hash is and how often you refresh your counts.
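
A minimal sketch of that idea, not the answer's exact method: cache an approximate count per search term, keyed by a hash of the term, and refresh it on a schedule (e.g. via a SQL Agent job) so the expensive COUNT is paid once per term per refresh rather than per request. The People table, Name column and LIKE predicate below are assumptions.

CREATE TABLE SearchCountCache
(
    TermHash    varbinary(20) NOT NULL PRIMARY KEY,  -- SHA1 hash of the search term
    ApproxRows  bigint        NOT NULL,
    RefreshedAt datetime      NOT NULL
);

DECLARE @term nvarchar(100) = N'John';

-- Periodic refresh: recompute the expensive count and upsert the cached value
MERGE SearchCountCache AS target
USING (SELECT HASHBYTES('SHA1', @term) AS TermHash,
              (SELECT COUNT_BIG(*) FROM People
               WHERE Name LIKE '%' + @term + '%') AS ApproxRows) AS src
ON target.TermHash = src.TermHash
WHEN MATCHED THEN
    UPDATE SET ApproxRows = src.ApproxRows, RefreshedAt = GETDATE()
WHEN NOT MATCHED THEN
    INSERT (TermHash, ApproxRows, RefreshedAt)
    VALUES (src.TermHash, src.ApproxRows, GETDATE());

-- At query time, return the cached estimate instead of counting
SELECT ApproxRows
FROM SearchCountCache
WHERE TermHash = HASHBYTES('SHA1', @term);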

OTHER TIPS

Just to add a wild card to the existing suggestions...

If your statistics are reasonably up to date, one potential idea is to analyse the estimated execution plan from your calling code (the limitation being that this involves code outside SQL to receive and analyse the XML).

e.g.

SET SHOWPLAN_XML ON;   -- must be the only statement in its batch
GO
-- The query is not executed; its estimated plan is returned as XML instead
SELECT Something
FROM MyTable
WHERE SomeField = 'ABC123';
GO
SET SHOWPLAN_XML OFF;

Then check the returned XML to pull out the 'EstimateRows' value.
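
A hedged alternative sketch that stays inside T-SQL, rather than the client-side approach above: if the query has already been executed at least once, its plan sits in the plan cache and the root operator's EstimateRows attribute can be pulled out with XQuery. The LIKE filter on the query text is an assumption; adjust it to match your actual statement.

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (1)
       qp.query_plan.value('(//RelOp/@EstimateRows)[1]', 'float') AS EstimatedRows
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE '%FROM MyTable%SomeField = ''ABC123''%';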

Please see my comment above. However, if you are finding that the count operation is particularly expensive, there does appear to be a way to approximate the number of rows using the following:

SELECT rows FROM sysindexes WHERE id = OBJECT_ID('sometable') AND indid < 2

This was taken from an earlier post located here:

Is count(*) really expensive?
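
On SQL Server 2005 and later, the same metadata-based estimate is available from the supported catalog views rather than the deprecated sysindexes compatibility view; a sketch:

SELECT SUM(p.rows) AS ApproxRows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('sometable')
  AND p.index_id IN (0, 1);  -- heap (0) or clustered index (1) only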

The general approach would be to take a random sample of rows to estimate how many there really are. For example, if your ids are UUIDs, you could add a filter to your SELECT statement that effectively takes a random sample: only look at rows whose id starts with "f", then multiply the count by 16 to get an estimate of the full row count (see the sketch below). You would need an index supporting that filter for it to be fast, though.
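
A minimal sketch of that idea, assuming MyTable has a uniqueidentifier id column populated with NEWID() values (so the leading hex digit is roughly uniform); the table and column names are hypothetical:

-- Sample roughly 1/16 of the rows via the leading hex digit of the GUID, then scale up.
-- An indexed, persisted computed column on that leading character would make this fast.
SELECT COUNT(*) * 16 AS EstimatedRows
FROM MyTable
WHERE SomeField = 'ABC123'
  AND CAST(id AS char(36)) LIKE 'F%';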

Separate from my other answer, as this is a completely different approach that you can use purely from within T-SQL....

Another possibility would be to use the TABLESAMPLE clause to only look at a specified number (or percentage) of data pages, and then multiply that up.

e.g.

-- Count over a sample of the table's data pages, then scale back up (50% sample → ×2)
SELECT COUNT(*) * 2 AS EstimatedCount
FROM MyTable TABLESAMPLE (50 PERCENT)
WHERE SomeField = 'ABC123'

Tweaking the sample size would be needed. I recommend having a full read through the BOL reference on TABLESAMPLE, as it can be very useful.
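
For example, a smaller sample with the REPEATABLE option keeps the set of sampled pages stable between runs; the numbers here are just illustrative:

SELECT COUNT(*) * 10 AS EstimatedCount          -- 10 PERCENT sample, so scale by 10
FROM MyTable TABLESAMPLE (10 PERCENT) REPEATABLE (205)
WHERE SomeField = 'ABC123';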

Vlejkoz, based on your further updates it appears that you are looking for a general text-search solution rather than what I would guess are your current expensive table lookups and joins.

In SQL Server you have a full framework for exactly this: it's called Microsoft Full-Text Search and provides additional querying capabilities. It gives you search syntax far more like a traditional fuzzy, Google-style search, but tailored towards your specific database tables.

There's a lot to the topic, so it's best to take a look at this introductory article, which seems to address a similar requirement to your question:

Microsoft Full Text Search article
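
As a flavour of what a query looks like once a full-text index exists (the table, column and key names below are assumptions, not from the article):

-- Return the best-ranked matches for 'John' from a full-text indexed Name column
SELECT TOP (20) p.Name, ft.[RANK]
FROM FREETEXTTABLE(People, Name, N'John') AS ft
JOIN People AS p ON p.PersonId = ft.[KEY]
ORDER BY ft.[RANK] DESC;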

Licensed under: CC-BY-SA with attribution