Question

I am using views for query convenience. The view is a join across three tables, using INNER JOIN and RIGHT OUTER JOIN. The overall result set from the view could be 500,000 records. I then run other queries against this view, similar to:

SELECT colA, colB, colC FROM vwMyView WHERE colD = 'ABC'

This query might return only 30 or so results. How will this perform? Internally, will the SQL engine always execute the full view and then apply the WHERE clause afterwards, or is SQL Server smart enough to apply the WHERE clause first so that the JOIN operations only touch a subset of records?

If I'm only returning 30 records to the middle tier, do I need to worry that SQL Server had to trawl through 500,000 records to get to those 30? I have indexes on all important columns of the base tables.

Using MS SQL Server; the view is not materialized.


Solution

Usually, a view is treated in much the same way as a macro might be in other languages: the body of the view is "expanded out" into the query it's a part of, before the query is optimized. So your concern that it first computes all 500,000 results is unfounded.
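As a sketch of what that expansion looks like (the table names and join keys below are hypothetical, chosen to match the question's colA–colD), the optimizer effectively rewrites the query against the view into a query against the base tables before planning it:

```sql
-- Hypothetical view definition over three base tables
CREATE VIEW vwMyView AS
SELECT t1.colA, t1.colB, t2.colC, t3.colD
FROM Table1 t1
INNER JOIN Table2 t2 ON t2.id = t1.id
RIGHT OUTER JOIN Table3 t3 ON t3.id = t1.id;

-- The query from the question...
SELECT colA, colB, colC FROM vwMyView WHERE colD = 'ABC';

-- ...is optimized as if it had been written directly as:
SELECT t1.colA, t1.colB, t2.colC
FROM Table1 t1
INNER JOIN Table2 t2 ON t2.id = t1.id
RIGHT OUTER JOIN Table3 t3 ON t3.id = t1.id
WHERE t3.colD = 'ABC';
```

Because the WHERE clause is present before optimization, the planner is free to push the colD = 'ABC' filter down to an index seek on the base table rather than joining all 500,000 rows first.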

The exception to the above is an indexed view (SQL Server: the query has to use the WITH (NOEXPAND) hint, or you have to be on an edition such as Enterprise whose optimizer considers indexed views automatically) or a materialized view (Oracle; I'm not sure of the exact requirements there). In those cases the view isn't expanded out: the results have already been computed beforehand and are stored much like a real table's rows, so again there shouldn't be too much concern when actually querying.
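For completeness, a minimal sketch of an indexed view in SQL Server (names are hypothetical; note that indexed views require SCHEMABINDING, two-part object names, and a unique clustered index, and they do not permit outer joins, so this only applies if the view can be rewritten with inner joins):

```sql
-- SCHEMABINDING and two-part names are mandatory for indexed views
CREATE VIEW dbo.vwMyViewIndexed WITH SCHEMABINDING AS
SELECT t1.id, t1.colA, t1.colB, t2.colC, t2.colD
FROM dbo.Table1 t1
INNER JOIN dbo.Table2 t2 ON t2.id = t1.id;
GO

-- The unique clustered index is what materializes the view's rows
CREATE UNIQUE CLUSTERED INDEX IX_vwMyViewIndexed
ON dbo.vwMyViewIndexed (id);
GO

-- On non-Enterprise editions, force use of the stored rows with NOEXPAND
SELECT colA, colB, colC
FROM dbo.vwMyViewIndexed WITH (NOEXPAND)
WHERE colD = 'ABC';
```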

OTHER TIPS

When the view is not materialized, the SQL behind it will always be executed when the view is used, e.g. in the FROM clause. Some caching may be possible, but that depends on your DBMS and its configuration.

To see what the database is doing in the background, you might start by inspecting the query plan: EXPLAIN ANALYZE <your query> in PostgreSQL, or the execution plan in SQL Server (for example SET SHOWPLAN_TEXT ON, or the graphical plan in Management Studio).
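A sketch of plan inspection in SQL Server, using the query from the question:

```sql
-- Show the estimated plan without executing the query
SET SHOWPLAN_TEXT ON;
GO
SELECT colA, colB, colC FROM vwMyView WHERE colD = 'ABC';
GO
SET SHOWPLAN_TEXT OFF;
GO

-- Or execute the query and report actual I/O and timing statistics
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT colA, colB, colC FROM vwMyView WHERE colD = 'ABC';
```

If the plan shows index seeks on the base tables rather than scans of the whole join, the filter has been pushed down as hoped.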

Performance of queries on large datasets typically needs clever application of indexes. In your case a simple index on colD will probably do the trick. Depending on the data, different index types might need scrutiny: hash tables, B-trees, etc. all behave differently, so there is no one solution that rules them all. Beyond that, optimization is better left to the query optimizer in your RDBMS; its developers spend a great deal of time on it, and the critical segments are likely in low-level, heavily tuned code.

On another note, clever cleaning of the data might be worth considering as well. And if aggregation is required, consider data warehousing with well-chosen dimensions and pre-aggregated values. Storage is cheap these days; computing time may not be.
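As a sketch of the suggested index (the table name is hypothetical; the index goes on whichever base table supplies colD), a covering nonclustered index lets SQL Server locate the ~30 matching rows without touching the other 500,000:

```sql
-- Index the filter column; INCLUDE the join key so the join
-- back to the other base tables stays cheap
CREATE NONCLUSTERED INDEX IX_Table3_colD
ON Table3 (colD)
INCLUDE (id);
```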

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow