Question

I am using views for query convenience. The view is a join across three tables, using INNER JOIN and RIGHT OUTER JOIN. The overall result set from the view could be 500,000 records. I then run other queries against this view, similar to:

SELECT colA, colB, colC FROM vwMyView WHERE colD = 'ABC'

This query might return only 30 or so results. How will this perform? Internally, will the SQL engine always execute the full view and then apply the WHERE clause afterwards, or is SQL Server smart enough to apply the WHERE clause first so that the JOIN operations only touch a subset of records?

If I'm only returning 30 records to the middle tier, do I need to worry that SQL Server had to trawl through 500,000 records to get to those 30? I have indexes on all important columns of the base tables.

Using MS SQL Server; the view is not materialized.


Solution

Usually, a view is treated in much the same way as a macro might be in other languages: the body of the view is "expanded out" into the query it's a part of, before the query is optimized. So your concern that it first computes all 500,000 results is unfounded.
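As a sketch of what that expansion looks like (the table names and join keys below are hypothetical, chosen to match the question's colA–colD), the optimizer effectively rewrites the query against the view into a query against the base tables before planning it:

```sql
-- Hypothetical view definition over three base tables
CREATE VIEW vwMyView AS
SELECT t1.colA, t1.colB, t2.colC, t3.colD
FROM Table1 t1
INNER JOIN Table2 t2 ON t2.id = t1.id
RIGHT OUTER JOIN Table3 t3 ON t3.id = t1.id;

-- The query from the question...
SELECT colA, colB, colC FROM vwMyView WHERE colD = 'ABC';

-- ...is optimized as if it had been written directly as:
SELECT t1.colA, t1.colB, t2.colC
FROM Table1 t1
INNER JOIN Table2 t2 ON t2.id = t1.id
RIGHT OUTER JOIN Table3 t3 ON t3.id = t1.id
WHERE t3.colD = 'ABC';
```

Because the WHERE clause is present before optimization, the planner is free to push the colD = 'ABC' filter down to an index seek on the base table rather than joining all 500,000 rows first.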

The exception to the above is an indexed view (SQL Server: the query has to use the WITH (NOEXPAND) hint, or you have to be on an edition such as Enterprise whose optimizer considers indexed views automatically) or a materialized view (Oracle; I'm not sure of the exact requirements there). In those cases the view isn't expanded out: the results have already been computed beforehand and are stored much like a real table's rows, so again there shouldn't be too much concern when actually querying.
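For completeness, a minimal sketch of an indexed view in SQL Server (names are hypothetical; note that indexed views require SCHEMABINDING, two-part object names, and a unique clustered index, and they do not permit outer joins, so this only applies if the view can be rewritten with inner joins):

```sql
-- SCHEMABINDING and two-part names are mandatory for indexed views
CREATE VIEW dbo.vwMyViewIndexed WITH SCHEMABINDING AS
SELECT t1.id, t1.colA, t1.colB, t2.colC, t2.colD
FROM dbo.Table1 t1
INNER JOIN dbo.Table2 t2 ON t2.id = t1.id;
GO

-- The unique clustered index is what materializes the view's rows
CREATE UNIQUE CLUSTERED INDEX IX_vwMyViewIndexed
ON dbo.vwMyViewIndexed (id);
GO

-- On non-Enterprise editions, force use of the stored rows with NOEXPAND
SELECT colA, colB, colC
FROM dbo.vwMyViewIndexed WITH (NOEXPAND)
WHERE colD = 'ABC';
```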

OTHER TIPS

When the view is not materialized, the SQL behind it will always be executed when the view is used, e.g. in the FROM clause. Some caching may be possible, but that depends on your DBMS and its configuration.

To see what the database is doing in the background, you might start by inspecting the query plan: EXPLAIN ANALYZE <your query> in PostgreSQL, or the execution plan in SQL Server (for example SET SHOWPLAN_TEXT ON, or the graphical plan in Management Studio).
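A sketch of plan inspection in SQL Server, using the query from the question:

```sql
-- Show the estimated plan without executing the query
SET SHOWPLAN_TEXT ON;
GO
SELECT colA, colB, colC FROM vwMyView WHERE colD = 'ABC';
GO
SET SHOWPLAN_TEXT OFF;
GO

-- Or execute the query and report actual I/O and timing statistics
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT colA, colB, colC FROM vwMyView WHERE colD = 'ABC';
```

If the plan shows index seeks on the base tables rather than scans of the whole join, the filter has been pushed down as hoped.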

Performance of queries on large datasets typically needs clever application of indexes. In your case a simple index on colD will probably do the trick. Depending on the data, different index types might need scrutiny: hash tables, B-trees, etc. all behave differently, so there is no one solution that rules them all. Beyond that, optimization is better left to the query optimizer in your RDBMS; its developers spend a great deal of time on it, and the critical segments are likely in low-level, heavily tuned code.

On another note, clever cleaning of the data might be worth considering as well. And if aggregation is required, consider data warehousing with well-chosen dimensions and pre-aggregated values. Storage is cheap these days; computing time may not be.
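As a sketch of the suggested index (the table name is hypothetical; the index goes on whichever base table supplies colD), a covering nonclustered index lets SQL Server locate the ~30 matching rows without touching the other 500,000:

```sql
-- Index the filter column; INCLUDE the join key so the join
-- back to the other base tables stays cheap
CREATE NONCLUSTERED INDEX IX_Table3_colD
ON Table3 (colD)
INCLUDE (id);
```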

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow