Comparing GROUP BY vs. OVER (PARTITION BY)

https://stackoverflow.com/questions/9328238

27-10-2019

Question

Assume a table CAR with two columns, CAR_ID (int) and VERSION (int).

I want to retrieve the maximum version of each car.

There are (at least) two solutions:

select car_id, max(version) as max_version 
  from car  
 group by car_id;

Or:

select car_id, max_version
  from (select car_id
             , version
             , max(version) over (partition by car_id) as max_version
          from car) max_ver
 where max_ver.version = max_ver.max_version;

Are these two queries similarly performant?

Solution

Yes, it may affect performance.

The second query is an example of an inline view. It is a very useful technique for reports that need several kinds of counts or other aggregate functions.

Oracle executes the subquery and then uses the resulting rows as a view in the FROM clause.

When performance is a concern, an inline view is generally recommended over other types of subquery.

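One more thing: the second query returns every row that holds the maximum version, so a car appears more than once if several of its rows tie for the maximum, while the first query returns exactly one row per car.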
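The difference in row counts can be checked with a small script. This is a minimal sketch that assumes SQLite (3.25 or later, for window functions) as a stand-in for Oracle, and the sample rows are made up for illustration; car 1 deliberately has two rows with the same maximum version.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (car_id INTEGER, version INTEGER)")
# Car 1 has two rows that share the maximum version 3.
conn.executemany(
    "INSERT INTO car VALUES (?, ?)",
    [(1, 1), (1, 3), (1, 3), (2, 2), (2, 5)],
)

# First query from the question: one row per car.
group_by_rows = conn.execute("""
    SELECT car_id, MAX(version) AS max_version
      FROM car
     GROUP BY car_id
     ORDER BY car_id
""").fetchall()

# Second query from the question: one row per row that holds the maximum.
window_rows = conn.execute("""
    SELECT car_id, max_version
      FROM (SELECT car_id, version,
                   MAX(version) OVER (PARTITION BY car_id) AS max_version
              FROM car) max_ver
     WHERE max_ver.version = max_ver.max_version
     ORDER BY car_id
""").fetchall()

print(group_by_rows)  # [(1, 3), (2, 5)]
print(window_rows)    # [(1, 3), (1, 3), (2, 5)]
```

If (CAR_ID, VERSION) is unique in the real table, the two queries return the same rows and only the execution plan differs.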


OTHER TIPS

I know this is extremely old but thought it should be pointed out.

select car_id, max_version 
  from (select car_id
             , version
             , max(version) over (partition by car_id) as max_version
          from car ) max_ver  
 where max_ver.version = max_ver.max_version

Not sure why you wrote option two like that. In this case the subselect should theoretically be slower, because it computes the window maximum for every row of the table and then filters those rows again in the outer query.

Just remove version from your inline view and add DISTINCT (without it you get one row per row of CAR rather than one row per car), and they are the same thing.

select distinct car_id, max(version) over (partition by car_id) as max_version
  from car
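To see how a windowed query without the inline view compares to GROUP BY, the following sketch runs both against a tiny table. It assumes SQLite (3.25+) in place of Oracle and uses invented sample rows. It suggests that the window function on its own returns one row per table row, and that DISTINCT is what collapses the result to match GROUP BY.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (car_id INTEGER, version INTEGER)")
conn.executemany(
    "INSERT INTO car VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 4), (2, 7), (2, 9)],
)

# Window function alone: every input row is kept.
plain_window = sorted(conn.execute("""
    SELECT car_id, MAX(version) OVER (PARTITION BY car_id) AS max_version
      FROM car
""").fetchall())

# Same query with DISTINCT: duplicates are collapsed.
distinct_window = sorted(conn.execute("""
    SELECT DISTINCT car_id,
           MAX(version) OVER (PARTITION BY car_id) AS max_version
      FROM car
""").fetchall())

# GROUP BY version from the question, for comparison.
grouped = sorted(conn.execute("""
    SELECT car_id, MAX(version) AS max_version
      FROM car
     GROUP BY car_id
""").fetchall())

print(plain_window)     # [(1, 2), (1, 2), (2, 9), (2, 9), (2, 9)]
print(distinct_window)  # [(1, 2), (2, 9)]
print(grouped)          # [(1, 2), (2, 9)]
```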

Performance really depends on the optimizer in this situation, but yes, as the original answer suggests, inline views help because they narrow the results. This is not a good example, though, since both queries read the same table with no filters.

Partitioning is also helpful when you are selecting a lot of columns but need different aggregations that fit the result set. Otherwise you are forced to group by every other column.
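As an illustration of that point, here is a rough sketch in which each row keeps all of its columns while two different aggregates are attached, each with its own partition. The columns model and price are hypothetical (the CAR table in the question only has CAR_ID and VERSION), and SQLite (3.25+) is assumed in place of Oracle.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle.
# model and price are hypothetical columns added only for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car (car_id INTEGER, version INTEGER, model TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO car VALUES (?, ?, ?, ?)",
    [
        (1, 1, "sedan", 100),
        (1, 2, "sedan", 120),
        (2, 1, "coupe", 200),
    ],
)

# Each aggregate uses its own partition; no GROUP BY over the other columns.
rows = conn.execute("""
    SELECT car_id, version, model, price,
           MAX(version) OVER (PARTITION BY car_id) AS max_version,
           AVG(price)   OVER (PARTITION BY model)  AS avg_model_price
      FROM car
     ORDER BY car_id, version
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 'sedan', 100, 2, 110.0)
# (1, 2, 'sedan', 120, 2, 110.0)
# (2, 1, 'coupe', 200, 1, 200.0)
```

Getting the same shape with GROUP BY would need one grouped subquery per aggregate, each joined back to the detail rows.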

It will depend on your indexing scheme and the amount of data in the table. The optimizer will likely make different decisions based on the data that's actually inside the table.

I have found, at least in SQL Server (I know you asked about Oracle), that the optimizer is more likely to perform a full scan with the PARTITION BY query than with the GROUP BY query. But that's only in cases where you have an index that contains CAR_ID and VERSION (DESC).

The moral of the story is that I would test thoroughly to choose the right one. For small tables, it doesn't matter. For really, really big data sets, neither may be fast...
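One way to run such a test is a small timing harness like the sketch below. It is only a starting point under stated assumptions: SQLite (3.25+) stands in for the real database, the data is random, the index name car_id_version is made up, and DISTINCT is added to the windowed query so both results are comparable. Timings from SQLite say nothing about Oracle or SQL Server, so the same idea would have to be repeated against the actual engine and data.

```python
import random
import sqlite3
import time

# Assumption: SQLite stands in for the real database; data is random.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (car_id INTEGER, version INTEGER)")
random.seed(0)
conn.executemany(
    "INSERT INTO car VALUES (?, ?)",
    [(random.randrange(1000), random.randrange(100)) for _ in range(50_000)],
)
# Hypothetical index name; mirrors the (CAR_ID, VERSION DESC) index in the text.
conn.execute("CREATE INDEX car_id_version ON car (car_id, version DESC)")

QUERIES = {
    "group_by": """
        SELECT car_id, MAX(version) AS max_version
          FROM car
         GROUP BY car_id
    """,
    # DISTINCT is added here so ties do not produce extra rows.
    "partition_by": """
        SELECT DISTINCT car_id, max_version
          FROM (SELECT car_id, version,
                       MAX(version) OVER (PARTITION BY car_id) AS max_version
                  FROM car) max_ver
         WHERE max_ver.version = max_ver.max_version
    """,
}


def run_timed(sql):
    """Run a query and return (sorted rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    elapsed = time.perf_counter() - start
    return sorted(rows), elapsed


results = {name: run_timed(sql) for name, sql in QUERIES.items()}
for name, (rows, seconds) in results.items():
    print(f"{name}: {len(rows)} rows in {seconds:.4f}s")
```

Checking that both queries return the same rows before comparing timings guards against benchmarking two queries that answer different questions.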

Licensed under: CC-BY-SA with attribution
