Question

Suppose I have a database table full of cars. There is a new row for each new version of a car, but they would share the same value in carid, like so:

carid=111, engine='4 cylinder', version=1
carid=111, engine='8 cylinder', version=2
carid=222, engine='8 cylinder', version=1

I want to apply a WHERE clause that would search through all versions of each car, but I only want to display the row for the newest version. For example, if I wanted to know which cars have ever had a 4 cylinder engine, I would want the query to return:

carid=111, engine='8 cylinder', version=2

My SQL might look like this:

SELECT a.carid, a.engine, a.version 
FROM cars a 
WHERE a.engine='6 cylinder' 
AND a.carid NOT IN 
    (SELECT b.id, b.engine, b.version 
    FROM sometable b 
    WHERE b.engine = '6 cylinder' 
    AND a.version2<b.version2)

My problem is that this requires the SQL columns and where criteria to be stated twice.

Suppose I did not have the DB privileges to create a view, and my query had lots of formatting of values in each column and a complicated where clause that applied to every column. The approach of the above SQL would be hideous. Is there an approach that doesn't require columns or where clauses to be defined twice?

To clarify, suppose instead of writing a.engine, I had to write CASE WHEN a.engine IS NOT NULL THEN someothertable.value ELSE FUNCTION(5,"cactus") END AS engine and I had 10 other columns each with long definitions. How would I avoid writing this mess twice?

I am using an oracle DB.

Thanks,
Tristan

Was it helpful?

Solution

You can do it like this:

SELECT a.carid, a.engine, a.version 
FROM cars a 
WHERE a.engine='6 cylinder' and
      not exists (select 1
                  from cars c2
                  where c2.carid = a.carid and
                        c2.engine = a.engine and
                        c2.version > a.version
                );

This assume you are trying to get the highest version for a given engine for a given carid.

Actually, you can do the same thing with the not in version. The not exists often performs better. If you care about performance, be sure you have an index on cars(engine, carid, version).

EDIT:

If you want the cars that have ever had a particular engine (and the most recent version, then do:

SELECT a.carid, a.engine, a.version 
FROM cars a 
WHERE not exists (select 1
                  from cars c2
                  where c2.carid = a.carid and
                        c2.version > a.version
                ) and
       exists (select 1 from cars c2 where c2.carid = a.carid and c2.engine = '6 cylinder');

Another way of doing this might make sense in this case:

select a.*
from cars a join
     (select carid, max(version) as maxversion
      from cars
      group by carid
      having sum(c2.engine = '6 cylinder') > 0
     ) cc
     on a.carid = cc.carid and a.version = cc.maxversion;

EDIT II:

If you wanted to do this with analytic functions:

select c.*
from (select c.*,
             max(version) over (partition by carid) as maxversion,
             sum(case when c.engine = '6 cylinder' then 1 else 0 end) over (partition by carid) as Num6Cylinder
      from cars c
     ) c
where version = maxversion and Num6Cylinder > 0;

I'm not sure this is a great simplification.

OTHER TIPS

SELECT carid, engine, version FROM
(
  SELECT  row_number() over(partition by carid order by version desc) AS r, 
  carid, engine, version
  FROM cars
  WHERE carid IN
  (
     SELECT DISTINCT carid 
     FROM cars
     WHERE engine = '4 cylinder'
  )
) tab
WHERE r = 1

check the working of the query at SQLFiddle: http://www.sqlfiddle.com/#!4/18e7b/42

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top