Question

From a performance perspective, is this the best way to write the following query, in particular the nested (correlated) subqueries?


SELECT a.meg,a.currency
FROM alt6sal a 
WHERE  a.meg_code IN (1,2)
AND a.sal_year = (SELECT MAX(ia.sal_year) FROM alt6sal ia WHERE a.emp_num = ia.emp_num )
AND a.sal_mon = (SELECT  MAX(ia.sal_mon) FROM alt6sal ia  WHERE a.emp_num = ia.emp_num AND a.sal_year = ia.sal_year)

Solution 4

If you can avoid a correlated subquery, performance is usually better. An example using non-correlated subqueries:

SELECT a.meg,a.currency
FROM alt6sal a 

join 
(
    select ia.emp_num, max(ia.sal_year) as sal_year_max
    from alt6sal ia
    group by ia.emp_num
) the_year_max
on a.emp_num =  the_year_max.emp_num and a.sal_year = the_year_max.sal_year_max

join 
(
    select ia.emp_num, ia.sal_year, max(ia.sal_mon) as sal_mon_max
    from alt6sal ia
    group by ia.emp_num, ia.sal_year
) the_month_max
on a.emp_num = the_month_max.emp_num and a.sal_year = the_month_max.sal_year
and a.sal_mon = the_month_max.sal_mon_max

WHERE  a.meg_code IN (1,2)

The analogous non-correlated joins for OR instead of AND: use LEFT JOIN, then keep only the rows where at least one of the joins found a match (non-null):

SELECT a.meg,a.currency
FROM alt6sal a 

left join 
(
    select ia.emp_num, max(ia.sal_year) as sal_year_max
    from alt6sal ia
    group by ia.emp_num
) the_year_max
on a.emp_num =  the_year_max.emp_num and a.sal_year = the_year_max.sal_year_max

left join 
(
    select ia.emp_num, ia.sal_year, max(ia.sal_mon) as sal_mon_max
    from alt6sal ia
    group by ia.emp_num, ia.sal_year
) the_month_max
on a.emp_num = the_month_max.emp_num and a.sal_year = the_month_max.sal_year
and a.sal_mon = the_month_max.sal_mon_max

WHERE  a.meg_code IN (1,2)
       and 
       (the_year_max.emp_num is not null 
        or the_month_max.emp_num is not null)

OTHER TIPS

You can try this -

SELECT meg, currency
FROM
(
SELECT a.meg,a.currency, 
dense_rank() over (PARTITION BY a.emp_num ORDER BY a.sal_year desc) year_rank,
-- rank months within each year, so the latest month of the latest year gets rank 1
dense_rank() over (PARTITION BY a.emp_num, a.sal_year ORDER BY a.sal_mon desc) mon_rank
FROM alt6sal a 
WHERE  a.meg_code IN (1,2)
)
WHERE year_rank = 1
AND mon_rank = 1;
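
A quick way to sanity-check any rewrite of the original query is to run it against a few rows where the latest month of the latest year is not the highest month number overall. The sketch below is only an illustration: the column types and the sample values are assumptions, not taken from the question, and it should be run in a scratch database since it creates its own alt6sal.

```sql
-- Hypothetical sample data; column types are assumed, not from the original post.
CREATE TABLE alt6sal (
    emp_num  INTEGER,
    sal_year INTEGER,
    sal_mon  INTEGER,
    meg_code INTEGER,
    meg      INTEGER,
    currency CHAR(3)
);

-- Employee 1: month 12 exists only in the older year.
INSERT INTO alt6sal VALUES (1, 2019, 12, 1, 100, 'USD');
INSERT INTO alt6sal VALUES (1, 2020,  3, 1, 110, 'USD');
-- Employee 2: months 9 and 10 in the same year.
INSERT INTO alt6sal VALUES (2, 2020,  9, 2, 200, 'EUR');
INSERT INTO alt6sal VALUES (2, 2020, 10, 2, 210, 'EUR');

-- Reference result: the correlated query from the question.
SELECT a.meg, a.currency
FROM alt6sal a
WHERE a.meg_code IN (1,2)
AND a.sal_year = (SELECT MAX(ia.sal_year) FROM alt6sal ia WHERE a.emp_num = ia.emp_num)
AND a.sal_mon = (SELECT MAX(ia.sal_mon) FROM alt6sal ia
                 WHERE a.emp_num = ia.emp_num AND a.sal_year = ia.sal_year);
-- Returns two rows (in no guaranteed order): (110, 'USD') and (210, 'EUR').
```

A rewrite that returns anything other than those two rows on this data is not equivalent to the original.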

The performance of any suggestion here will depend a lot on:
- the version of your Informix engine (this syntax probably will not work with versions < 11.50)
- filtered indexes
- the amount of data
- whether the table/index statistics are up to date

The following forces the database to create a temporary table first, holding the sal_year values, and then join it with the main table:

Suggestion 1)

SELECT a.meg,a.currency
FROM alt6sal a
    ,(SELECT emp_num, MAX(ia.sal_year) sal_year FROM alt6sal ia group by 1 ) as a2
WHERE  a.meg_code IN (1,2)
AND a.sal_year = a2.sal_year and a.emp_num = a2.emp_num
AND a.sal_mon = (SELECT  MAX(ia.sal_mon) FROM alt6sal ia  WHERE a.emp_num = ia.emp_num AND a.sal_year = ia.sal_year)

Suggestion 2)

SELECT a.meg,a.currency
FROM alt6sal a
    ,(SELECT aa.emp_num, MAX(aa.sal_year) sal_year FROM alt6sal aa where aa.meg_code in (1,2) group by 1 ) as a2
    ,(SELECT ab.emp_num, ab.sal_year, max(ab.sal_mon) sal_mon  FROM alt6sal ab  where ab.meg_code in (1,2) group by 1,2 ) as a3
WHERE  a.meg_code IN (1,2)
AND a.sal_year = a2.sal_year and a.emp_num = a2.emp_num
and a.sal_mon  = a3.sal_mon AND a.sal_year = a3.sal_year and a.emp_num = a3.emp_num
;
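
Since the result depends on statistics and on the plan the optimizer picks, it may help to refresh the statistics and look at the plan before comparing the suggestions. This is a minimal sketch using the Informix UPDATE STATISTICS and SET EXPLAIN statements; the MEDIUM level is just one reasonable choice, and the query in the middle stands in for whichever candidate you want to inspect.

```sql
-- Refresh optimizer statistics for the table (levels are LOW, MEDIUM, HIGH).
UPDATE STATISTICS MEDIUM FOR TABLE alt6sal;

-- From here on, the optimizer writes its plan to the explain output file
-- (sqexplain.out by default).
SET EXPLAIN ON;

-- Candidate query to inspect; replace with the variant you are testing.
SELECT a.meg, a.currency
FROM alt6sal a
    ,(SELECT emp_num, MAX(ia.sal_year) sal_year FROM alt6sal ia GROUP BY 1) AS a2
WHERE a.meg_code IN (1,2)
AND a.sal_year = a2.sal_year AND a.emp_num = a2.emp_num
AND a.sal_mon = (SELECT MAX(ia.sal_mon) FROM alt6sal ia
                 WHERE a.emp_num = ia.emp_num AND a.sal_year = ia.sal_year);

SET EXPLAIN OFF;
```

Comparing the estimated cost and the access paths (index vs. sequential scan) for each variant is usually more telling than guessing from the query text.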

In any case, I would not prefer a correlated subquery. From the comments I see this is for Informix and not for SQL Server, so I would suggest a JOIN with a nested SELECT as the first preference. The advantage is that this is a very native way of writing the query, and you can expect the DB optimizer to come up with a good execution plan using indexes (when available). In SQL Server I would go for a CTE if my tables do not have millions of rows.

I assume you have appropriate indexes on the table. If not, make sure you have the following indexes on the table.

Notice the order of the columns in each index and their ASC/DESC order.

    CREATE CLUSTER INDEX IDXc_alt6sal 
    ON alt6sal (    meg_code ASC,
            sal_year DESC,
            sal_mon DESC,
            emp_num ASC
        )

    CREATE INDEX IDXnc_alt6sal 
    ON alt6sal (    meg_code ASC,
            sal_year DESC,
            sal_mon DESC,
            emp_num ASC
        ) INCLUDE (meg,currency)

Now test the query below. Notice that I have added the "meg_code IN ( 1, 2 )" condition to every SELECT that reads the actual table. That lets the query reduce the number of rows it has to carry, even in the nested SELECT statements. Also notice that the columns used in the WHERE and JOIN conditions match the column order in the indexes.

One thing I left for you to try is

"meg_code=1 OR meg_code=2"

instead of

"meg_code IN ( 1, 2 )"

and see whether performance noticeably improves. I know that in SQL Server it will not make any difference, but for Informix I am not 100% sure.

    SELECT t1.meg,t1.currency,t1.emp_num
    FROM alt6sal  t1
    JOIN
    (
        Select yer.emp_num,yer.sal_year,MAX(mth.sal_mon) AS sal_mon
        FROM        
        ( SELECT   emp_num, MAX(sal_year) AS sal_year
           FROM     alt6sal 
           WHERE    meg_code IN ( 1, 2 )
           GROUP BY emp_num
        )yer
        JOIN alt6sal  mth
        ON yer.sal_year = mth.sal_year AND yer.emp_num=mth.emp_num
        AND mth.meg_code IN (1,2)
        GROUP BY yer.sal_year,yer.emp_num
    )t2
    ON t1.sal_year=t2.sal_year AND t1.sal_mon=t2.sal_mon AND t1.emp_num=t2.emp_num 
    AND t1.meg_code IN (1,2)
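
For comparison, the CTE form mentioned above as the SQL Server preference could look roughly like this. It is only a sketch of the same latest-year / latest-month logic as the nested-select query; the CTE names latest_year and latest_month are made up for illustration, and whether your Informix version accepts WITH at all needs to be checked.

```sql
-- Sketch only: CTE version of the nested-select query.
-- latest_year and latest_month are hypothetical names.
WITH latest_year AS (
    -- latest year per employee, restricted to the meg codes of interest
    SELECT emp_num, MAX(sal_year) AS sal_year
    FROM alt6sal
    WHERE meg_code IN (1, 2)
    GROUP BY emp_num
),
latest_month AS (
    -- latest month within that latest year
    SELECT s.emp_num, s.sal_year, MAX(s.sal_mon) AS sal_mon
    FROM alt6sal s
    JOIN latest_year y
      ON y.emp_num = s.emp_num AND y.sal_year = s.sal_year
    WHERE s.meg_code IN (1, 2)
    GROUP BY s.emp_num, s.sal_year
)
SELECT t1.meg, t1.currency, t1.emp_num
FROM alt6sal t1
JOIN latest_month m
  ON m.emp_num = t1.emp_num
 AND m.sal_year = t1.sal_year
 AND m.sal_mon = t1.sal_mon
WHERE t1.meg_code IN (1, 2);
```

The optimizer will often produce the same plan for this and for the nested-select form, so the choice is mostly about readability.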

I would rather create a staging table to find the MAX values. This might also reduce locks, since you do 2 separate reads on the table rather than 3 concurrent reads.

    /* create @table variables to keep unique records for empnum, salary year, salary month */
    DECLARE @maxyearstage TABLE(empnum BIGINT, combo DATETIME);
    DECLARE @maxyear TABLE(empnum BIGINT, [year] INT, [month] TINYINT);

    /* stage one row per employee and month, encoded as the first day of that month */
    INSERT INTO @maxyearstage 
    SELECT DISTINCT my.emp_num
    , CAST(CONVERT(VARCHAR(4), my.sal_year)+'-'+CONVERT(VARCHAR(2), my.sal_mon)+'-'+'01' AS DATETIME) [combo]
    FROM alt6sal my;

    /* keep only the latest month per employee, read from the staging table */
    INSERT INTO @maxyear
    SELECT t3.empnum, YEAR(t3.combo), MONTH(t3.combo)
    FROM ( SELECT T2.empnum, MAX(T2.combo) combo FROM @maxyearstage T2 GROUP BY T2.empnum) t3;

    SELECT a.meg,a.currency
    FROM alt6sal a 
    INNER JOIN @maxyear t1 ON t1.empnum = a.emp_num AND t1.[year] = a.sal_year AND t1.[month] = a.sal_mon
    WHERE a.meg_code IN (1,2)
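
The script above uses T-SQL table variables, which are SQL Server syntax. On Informix the same staging idea would more likely be written with temp tables. The following is a rough sketch under that assumption: tmp_year and tmp_latest are hypothetical names, and it uses three steps rather than the date-encoding trick above, so check it against your version before relying on it.

```sql
-- Sketch only, Informix-style: temp tables instead of T-SQL table variables.
-- tmp_year and tmp_latest are hypothetical names.

-- Step 1: latest year per employee.
SELECT emp_num, MAX(sal_year) AS sal_year
FROM alt6sal
GROUP BY emp_num
INTO TEMP tmp_year WITH NO LOG;

-- Step 2: latest month within that year.
SELECT s.emp_num, s.sal_year, MAX(s.sal_mon) AS sal_mon
FROM alt6sal s, tmp_year y
WHERE s.emp_num = y.emp_num
AND s.sal_year = y.sal_year
GROUP BY s.emp_num, s.sal_year
INTO TEMP tmp_latest WITH NO LOG;

-- Step 3: join the staged keys back to the main table.
SELECT a.meg, a.currency
FROM alt6sal a, tmp_latest t
WHERE a.meg_code IN (1,2)
AND a.emp_num = t.emp_num
AND a.sal_year = t.sal_year
AND a.sal_mon = t.sal_mon;

-- Clean up (temp tables also disappear when the session ends).
DROP TABLE tmp_year;
DROP TABLE tmp_latest;
```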

It looks like the query is going to access the bulk of the data in the table (if not do a full table scan outright). If so, I would recommend avoiding correlated subqueries completely, as at best they will only perform as well as your indexes. Try rewriting it as a simple join like the one below, where the derived table finds the max year/month for each employee and that result is then used as a join filter against alt6sal.

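-- sal_year * 100 + sal_mon compares correctly as a number (assumes both columns are numeric);
-- a concatenated string such as '2020-9' would sort after '2020-10'
SELECT a.meg,a.currency
FROM alt6sal a, 
     (SELECT MAX(ia.sal_year * 100 + ia.sal_mon) max_sal_year_mon, ia.emp_num ia_emp_num
      FROM alt6sal ia
      GROUP BY ia.emp_num) ia
WHERE  a.meg_code IN (1,2)
AND (a.sal_year * 100 + a.sal_mon) = ia.max_sal_year_mon
AND ia.ia_emp_num = a.emp_num;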
Licensed under: CC-BY-SA with attribution