Question

Today I had to change the WHERE clause of a query I wrote to use an IN (list of values) filter instead of something like:

item_desc = 'item 1'
OR item_desc = 'item 2'
OR item_desc = 'item 3'
OR item_desc = 'item 4'

The above ran for 15 minutes and returned nothing, yet the following gave me my result set in 1.5 minutes:

item_desc IN (
'item 1'
,'item 2'
,'item 3'
,'item 4'
)

I did this in SQL and am wondering why the IN (list of items) version performed so much faster than the OR version.

-- EDIT -- This is SQL Server 2008. I apologize for not including this information in the first place.

Here is the Query in its entirety using the OR statements:

DECLARE @SD DATETIME
DECLARE @ED DATETIME
SET @SD = '2013-06-01';
SET @ED = '2013-06-15';

-- COLUMN SELECTION
SELECT PV.PtNo_Num AS 'VISIT ID'
, PV.Med_Rec_No AS 'MRN'
, PV.vst_start_dtime AS 'ADMIT'
, PV.vst_end_dtime AS 'DISC'
, PV.Days_Stay AS 'LOS'
, PV.pt_type AS 'PT TYPE'
, PV.hosp_svc AS 'HOSP SVC'
, SO.ord_no AS 'ORDER NUMBER'
--, SO.ent_dtime AS 'ORDER ENTRY TIME'
--, DATEDIFF(HOUR,PV.vst_start_dtime,SO.ent_dtime) AS 'ADM TO ENTRY HOURS'
, SO.svc_desc AS 'ORDER DESCRIPTION'
, OSM.ord_sts AS 'ORDER STATUS'
, SOS.prcs_dtime AS 'ORDER STATUS TIME'
, DATEDIFF(DAY,PV.vst_start_dtime,SOS.prcs_dtime) AS 'ADM TO ORD STS IN DAYS'

-- DB(S) USED
FROM smsdss.BMH_PLM_PtAcct_V PV
JOIN smsmir.sr_ord SO
ON PV.PtNo_Num = SO.episode_no
JOIN smsmir.sr_ord_sts_hist SOS
ON SO.ord_no = SOS.ord_no
JOIN smsmir.ord_sts_modf_mstr OSM
ON SOS.hist_sts = OSM.ord_sts_modf_cd

-- FILTER(S)
WHERE PV.Adm_Date BETWEEN @SD AND @ED
AND SO.svc_cd = 'PCO_REMFOLEY'
OR SO.svc_cd = 'PCO_INSRTFOLEY'
OR SO.svc_cd = 'PCO_INSTFOLEY'
OR SO.svc_cd = 'PCO_URIMETER'

AND SO.ord_no NOT IN (
    SELECT SO.ord_no
    FROM smsdss.BMH_PLM_PtAcct_V PV
    JOIN smsmir.sr_ord SO
    ON PV.PtNo_Num = SO.episode_no
    JOIN smsmir.sr_ord_sts_hist SOS
    ON SO.ord_no = SOS.ord_no
    JOIN smsmir.ord_sts_modf_mstr OSM
    ON SOS.hist_sts = OSM.ord_sts_modf_cd
    WHERE OSM.ord_sts = 'DISCONTINUE'
    AND SO.svc_cd = 'PCO_REMFOLEY'
    OR SO.svc_cd = 'PCO_INSRTFOLEY'
    OR SO.svc_cd = 'PCO_INSTFOLEY'
    OR SO.svc_cd = 'PCO_URIMETER'
)
ORDER BY PV.PtNo_Num, SO.ord_no, SOS.prcs_dtime

Thank you,


Solution

Oleski's answer is incorrect. In SQL Server 2008, an IN list gets rewritten as a series of OR predicates. It may be different in, say, MySQL.

I'm fairly certain that if you generated actual execution plans for both your queries they would be identical.

In all likelihood the second query ran faster because you ran it second, and the first query had already pulled all the data pages from the database and paid the IO cost. The second query was able to read all the data from memory and execute a lot faster.

Update

The actual source of the variance is likely that the queries are not equivalent. You have two different OR lists below:

WHERE PV.Adm_Date BETWEEN @SD AND @ED
AND SO.svc_cd = 'PCO_REMFOLEY'
OR SO.svc_cd = 'PCO_INSRTFOLEY'
OR SO.svc_cd = 'PCO_INSTFOLEY'
OR SO.svc_cd = 'PCO_URIMETER'

and later

WHERE OSM.ord_sts = 'DISCONTINUE'
AND SO.svc_cd = 'PCO_REMFOLEY'
OR SO.svc_cd = 'PCO_INSRTFOLEY'
OR SO.svc_cd = 'PCO_INSTFOLEY'
OR SO.svc_cd = 'PCO_URIMETER'

In both of those WHERE clauses, operator precedence (AND is evaluated before OR) means that the actual logic run by the engine is:

WHERE (ConditionA AND ConditionB)
OR ConditionC
OR ConditionD
OR ConditionE

If you replace the OR lists with an IN expression, the logic will be:

WHERE ConditionA
AND (ConditionB OR ConditionC OR ConditionD OR ConditionE)

Which is radically different.
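
To make the difference concrete, here is a minimal sketch that runs both forms against a tiny made-up table. Everything in it is an assumption for illustration: the `orders` table, its five rows, and the use of SQLite (through Python's sqlite3 module) as a stand-in for SQL Server. The point it relies on, that AND binds more tightly than OR, holds in both engines.

```python
import sqlite3

# Hypothetical stand-in data; this is not the OP's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (ord_no INTEGER, adm_date TEXT, svc_cd TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2013-06-05", "PCO_REMFOLEY"),  # in date range, wanted code
        (2, "2013-06-05", "PCO_URIMETER"),  # in date range, wanted code
        (3, "2012-01-01", "PCO_URIMETER"),  # OUT of date range, wanted code
        (4, "2013-06-05", "OTHER"),         # in date range, unwanted code
        (5, "2012-01-01", "PCO_REMFOLEY"),  # OUT of date range, wanted code
    ],
)

date_filter = "adm_date BETWEEN '2013-06-01' AND '2013-06-15'"


def ord_nos(where):
    """Return the ord_no values matching the given WHERE clause, sorted."""
    rows = conn.execute(f"SELECT ord_no FROM orders WHERE {where} ORDER BY ord_no")
    return [r[0] for r in rows]


# Parsed as (date AND code1) OR code2, so the date filter does not apply to code2.
unparenthesized = ord_nos(
    f"{date_filter} AND svc_cd = 'PCO_REMFOLEY' OR svc_cd = 'PCO_URIMETER'"
)

# Explicit parentheses: date AND (code1 OR code2).
parenthesized = ord_nos(
    f"{date_filter} AND (svc_cd = 'PCO_REMFOLEY' OR svc_cd = 'PCO_URIMETER')"
)

# IN list: date AND code IN (...), same logic as the parenthesized form.
in_list = ord_nos(
    f"{date_filter} AND svc_cd IN ('PCO_REMFOLEY', 'PCO_URIMETER')"
)

print("unparenthesized OR:", unparenthesized)
print("parenthesized OR:  ", parenthesized)
print("IN list:           ", in_list)
```

In this sketch the unparenthesized form also returns order 3, which falls outside the date range, while the parenthesized OR and the IN list agree with each other. On real tables, rows leaking past the date filter like this would plausibly mean far more data to join and sort, which is consistent with the longer runtime you saw. Wrapping the OR list in parentheses should make the original query equivalent to the IN version.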

OTHER TIPS

The best way to tell is to look at the actual query plan using something like EXPLAIN. This should tell you exactly what the DBMS is doing, and then you can get a much better idea why it's more efficient.

With that said, DBMS systems are really good at doing operations between two tables (like joins). A lot of the optimizer's time is spent on these parts of the queries because they are generally more expensive.

For example, the DBMS could sort that IN list and, using an index on item_desc, filter the results very quickly. You can't do that optimization when you list a bunch of selections like in the first example.

When you use IN, you are making an impromptu table and filtering using these more efficient table combining techniques.

EDIT: I posted this answer before OP mentioned the specific DBMS. This turns out NOT to be how SQL Server treats this query, but it might be valid for other DBMSs. See JNK's answer for a more specific, accurate answer.

Licensed under: CC-BY-SA with attribution