Question

I have a multi-table SELECT query that compares columns with themselves, like below:

SELECT * FROM table1 t1,table2 t2
      WHERE t1.col1=t2.col1  -- Different tables, so OK.
      AND t1.col1=t1.col1    -- Same table??
      AND t2.col1=t2.col1    -- Same table??

These self-comparisons seem redundant to me. My question is: will removing them have any impact on the logic or performance?

Thanks in advance.

Solution

This seems redundant. Its only effect is to remove rows that have NULL values in these columns, because NULL = NULL never evaluates to true. Make sure the columns are declared NOT NULL before removing those clauses.

If the columns are nullable, you can safely replace these conditions with the following, which is easier to read and maintain:

  AND t1.col1 IS NOT NULL
  AND t2.col1 IS NOT NULL
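
As a rough illustration of why the two forms are interchangeable, here is a minimal sketch in generic SQL. The table name demo and its rows are made up for this example, and the multi-row INSERT syntax assumes an engine such as SQLite, PostgreSQL or MySQL.

```sql
-- Hypothetical demo table; the name and the rows are invented for illustration.
CREATE TABLE demo (id INTEGER, col1 INTEGER);
INSERT INTO demo (id, col1) VALUES (1, 10), (2, NULL), (3, 30);

-- NULL = NULL evaluates to unknown rather than true, so row 2 is filtered out.
SELECT id FROM demo WHERE col1 = col1 ORDER BY id;       -- returns ids 1 and 3

-- Same rows, with the intent stated explicitly.
SELECT id FROM demo WHERE col1 IS NOT NULL ORDER BY id;  -- returns ids 1 and 3
```

Both queries drop the row whose col1 is NULL, but only the second one says so.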


Update following Jeffrey's comment

You're absolutely right, I don't know how I didn't see it myself: the join condition t1.col1=t2.col1 can only be true when both columns are non-null, so rows with a NULL join column are already excluded. The clauses tx.col1=tx.col1 are therefore completely redundant and can be safely removed.
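
A small sketch of that reasoning, assuming two throwaway tables with a nullable col1 (the data is invented, and the multi-row INSERT assumes an engine such as SQLite, PostgreSQL or MySQL):

```sql
-- Hypothetical tables mirroring the question; the rows are made up.
CREATE TABLE table1 (col1 INTEGER);
CREATE TABLE table2 (col1 INTEGER);
INSERT INTO table1 (col1) VALUES (1), (2), (NULL);
INSERT INTO table2 (col1) VALUES (2), (3), (NULL);

-- Join condition only: the NULL rows never match anything, not even each other.
SELECT COUNT(*) FROM table1 t1, table2 t2
WHERE t1.col1 = t2.col1;       -- 1 (only the pair 2 = 2)

-- Join condition plus the self-comparisons: same single row.
SELECT COUNT(*) FROM table1 t1, table2 t2
WHERE t1.col1 = t2.col1
  AND t1.col1 = t1.col1
  AND t2.col1 = t2.col1;       -- 1
```

The extra clauses cannot remove any row that the join condition has not already removed.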

OTHER TIPS

Don't remove them until you understand the impact. If, as others are pointing out, they have no effect on the query and are probably optimised out, there's no harm in leaving them there but there may be harm in removing them.

Don't try to fix something that's working until you're damn sure you're not breaking something else.

The reason I mention this is because we inherited a legacy reporting application that had exactly this construct, along the lines of:

where id = id

And, being a sensible fellow, I ditched it, only to discover that the database engine wasn't the only thing using the query.

It first went through a pre-processor which extracted every column that existed in a where clause and ensured they were indexed. Basically an auto-tuning database.

Well, imagine our surprise on the next iteration when the database slowed to a fraction of its former speed when users were doing ad-hoc queries on the id field :-)

Turns out this was a kludge put in by the previous support team to ensure common ad-hoc queries were using indexed columns as well, even though none of our standard queries did so.

So, I'm not saying you can't do it, just suggesting that it might be a good idea to understand why it was put in first.

  1. Yes, the self-comparisons are redundant: each one compares a column with itself, so the query reduces to:

    SELECT * FROM table1 t1,table2 t2
    WHERE t1.col1=t2.col1
    
  2. But you do need to keep the join condition. Otherwise, you'd have a cartesian join on your hands: each row from table1 would be joined to every row in table2. If table1 had 100 rows and table2 had 1,000 rows, the query would return 100,000 rows.

    SELECT * FROM table1 t1,table2 t2 --warning!
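
To make the row-count arithmetic concrete, here is a scaled-down sketch with 3 and 4 rows instead of 100 and 1,000. The table names small_demo and large_demo and their contents are assumptions for the example, and the multi-row INSERT assumes an engine such as SQLite, PostgreSQL or MySQL.

```sql
-- Hypothetical tables; names and rows are invented for illustration.
CREATE TABLE small_demo (col1 INTEGER);
CREATE TABLE large_demo (col1 INTEGER);
INSERT INTO small_demo (col1) VALUES (1), (2), (3);
INSERT INTO large_demo (col1) VALUES (1), (2), (3), (4);

-- No join condition: every row pairs with every row, 3 * 4 = 12.
SELECT COUNT(*) FROM small_demo s, large_demo l;     -- 12

-- With the join condition: only rows with equal col1 values pair up.
SELECT COUNT(*) FROM small_demo s, large_demo l
WHERE s.col1 = l.col1;                               -- 3
```

Writing the join as small_demo s JOIN large_demo l ON s.col1 = l.col1 returns the same 3 rows and makes it harder to drop the condition by accident.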
    
Licensed under: CC-BY-SA with attribution