Question

On a table with over 10 million rows, which one of these UPDATE statements likely to faster, or are they much of a muchness?

  1. UPDATE table SET col1 = col2 WHERE col1 IS NULL;
  2. UPDATE table SET col1 = col2 WHERE col1 IS NULL AND col2 IS NOT NULL;

Neither col1 nor col2 are keys or indexed. They are actually datetime fields, if that changes anything.


Note: I'm fairly sure that the queries achieve the same result:

Query 1. will update col1 every time, including setting it to NULL if col2 is NULL.

Query 2. will update col1 only if col2 has a value, thereby leaving col1 as NULL if col2 is NULL.

They both change the data in the same way. One sets col1 to NULL when col2 is NULL, the other leaves col1 as NULL when col2 is NULL, but in all cases col1 was NULL to begin with.

Was it helpful?

Solution 4

In response to the various comments & answers about measuring rather than guessing, and also to the comments regarding the count of rows where col2 is NULL vs col2 having a value of some kind, I ran 4 tests on a test table:

                            | Query 1          | Query 2
 col2 almost all NULL       |  5 min 27.96 sec |  4 min 20.38 sec
 col2 almost all has values | 34 min 31.64 sec | 32 min 31.54 sec

Conclusion: Query 2 is marginally faster in both cases.

OTHER TIPS

Performance is better measured than guessed. Yet here is my guess: It heavily depends on your usage of indeces, and their cardinality.

That means: if col2 has an index with thousands of different values and null beeing one of them, it´ll be faster than query 1. If there is no query and col2 almost always is null, it will be slower.

Best performance will be achieved if you have a two-column-index spanning both cols, and use both cols in the where-part.

It actually depends on different factors.

UPDATE table SET col1 = col2;

Will do a full table scan and modification in any case.

These queries

UPDATE table SET col1 = col2 WHERE col1 IS NULL;
UPDATE table SET col1 = col2 WHERE col1 IS NULL AND col2 IS NOT NULL;

will result in faster operation, if col1 and col2 are indexed.

You can get information about what is done in which case if you execute

EXPLAIN UPDATE table SET col1 = col2;
EXPLAIN UPDATE table SET col1 = col2 WHERE col1 IS NULL;
EXPLAIN UPDATE table SET col1 = col2 WHERE col1 IS NULL AND col2 IS NOT NULL;

In my experience the second format is usually preferable, but of course it will depend on your data. It is important to note that indexes on the columns in question may not include null values, depending on the database provider and the way that the indexes are defined. Even if indexes are defined, they may not improve performance, and depending on your statistics they may not be used at all.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top