Question

I have a small application where I need to update a table of 40,000 rows from another table of 40,000 rows every day. This merges data from different (external) data sources for report generation at the company I work for, and it is the only method available to me at the moment :(

Right now I use a query formatted like

UPDATE table1, table2 SET table1.column1=table2.column1 WHERE table1.column2=table2.column2

and it takes a huge amount of time to complete. This compares 40k rows against 40k rows, so that is up to 1,600,000,000 comparisons to get through. Is it possible to write a query that tells SQL to drop rows from the job once they match, so that the 40k rows shrink by one on each match/update?

I can reproduce that by copying the original tables to temporary ones and removing rows with the same key after updating the result table, but perhaps there is a more elegant and/or faster way of doing that :)

Thanks for any insights!

/edit - corrected - it should be 'UPDATE' rather than 'SELECT' :)

Solution

Removing rows from a table would require a DELETE statement, and that would make things a lot slower, not faster.

To improve performance of the UPDATE, consider adding appropriate indexes. Likely the best candidate is a covering index:

... ON table2 (column2, column1)

That would make "matches" (lookups of the value of column2) much faster. With the value of column1 available in the index, that value can be returned directly from the index rather than requiring another lookup of the row in a page in the underlying table.
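As a minimal sketch of what that could look like, assuming MySQL (which the multi-table UPDATE syntax in the question suggests): the index name `table2_ix1` is hypothetical, and the UPDATE is the one from the question rewritten with an explicit JOIN, which should behave the same way.

```sql
-- Hypothetical index name; the column order (column2, column1) is the one suggested above.
-- column2 comes first because it is the lookup key; column1 rides along so the
-- value can be read from the index itself.
CREATE INDEX table2_ix1 ON table2 (column2, column1);

-- The UPDATE from the question, written with an explicit JOIN (MySQL syntax).
UPDATE table1
  JOIN table2
    ON table2.column2 = table1.column2
   SET table1.column1 = table2.column1;
```

The index only needs to be created once; the daily job then just runs the UPDATE.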

You have the right idea about reducing the number of comparison operations that need to be done. That's the raison d'être for indexes: they make lookups much faster by significantly reducing the number of comparisons that need to be performed. The index is organized in a way that eliminates the vast majority of comparisons. We don't need to compare against every value in every row; the index organizes the values so that the database can quickly determine that entire swaths of rows don't need to be checked, because it knows it's impossible for any row in that swath to match the value it's looking for.
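One way to check whether the database is actually using an index for the match, again assuming MySQL, is to run EXPLAIN on a SELECT that performs the same join as the UPDATE. This is only a sketch; the exact output depends on the server version and on which indexes exist.

```sql
-- Same join as the UPDATE, expressed as a SELECT so it can be inspected safely.
EXPLAIN
SELECT table1.column2, table2.column1
  FROM table1
  JOIN table2
    ON table2.column2 = table1.column2;
```

In the output, the `key` column for table2 shows which index (if any) was chosen, and `Using index` in the `Extra` column indicates that the values are being read from the index alone, without a lookup into the underlying table rows.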


I expect you meant that you are running an UPDATE statement, not a SELECT.

Licensed under: CC-BY-SA with attribution