Question

We have a query to remove some rows from the table based on an id field (primary key). It is a pretty straightforward query:

delete all from OUR_TABLE where ID in (123, 345, ...)

The problem is no.of ids can be huge (Eg. 70k), so the query takes a long time. Is there any way to optimize this? (We are using sybase - if that matters).

Was it helpful?

Solution

Consider running this in batches. A loop running 1000 records at a time may be much faster than one query that does everything and in addition will not keep the table locked out to other users for as long at a stretch.

If you have cascade delete (and lots of foreign key tables affected) or triggers involved, you may need to run in even smaller batches. You'll have to experiement to see which is the best number for your situation. I've had tables where I had to delete in batches of 100 and others where 50000 worked (fortunate in that case as I was deleting a million records).

But in any even I would put my key values that I intend to delete into a temp table and delete from there.

OTHER TIPS

There are two ways to make statements like this one perform:

  1. Create a new table and copy all but the rows to delete. Swap the tables afterwards (alter table name ...) I suggest to give it a try even when it sounds stupid. Some databases are much faster at copying than at deleting.

  2. Partition your tables. Create N tables and use a view to join them into one. Sort the rows into different tables grouped by the delete criterion. The idea is to drop a whole table instead of deleting individual rows.

I'm wondering if parsing an IN clause with 70K items in it is a problem. Have you tried a temp table with a join instead?

Can Sybase handle 70K arguments in IN clause? All databases I worked with have some limit on number of arguments for IN clause. For example, Oracle have limit around 1000.

Can you create subselect instead of IN clause? That will shorten sql. Maybe that could help for such a big number of values in IN clause. Something like this:

  DELETE FROM OUR_TABLE WHERE ID IN 
        (SELECT ID FROM somewhere WHERE some_condition)

Deleting large number of records can be sped up with some interventions in database, if database model permits. Here are some strategies:

  1. you can speed things up by dropping indexes, deleting records and recreating indexes again. This will eliminate rebalancing index trees while deleting records.

    • drop all indexes on table
    • delete records
    • recreate indexes
    • if you have lots of relations to this table, try disabling constraints if you are absolutely sure that delete command will not break any integrity constraint. Delete will go much faster because database won't be checking integrity. Enable constraints after delete.
    • disable integrity constraints, disable check constraints
    • delete records
    • enable constraints
    • disable triggers on table, if you have any and if your business rules allow that. Delete records, then enable triggers.

    • last, do as other suggested - make a copy of the table that contains rows that are not to be deleted, then drop original, rename copy and recreate integrity constraints, if there are any.

I would try combination of 1, 2 and 3. If that does not work, then 4. If everything is slow, I would look for bigger box - more memory, faster disks.

Find out what is using up the performance!

In many cases you might use one of the solutions provided. But there might be others (based on Oracle knowledge, so things will be different on other databases. Edit: just saw that you mentioned sybase):

  • Do you have foreign keys on that table? Makes sure the referring ids are indexed
  • Do you have indexes on that table? It might be that droping before delete and recreating after the delete might be faster.
  • check the execution plan. Is it using an index where a full table scan might be faster? Or the other way round? HINTS might help
  • instead of a select into new_table as suggested above a create table as select might be even faster.

But remember: Find out what is using up the performance first.

When you are using DDL statements make sure you understand and accept the consequences it might have on transactions and backups.

Try sorting the ID you are passing into "in" in the same order as the table, or index is stored in. You may then get more hits on the disk cache.

Putting the ID to be deleted into a temp table that has the Ids sorted in the same order as the main table, may let the database do a simple scanned over the main table.

You could try using more then one connection and spiting the work over the connections so as to use all the CPUs on the database server, however think about what locks will be taken out etc first.

I also think that the temp table is likely the best solution.

If you were to do a "delete from .. where ID in (select id from ...)" it can still be slow with large queries, though. I thus suggest that you delete using a join - many people don't know about that functionality.

So, given this example table:

    -- set up tables for this example
    if exists (select id from sysobjects where name = 'OurTable' and type = 'U')
        drop table OurTable
    go

    create table OurTable (ID integer primary key not null)
    go
    insert into OurTable (ID) values (1)
    insert into OurTable (ID) values (2)
    insert into OurTable (ID) values (3)
    insert into OurTable (ID) values (4)
    go

We can then write our delete code as follows:

    create table #IDsToDelete (ID integer not null)
    go
    insert into #IDsToDelete (ID) values (2)
    insert into #IDsToDelete (ID) values (3)
    go
    -- ... etc ...
    -- Now do the delete - notice that we aren't using 'from'
    -- in the usual place for this delete
    delete OurTable from #IDsToDelete
       where OurTable.ID = #IDsToDelete.ID
    go
    drop table #IDsToDelete
    go
    -- This returns only items 1 and 4
    select * from OurTable order by ID
    go

Does our_table have a reference on delete cascade?

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top