Question

We are migrating our ids from INTEGER to BIGINT in some of our tables. The first step for us is to create a new temporary column id2 that has the correct type, and then we want to migrate the existing id into the new id2 column.
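
Something like this (simplified) was used to add the new column:

ALTER TABLE items ADD COLUMN id2 bigint NULL;
-- Adding a nullable column without a default is a catalog-only change in 9.6,
-- so this step is quick even on a 400M-row table.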

We want to do this in batches and one of the easiest ways I found is to copy 10k ids into a temp table and use that for the update.

However, when I run an EXPLAIN ANALYZE I can see that fetching the data is very fast, but actually updating the rows takes a lot of time.

This is (basically) what we do in a loop (@LastInsertId is updated after each loop):

CREATE TEMP TABLE tmp_id_table (
    id BIGINT NOT NULL,
    CONSTRAINT tempTable_pkey PRIMARY KEY (id)
);

INSERT INTO tmp_id_table (id) SELECT id FROM items WHERE id > @LastInsertId ORDER BY id ASC LIMIT 10000;

explain (analyze, buffers, timing, costs)
UPDATE items i SET id2 = tit.id 
FROM tmp_id_table tit WHERE i.id = tit.id;

TRUNCATE TABLE tmp_id_table;
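
For illustration, the whole loop could also be written server-side as a PL/pgSQL DO block instead of being driven from application code; this is only a rough sketch of the same logic, not what we actually run:

DO $$
DECLARE
    last_id bigint := 0;   -- plays the role of @LastInsertId
    batch   bigint;
BEGIN
    -- assumes tmp_id_table from above already exists in this session
    LOOP
        INSERT INTO tmp_id_table (id)
        SELECT id FROM items WHERE id > last_id ORDER BY id ASC LIMIT 10000;

        GET DIAGNOSTICS batch = ROW_COUNT;
        EXIT WHEN batch = 0;

        UPDATE items i SET id2 = tit.id
        FROM tmp_id_table tit WHERE i.id = tit.id;

        SELECT max(id) INTO last_id FROM tmp_id_table;
        TRUNCATE TABLE tmp_id_table;
    END LOOP;
END $$;

Note that a DO block runs in a single transaction, so to get a commit between batches the loop still has to live in the client on 9.6 (transaction control inside procedures only arrived in PostgreSQL 11).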

Here is the execution plan for the UPDATE:

Update on items i  (cost=0.57..87476.70 rows=10170 width=529) (actual time=17719.509..17719.509 rows=0 loops=1)
  Buffers: shared hit=461836 read=11997, local hit=45
  ->  Nested Loop  (cost=0.57..87476.70 rows=10170 width=529) (actual time=0.030..97.541 rows=10000 loops=1)
        Buffers: shared hit=60110 read=1, local hit=45
        ->  Seq Scan on tmp_id_table tit  (cost=0.00..146.70 rows=10170 width=14) (actual time=0.010..8.241 rows=10000 loops=1)
              Buffers: local hit=45
        ->  Index Scan using items_pkey on items i  (cost=0.57..8.58 rows=1 width=515) (actual time=0.006..0.008 rows=1 loops=10000)
              Index Cond: (id = tit.id)
              Buffers: shared hit=60110 read=1
Planning time: 1.083 ms
Execution time: 17719.564 ms

We are running PostgreSQL 9.6 on AWS Aurora. We have a writer instance that is replicated to a reader instance. They are both db.r4.4xlarge.

The table we are updating has about 400M rows.

Are there optimizations that can be done? Can we tweak some parameters of the Postgres instance? ~500 updates/second seems a bit slow for such a powerful instance.

EDIT

Added the output of explain (analyze, buffers, timing, costs, verbose) with track_io_timing enabled. This is another batch, so it's not run on the exact same data as the previous example.
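
For reference, track_io_timing was enabled for the session before running the EXPLAIN (on Aurora it can also be set in the DB parameter group):

SET track_io_timing = on;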

Update on public.items i  (cost=0.57..87488.70 rows=10170 width=529) (actual time=43024.584..43024.584 rows=0 loops=1)
  Buffers: shared hit=432582 read=19294, local hit=45
  I/O Timings: read=40559.934
  ->  Nested Loop  (cost=0.57..87488.70 rows=10170 width=529) (actual time=1.293..1603.819 rows=10000 loops=1)
        Output: i.id, i.externalid, i.title, i.description, i.sourcename, i.sourceurl, i.sourceid, i.mediaurl, i.url, i.languagecode, i.published, i.entered, i.mediatype, i.fulltextprocessed, tit.id, i.ctid, tit.ctid
        Buffers: shared hit=49367 read=743, local hit=45
        I/O Timings: read=1468.798
        ->  Seq Scan on pg_temp_61.tmp_id_table tit  (cost=0.00..146.70 rows=10170 width=14) (actual time=0.021..10.367 rows=10000 loops=1)
              Output: tit.id, tit.ctid
              Buffers: local hit=45
        ->  Index Scan using items_pkey on public.items i  (cost=0.57..8.58 rows=1 width=515) (actual time=0.157..0.158 rows=1 loops=10000)
              Output: i.id, i.externalid, i.title, i.description, i.sourcename, i.sourceurl, i.sourceid, i.mediaurl, i.url, i.languagecode, i.published, i.entered, i.mediatype, i.fulltextprocessed, i.ctid
              Index Cond: (i.id = tit.id)
              Buffers: shared hit=49367 read=743
              I/O Timings: read=1468.798
Planning time: 0.771 ms
Execution time: 43024.659 ms

EDIT2

This is what the table looks like. Don't blame me for the column type choices.. ;)

CREATE TABLE public.items (
    id serial NOT NULL,
    externalid varchar(512) NOT NULL,
    title varchar(128) NULL,
    description varchar(512) NULL,
    sourcename text NULL,
    sourceurl text NULL,
    sourceid int8 NOT NULL,
    mediaurl text NULL,
    url text NOT NULL,
    languagecode varchar(2) NULL,
    published timestamp NOT NULL,
    entered timestamp NOT NULL DEFAULT timezone('utc'::text, now()),
    mediatype int4 NOT NULL DEFAULT 0,
    fulltextprocessed bool NULL,
    id2 int8 NULL,
    CONSTRAINT items_pkey PRIMARY KEY (id)
)
WITH (
    OIDS=FALSE
) ;
CREATE INDEX ix_items_entered ON public.items USING btree (entered DESC) ;
CREATE INDEX ix_items_externalid ON public.items USING btree (externalid) ;
CREATE INDEX ix_items_published_desc ON public.items USING btree (published DESC) ;
CREATE INDEX ix_items_sourceidmediatype ON public.items USING btree (sourceid, mediatype) ;
CREATE INDEX ix_items_sourcename ON public.items USING hash (sourcename) ;
CREATE INDEX ix_items_sourceurl ON public.items USING hash (sourceurl) ;
CREATE INDEX ix_items_url ON public.items USING btree (url) ;

And the index sizes:

ix_items_url                71 GB
ix_items_sourceurl          35 GB
ix_items_externalid         29 GB
ix_items_sourcename         23 GB
ix_items_sourceidmediatype  21 GB
ix_items_entered            18 GB
ix_items_published_desc     16 GB
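
The sizes above can be reproduced with something like this (names taken from the schema above):

SELECT indexrelid::regclass AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE relname = 'items'
ORDER BY pg_relation_size(indexrelid) DESC;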

Solution

HOT update solution

Since the "id2" column being updated is not itself indexed, your updates are eligible for HOT (heap-only-tuple) updates. This would spare you all of the work of maintaining the other indexes. However, in order for HOT updates to work, there needs to be room in the same data page for the new version of the tuple you are updating. If the old version and the new version can't both fit in the same page, you can't do HOT updates and so you end up doing all that index maintenance instead.

One way to encourage more HOT updates is to lower the fillfactor. However, that will not take effect on existing data until the table is rewritten. That won't work for you, as rewriting the table in one large operation is just what you are trying to avoid doing.
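
For reference, changing fillfactor is just a storage parameter change (90 here is only an example value):

ALTER TABLE items SET (fillfactor = 90);
-- Only pages written after this change honor the new fillfactor; existing pages
-- are repacked only by a full rewrite (VACUUM FULL, CLUSTER), which is exactly
-- the large offline operation you are trying to avoid.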

Also, if you do get HOT updates in a block, the new row versions consume whatever free space did exist in that block, meaning that additional updates targeting the same block will again find themselves out of space. So being able to update every row in a page HOT in the same pass would require a fillfactor below 50, which is a lot of wasted space, and it only helps you in the future anyway.

Once the old version from a previous HOT update is old enough that no one cares about it any more, that space can be reused for a new HOT update. However, multiple rows updated in the same statement can't possibly benefit from this, as a row version can't become "old enough" within its own transaction. Also, any open transaction from other connections (even if those connections never touch this table) will inhibit this reuse. So to encourage HOT updates when you don't have a very low fillfactor, you would want to arrange for your temp table to contain only one row from each page of the permanent table on any given invocation, avoid revisiting the same page for long enough that the old version ages out, and make sure you don't have long-running transactions. If your pages are stuffed completely full, so that not even one more row version fits, then the first update can never be HOT. But once one row per page has been updated and the table vacuumed, there should be enough free space for one HOT update per block in the future.
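
One rough way to approximate "one row per page per batch" is to group candidate ids by their heap block number, which can be extracted from ctid. This is only a sketch (the ctid-to-block conversion is a hack, and rows skipped in one pass still need to be revisited later):

INSERT INTO tmp_id_table (id)
SELECT DISTINCT ON (block) id
FROM (
    SELECT id, ((ctid::text::point)[0])::bigint AS block
    FROM items
    WHERE id > @LastInsertId
    ORDER BY id
    LIMIT 100000              -- a wider window of candidate rows
) candidates
ORDER BY block, id
LIMIT 10000;
-- Second and later rows on each page are skipped in this pass; a later pass
-- (e.g. filtering on "id2 IS NULL" once the id range is exhausted) has to
-- pick them up.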

Other solutions

  • Temporarily get enough RAM to hold all your indexes in memory simultaneously, and use pg_prewarm to pull them into memory (it could take a long time for them to accumulate in memory naturally if you don't take explicit action); see the sketch after this list.

  • Temporarily provision more IOPS.

  • Declare a maintenance window so you can drop indexes.

  • Grin and bear it. At this rate, it will only take 1 to 3 weeks to get it all done.

  • There may be some partitioning solutions, but they have the same problem as the fillfactor solution: putting them into effect will likely require a maintenance window of the same length as the one you are trying to avoid in the first place, or, if done online, it will take as much time as what you are currently doing.

  • With a deep knowledge of the queries you run operationally, you might be able to come up with partial indexes which cover your operational needs while keeping the indexes that have to be maintained smaller and more cacheable.
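
A minimal pg_prewarm sketch for the first bullet, assuming the extension is available in your Aurora PostgreSQL version (index names taken from the schema in the question):

CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Pull the primary key and the secondary indexes into shared buffers.
SELECT pg_prewarm('items_pkey');
SELECT pg_prewarm('ix_items_url');
SELECT pg_prewarm('ix_items_sourceurl');
SELECT pg_prewarm('ix_items_externalid');
SELECT pg_prewarm('ix_items_sourcename');
SELECT pg_prewarm('ix_items_sourceidmediatype');
SELECT pg_prewarm('ix_items_entered');
SELECT pg_prewarm('ix_items_published_desc');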

Licensed under: CC-BY-SA with attribution