How to Choose Between VALUES and SELECT for INSERT?

https://dba.stackexchange.com/questions/107055

28-09-2020
|

Domanda

This answer raised the question for me how to choose between VALUES and SELECT in such a function . Using PostgreSQL 9.4.3 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit:

CREATE OR REPLACE FUNCTION insaft_function()
   RETURNS TRIGGER AS
$func$
BEGIN     
   INSERT INTO file_headers (measurement_id, file_header_index_start
                                           , file_header_index_end)
   VALUES (NEW.measurement_id, TG_ARGV[0]::int, TG_ARGV[1]::int);

   RETURN NULL;  -- result ignored since this is an AFTER trigger
END
$func$ LANGUAGE plpgsql;

VALUES work with many rows but with SELECT you can do much more. The only requirement here is to do the above INSERT to the table. You can assume that there are 100k of such INSERTs done per cycle in continuous quality assurance of a system.

I noticed these differences with my data where selected three median values here:

VALUES
real      user      sys
-------------------------------
0m0.353s  0m0.256s  0m0.028s
0m0.327s  0m0.252s  0m0.036s
0m0.358s  0m0.252s  0m0.040s
so average real 0.34s

SELECT
real      user      sys
-------------------------------
0m0.362s  0m0.256s  0m0.024s
0m0.383s  0m0.236s  0m0.056s
0m0.356s  0m0.264s  0m0.032s
so average real 0.36s

So this small subset of data says that VALUES is faster with such a simple INSERT. I am interested in requirements for concurrent processes and real time data analysis.

How can you decide between SELECT and VALUES for INSERT?

Soluzione

The measured difference is almost certainly noise. Run some more iterations, you won't get consistent result. The difference in performance (if any exists) won't be measurable.

You can use either method here. Both are equally good for the purpose. There are often multiple ways in SQL. And sometimes there is no clear winner.

The more important issue concerning performance here:

100k of such INSERTs done per cycle

For big bulk-inserts it would be faster to INSERT in both tables instead of firing a trigger for every row.

If you are using a serial PK that's generated automatically, you can employ the RETURNING clause in a data-modifying CTE

WITH ins1 AS (
   INSERT INTO measurement (measurement)
   VALUES ...   -- OR SELECT ... if data comes from inside the DB :)
   RETURNING measurement_id  -- generating a serial ID?
   )
INSERT INTO file_headers (measurement_id, file_header_index_start, file_header_index_end)
SELECT measurement_id,  1, 666  -- here it *must* be SELECT
FROM   ins1;

A numerical constant like 666 (no quotes, just digits) defaults to type integer automatically.

May not be applicable, depending on your workflow.

Bulk insert in multiple tables

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a dba.stackexchange