PERFORM * vs PERFORM 1

Question 1

Note first that PERFORM is not an SQL statement; it is a PL/pgSQL keyword that instructs the interpreter to run the equivalent SELECT and then discard the result.

Per documentation (Executing a Command With No Result):

PERFORM query;
This executes query and discards the result. Write the query the same way you would write an SQL SELECT command, but replace the initial keyword SELECT with PERFORM. For WITH queries, use PERFORM and then place the query in parentheses. (In this case, the query can only return one row.) PL/pgSQL variables will be substituted into the query just as for commands that return no result, and the plan is cached in the same way. Also, the special variable FOUND is set to true if the query produced at least one row, or false if it produced no rows
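The FOUND behaviour described in the quoted documentation can be seen with a small sketch. It queries the system catalog pg_class only so that it runs on any database; the relation name 'no_such_relation_hopefully' is a made-up value that is assumed not to exist.

```sql
DO $$
BEGIN
    -- This query matches exactly one catalog row, so FOUND becomes true.
    PERFORM 1 FROM pg_class WHERE relname = 'pg_class';
    RAISE NOTICE 'first PERFORM: FOUND = %', FOUND;

    -- Assumed not to match anything, so FOUND becomes false.
    PERFORM 1 FROM pg_class WHERE relname = 'no_such_relation_hopefully';
    RAISE NOTICE 'second PERFORM: FOUND = %', FOUND;
END;
$$;
```

In both cases the rows themselves are thrown away; only FOUND carries information back to the calling code.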

So the question amounts to: how do SELECT * FROM table and SELECT 1 FROM table compare as a way to check whether a table has at least one row? The problem is that both are inadequate performance-wise, and ludicrously so if the table has many rows.

Let's test on a real example on PostgreSQL 9.3 (the latest release at the time of writing) with two big tables:

words(int,text) (3.4 million rows, and the text column is always small)
inverted_word_index(int,int,bytea,int) (~10 million rows, the bytea being 400 bytes wide on average and 2kB maximum)

Each query is run several times in a row with psql's \timing on, and I keep only the fastest execution.

Test 1 with the first table

PERFORM 1 from words:

mlists=> do $$ begin perform  1 from words; end; $$;
DO
Time: 521,379 ms

PERFORM * from words:

mlists=> do $$ begin perform  * from words; end; $$;
DO
Time: 442,800 ms

Result 1: on this table, perform * seems consistently a bit faster than perform 1.

Test 2 with the 2nd table

PERFORM 1 from inverted_word_index:

mlists=> do $$ begin perform  1 from inverted_word_index ; end; $$;
DO
Time: 2206,230 ms

PERFORM * from inverted_word_index:

mlists=> do $$ begin perform  * from inverted_word_index ; end; $$;
DO
Time: 16848,971 ms

Result 2: for this table the best execution of perform * is much slower than the best execution of perform 1, so that's the opposite of the previous result.

Conclusion: there is no winner in general; it seems to depend on the table contents.

But the really interesting point is that both methods scan the whole table instead of stopping at the first row, so they're both way too slow.

Reasonably fast method:

mlists=> do $$ begin perform  1 from words limit 1; end; $$;
DO
Time: 0,330 ms

mlists=> do $$ begin perform  * from words limit 1; end; $$;
DO
Time: 0,405 ms

mlists=> do $$ begin perform  1 from inverted_word_index limit 1; end; $$;
DO
Time: 0,333 ms

mlists=> do $$ begin perform  * from inverted_word_index limit 1; end; $$;
DO
Time: 0,314 ms

Repeated executions show that the duration of both constructs varies between 0,3 ms and 0,4 ms with no winner. It's so low that the speed difference comes from unrelated dynamic factors.
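One way to see why the LIMIT 1 variants return almost instantly is to compare query plans. The sketch below uses a hypothetical temporary table, words_demo, as a small stand-in for the words table from the tests; its name, columns, and size are assumptions made for illustration.

```sql
-- Hypothetical stand-in for the "words" table (not the real data).
CREATE TEMP TABLE words_demo AS
SELECT g AS id, 'w' || g::text AS word
FROM generate_series(1, 100000) AS g;

-- Without LIMIT: the plan is a scan that has to visit every row.
EXPLAIN SELECT 1 FROM words_demo;

-- With LIMIT 1: a Limit node sits on top of the scan and stops it
-- as soon as the first row has been produced.
EXPLAIN SELECT 1 FROM words_demo LIMIT 1;
```

The same comparison can be made with * in place of 1; once the Limit node is present, either form reads a single row.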

Question 2

I'm not sure why you'd want to discard the result.

Personally I'd use something along the lines of:

select count(*)
into   rows_exist
from   (select * from my_table limit 1) t;

A single row with a value of 0 or 1 is guaranteed to be returned, and only a single row in the table is read in order to do that.
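As a sketch of how the count pattern above might be used inside a complete block: my_table is created here as an empty temporary table purely so that the example runs, and its columns are assumptions. Replace it with the table you actually want to test.

```sql
-- Hypothetical placeholder table; column names are made up for illustration.
CREATE TEMP TABLE IF NOT EXISTS my_table (id integer, payload text);

DO $$
DECLARE
    rows_exist bigint;  -- count(*) returns bigint
BEGIN
    -- The inner LIMIT 1 means at most one row is fetched from the table,
    -- so rows_exist ends up as either 0 or 1.
    SELECT count(*)
    INTO   rows_exist
    FROM   (SELECT * FROM my_table LIMIT 1) t;

    IF rows_exist = 1 THEN
        RAISE NOTICE 'my_table has at least one row';
    ELSE
        RAISE NOTICE 'my_table is empty';
    END IF;
END;
$$;
```

Unlike PERFORM followed by a test on FOUND, this keeps the answer in an ordinary variable that can be reused later in the block.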

As to "1" vs "*" -- my instinct is to say that the optimiser is intelligent enough to only read the columns required in order to answer the query.

However, if you run an explain with (verbose true), it suggests that all columns are projected from the subquery. If that's the case, then it may be less efficient to use "select *", especially if a large number of rows have been deleted from the beginning of the table and the use of "select *" causes many blocks to be read in a full table scan before the first live row is found. I'd welcome input from experts on PostgreSQL internals on whether this is the correct interpretation of the plan.
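For anyone who wants to inspect the plans in question, a minimal sketch follows. It again assumes a hypothetical temporary my_table with made-up columns; the Output lines of each plan node list the columns that node projects.

```sql
-- Hypothetical placeholder table; column names are made up for illustration.
CREATE TEMP TABLE IF NOT EXISTS my_table (id integer, payload text);

-- Subquery written with "select *": check the Output lines of the inner nodes.
EXPLAIN (VERBOSE true)
SELECT count(*)
FROM   (SELECT * FROM my_table LIMIT 1) t;

-- Same query with a constant instead of "*", for comparison.
EXPLAIN (VERBOSE true)
SELECT count(*)
FROM   (SELECT 1 FROM my_table LIMIT 1) t;
```

Comparing the two outputs side by side shows what the planner reports for each form; it does not by itself settle how much work the executor does per row.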