Question

I have this query (edited for simplicity):

select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000
order by s.sampletimestamp;

I notice that this executes much quicker by entering values for the time manually:

select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= 1376143200000 and
s.sampletimestamp < 1376229600000
order by s.sampletimestamp;

Here the times are epoch timestamps in milliseconds. My guess is that PostgreSQL is evaluating the extract(EPOCH ...) part for every record, whereas it really only needs to do that once.

Is there some way to keep the more human-readable form of the first query whilst keeping it as efficient as the second?

I am new to PostgreSQL (and entirely self-taught), so I think the problem I'm mostly suffering from is not knowing the particular keyword I should be typing into Google - which I've already tried, along with the PostgreSQL documentation.

Thanks in advance :)

EDIT1: Thanks for the very detailed replies. I suspect I'm probably in a different time zone to most of the respondents - I'll provide experimental evidence for this tomorrow (it's very late here).

EDIT2: To summarise the answer below, casting to bigint does the trick. Replace:

where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000

With:

where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')::bigint*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')::bigint*1000
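
Putting the two together, the full query from the top of the question becomes:

select to_timestamp(s.sampletimestamp/1000)
from sample s
where s.sampletimestamp >= extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')::bigint*1000 and
s.sampletimestamp < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')::bigint*1000
order by s.sampletimestamp;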

Solution

What's happening here is that extract is implemented using the date_part function:

regress=> explain select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000;
                                                                                                                                        QUERY PLAN                                                                                                                                         
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=30.02..30.03 rows=1 width=0)
   ->  Function Scan on generate_series x  (cost=0.00..30.00 rows=5 width=0)
         Filter: (((x)::double precision > (date_part('epoch'::text, '2013-08-10 22:00:00+08'::timestamp with time zone) * 1000::double precision)) AND ((x)::double precision < (date_part('epoch'::text, '2013-08-11 22:00:00+08'::timestamp with time zone) * 1000::double precision)))
(3 rows)

date_part(text, timestamptz) is declared stable, not immutable:

regress=> \df+ date_part
                                                                                                                 List of functions
   Schema   |   Name    | Result data type |        Argument data types        |  Type  | Volatility |  Owner   | Language |                               Source code                                |                 Description                 
------------+-----------+------------------+-----------------------------------+--------+------------+----------+----------+--------------------------------------------------------------------------+---------------------------------------------
 ...
 pg_catalog | date_part | double precision | text, timestamp with time zone    | normal | stable     | postgres | internal | timestamptz_part                                                         | extract field from timestamp with time zone
 ...

and I'm pretty sure that'll prevent Pg from pre-computing the value and inlining it into the call. I'd need to dig deeper to be sure.

I believe the reasoning is that date_part on a timestamptz can depend on the value of the TimeZone setting. This isn't true for date_part('epoch', some_timestamptz), but the query planner doesn't understand at planning time that you're only extracting the epoch.
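
For instance, compare how the hour field of one of these timestamps shifts with the session's TimeZone setting while the epoch stays put:

set timezone = 'UTC';
select date_part('hour',  timestamptz '2013-08-11 00:00:00+10');  -- 14
select date_part('epoch', timestamptz '2013-08-11 00:00:00+10');  -- 1376143200

set timezone = 'Australia/Sydney';
select date_part('hour',  timestamptz '2013-08-11 00:00:00+10');  -- 0
select date_part('epoch', timestamptz '2013-08-11 00:00:00+10');  -- still 1376143200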

I'm still surprised that it doesn't get pre-computed, as the documentation states:

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call.

You can work around this apparent limitation by first converting to a timestamp at UTC (or whatever time zone your epoch times are relative to) with AT TIME ZONE 'UTC'. E.g.:

select count(1) 
from generate_series(1376143200000,1376143200000+1000000) x 
where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')*1000 
and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')*1000;

This executes faster, though there's more time difference than I'd expect if it were just being calculated once:

regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000;
  count  
---------
 1000000
(1 row)

Time: 767.629 ms

regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')*1000 and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')*1000;
  count  
---------
 1000000
(1 row)

Time: 373.453 ms

regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > 1376143200000 and x <  1376229600000;
  count  
---------
 1000000
(1 row)

Time: 324.557 ms

It would be possible to remove this query optimizer limitation / add a feature to optimize this. The optimizer would need to recognize, probably at parse time, that extract('epoch', ...) is a special case and, instead of invoking date_part('epoch', ...), invoke a special timestamptz_epoch(...) function that is immutable.
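
In the meantime, a user-level workaround along the same lines is to wrap the extraction in your own function and declare it immutable yourself - safe for this field only, because the epoch of a timestamptz doesn't depend on TimeZone. A sketch (epoch_ms is a made-up name, not a built-in):

create function epoch_ms(timestamptz) returns bigint
language sql immutable as $$
  -- 'epoch' is timezone-independent, so marking this immutable is safe
  select (extract(epoch from $1) * 1000)::bigint
$$;

select count(1)
from generate_series(1376143200000,1376143200000+1000000) x
where x > epoch_ms('2013-08-11 00:00:00+10')
  and x < epoch_ms('2013-08-12 00:00:00+10');

Because the function is immutable and its arguments are constants, the planner can fold each call down to a single bigint constant at plan time.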

A bit of looking at perf top results shows that the timestamptz case has the following peaks:

 10.33%  postgres      [.] ExecMakeFunctionResultNoSets
  7.76%  postgres      [.] timesub.isra.1
  6.94%  postgres      [.] datebsearch
  5.58%  postgres      [.] timestamptz_part
  3.82%  postgres      [.] AllocSetAlloc
  2.97%  postgres      [.] ExecEvalConst
  2.68%  postgres      [.] downcase_truncate_identifier
  2.38%  postgres      [.] ExecEvalScalarVarFast
  2.23%  postgres      [.] slot_getattr
  1.99%  postgres      [.] DatumGetFloat8

whereas with the use of AT TIME ZONE we get:

 11.58%  postgres      [.] ExecMakeFunctionResultNoSets
  4.28%  postgres      [.] AllocSetAlloc
  4.18%  postgres      [.] ExecProject
  3.82%  postgres      [.] slot_getattr
  2.99%  libc-2.17.so  [.] __memmove_ssse3
  2.96%  postgres      [.] BufFileWrite
  2.80%  libc-2.17.so  [.] __memcpy_ssse3_back
  2.74%  postgres      [.] BufFileRead
  2.69%  postgres      [.] float8lt

and with the integer case:

  7.92%  postgres      [.] ExecMakeFunctionResultNoSets
  5.36%  postgres      [.] slot_getattr
  4.52%  postgres      [.] AllocSetAlloc
  4.02%  postgres      [.] ExecProject
  3.42%  libc-2.17.so  [.] __memmove_ssse3
  3.33%  postgres      [.] BufFileWrite
  3.31%  libc-2.17.so  [.] __memcpy_ssse3_back
  2.91%  postgres      [.] BufFileRead
  2.90%  postgres      [.] GetMemoryChunkSpace
  2.67%  postgres      [.] AllocSetFree

So you can see that the AT TIME ZONE version avoids the repeated timestamptz_part and datebsearch calls. The main difference between it and the integer case is float8lt; it looks like we're doing double precision comparisons instead of integer comparisons.

Sure enough, a cast takes care of it:

select count(1) 
from generate_series(1376143200000,1376143200000+1000000) x
where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')::bigint * 1000  
and x <  extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')::bigint * 1000;

I don't have the time to pursue the enhancement to the optimizer discussed above at present, but it's something you might want to consider raising on the mailing list.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow