What's happening here is that extract is implemented using the date_part function:
regress=> explain select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and x < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=30.02..30.03 rows=1 width=0)
-> Function Scan on generate_series x (cost=0.00..30.00 rows=5 width=0)
Filter: (((x)::double precision > (date_part('epoch'::text, '2013-08-10 22:00:00+08'::timestamp with time zone) * 1000::double precision)) AND ((x)::double precision < (date_part('epoch'::text, '2013-08-11 22:00:00+08'::timestamp with time zone) * 1000::double precision)))
(3 rows)
date_part(text, timestamptz) is defined as stable, not immutable:
regress=> \df+ date_part
List of functions
Schema | Name | Result data type | Argument data types | Type | Volatility | Owner | Language | Source code | Description
------------+-----------+------------------+-----------------------------------+--------+------------+----------+----------+--------------------------------------------------------------------------+---------------------------------------------
...
pg_catalog | date_part | double precision | text, timestamp with time zone | normal | stable | postgres | internal | timestamptz_part | extract field from timestamp with time zone
...
and I'm pretty sure that'll prevent Pg from pre-computing the value and inlining it into the call. I'd need to dig deeper to be sure.
I believe the reasoning is that date_part on a timestamptz can depend on the value of the TimeZone setting. That isn't true for date_part('epoch', some_timestamptz), but the query planner doesn't understand at planning time that you're using that particular special case.
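To see why the planner has to be conservative, here's a sketch (in a psql session) showing that date_part results for most fields shift with the TimeZone setting, while 'epoch' does not:

```sql
-- Most fields depend on the session's TimeZone setting:
SET timezone = 'UTC';
SELECT date_part('hour', TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10');
-- 14 (the instant is 2013-08-10 14:00:00 UTC)

SET timezone = 'Australia/Sydney';
SELECT date_part('hour', TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10');
-- 0 (the same instant displayed in +10)

-- 'epoch' identifies the absolute instant, so TimeZone is irrelevant:
SELECT date_part('epoch', TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10');
-- 1376143200 either way
```

Since volatility is declared per function, not per field argument, the planner sees only stable date_part(text, timestamptz) and can't constant-fold it.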
I'm still surprised that it doesn't get pre-computed, as the documentation states:

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call.
You can work around this apparent limitation by first converting to a timestamp at UTC (or whatever time zone your epoch values are measured in) with AT TIME ZONE 'UTC'. E.g.:
select count(1)
from generate_series(1376143200000,1376143200000+1000000) x
where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')*1000
and x < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')*1000;
This executes faster, though there's more time difference than I'd expect if it were just being calculated once:
regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')*1000 and x < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10')*1000;
count
---------
1000000
(1 row)
Time: 767.629 ms
regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')*1000 and x < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')*1000;
count
---------
1000000
(1 row)
Time: 373.453 ms
regress=> select count(1) from generate_series(1376143200000,1376143200000+1000000) x where x > 1376143200000 and x < 1376229600000;
count
---------
1000000
(1 row)
Time: 324.557 ms
It would be possible to remove this query optimizer limitation / add a feature to optimize this. The optimizer would need to recognize, probably at parse time, that extract(EPOCH FROM ...) is a special case, and instead of invoking date_part('epoch', ...) invoke a special timestamptz_epoch(...) function that is immutable.
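In the meantime you can get much the same effect from SQL by wrapping the expression in a function you declare IMMUTABLE yourself. This is a sketch using a hypothetical helper (epoch_ms is my name, not a built-in); promising immutability is safe here only because epoch extraction genuinely doesn't depend on the TimeZone setting:

```sql
-- Hypothetical wrapper. Declaring it IMMUTABLE lets the planner inline the
-- SQL body and fold calls with constant arguments down to a constant.
-- Safe only because 'epoch' does not depend on the TimeZone setting.
CREATE FUNCTION epoch_ms(timestamptz) RETURNS bigint
LANGUAGE sql IMMUTABLE AS $$
    SELECT (extract(EPOCH FROM $1) * 1000)::bigint;
$$;

-- Usage: the comparison values become pre-computed bigint constants.
SELECT count(1)
FROM generate_series(1376143200000, 1376143200000 + 1000000) x
WHERE x > epoch_ms(TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10')
  AND x < epoch_ms(TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10');
```

Lying about volatility in general is dangerous, so keep a wrapper like this narrowly scoped to the epoch case.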
A bit of looking at perf top results shows that the timestamptz case has the following peaks:
10.33% postgres [.] ExecMakeFunctionResultNoSets
7.76% postgres [.] timesub.isra.1
6.94% postgres [.] datebsearch
5.58% postgres [.] timestamptz_part
3.82% postgres [.] AllocSetAlloc
2.97% postgres [.] ExecEvalConst
2.68% postgres [.] downcase_truncate_identifier
2.38% postgres [.] ExecEvalScalarVarFast
2.23% postgres [.] slot_getattr
1.99% postgres [.] DatumGetFloat8
whereas with the use of AT TIME ZONE we get:
11.58% postgres [.] ExecMakeFunctionResultNoSets
4.28% postgres [.] AllocSetAlloc
4.18% postgres [.] ExecProject
3.82% postgres [.] slot_getattr
2.99% libc-2.17.so [.] __memmove_ssse3
2.96% postgres [.] BufFileWrite
2.80% libc-2.17.so [.] __memcpy_ssse3_back
2.74% postgres [.] BufFileRead
2.69% postgres [.] float8lt
and with the integer case:
7.92% postgres [.] ExecMakeFunctionResultNoSets
5.36% postgres [.] slot_getattr
4.52% postgres [.] AllocSetAlloc
4.02% postgres [.] ExecProject
3.42% libc-2.17.so [.] __memmove_ssse3
3.33% postgres [.] BufFileWrite
3.31% libc-2.17.so [.] __memcpy_ssse3_back
2.91% postgres [.] BufFileRead
2.90% postgres [.] GetMemoryChunkSpace
2.67% postgres [.] AllocSetFree
So you can see that the AT TIME ZONE version avoids the repeated timestamptz_part and datebsearch calls. The main difference between it and the integer case is float8lt; it looks like we're doing double precision comparisons instead of integer comparisons.
Sure enough, a cast takes care of it:
select count(1)
from generate_series(1376143200000,1376143200000+1000000) x
where x > extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-11 00:00:00+10' AT TIME ZONE 'UTC')::bigint * 1000
and x < extract(EPOCH FROM TIMESTAMP WITH TIME ZONE '2013-08-12 00:00:00+10' AT TIME ZONE 'UTC')::bigint * 1000;
I don't have the time to pursue the enhancement to the optimizer discussed above at present, but it's something you might want to consider raising on the mailing list.