TimescaleDB query with a leading wildcard (%) is slow

https://dba.stackexchange.com/questions/287098

17-03-2021

Question

I have a TimescaleDB hypertable like this:

create table logs
(
    time         timestamp not null,
    partitionkey text      not null,
    ip           inet,
    raw          text,
    transformed  double precision
);

And indexes as follows:

create index logs_time_idx
    on logs (time desc);

create unique index logs_partitionkey_time_uindex
    on logs (partitionkey asc, time desc);

When I run this query, it takes 20 minutes to complete:

SELECT * FROM data.logs 
WHERE partitionkey LIKE '%m.60.05482730' 
AND time > NOW() - INTERVAL '3 days'

But when I run this one, it takes 2 seconds:

SELECT * FROM data.logs 
WHERE partitionkey LIKE '865617033605366.m.60.05482730'
AND time > NOW() - INTERVAL '3 days'

I tried adding an index on partitionkey alone to help the wildcard query find matching values, but it had no effect:

-- created this index later to try and fix the slow wildcard query
create index logs_partitionkey_index
    on logs (partitionkey);

Explain plan for the wildcard query:

Gather  (cost=1000.57..525711.89 rows=1219 width=81)
  Workers Planned: 2
  ->  Parallel Custom Scan (ChunkAppend) on logs  (cost=0.57..524589.99 rows=509 width=82)
        Chunks excluded during startup: 2
        ->  Parallel Index Scan using _hyper_2_10_chunk_logs_time_idx on _hyper_2_10_chunk  (cost=0.57..263956.91 rows=255 width=81)
              Index Cond: ("time" > (now() - '3 days'::interval))
              Filter: (partitionkey ~~ '%m.60.05482730'::text)
        ->  Parallel Index Scan using _hyper_2_9_chunk_logs_time_idx on _hyper_2_9_chunk  (cost=0.57..260629.72 rows=252 width=83)
              Index Cond: ("time" > (now() - '3 days'::interval))
              Filter: (partitionkey ~~ '%m.60.05482730'::text)
JIT:
  Functions: 8
  Options: Inlining true, Optimization true, Expressions true, Deforming true

Explain plan for the specific partitionkey value:

Custom Scan (ChunkAppend) on logs  (cost=0.44..903.08 rows=790 width=82)
  Chunks excluded during startup: 2
  ->  Index Scan using _hyper_2_9_chunk_logs_partitionkey_time_uindex on _hyper_2_9_chunk  (cost=0.57..447.44 rows=392 width=83)
        Index Cond: ((partitionkey = '865617033605366.m.60.05482730'::text) AND ("time" > (now() - '3 days'::interval)))
        Filter: (partitionkey ~~ '865617033605366.m.60.05482730'::text)
  ->  Index Scan using _hyper_2_10_chunk_logs_partitionkey_time_uindex on _hyper_2_10_chunk  (cost=0.57..452.27 rows=396 width=81)
        Index Cond: ((partitionkey = '865617033605366.m.60.05482730'::text) AND ("time" > (now() - '3 days'::interval)))
        Filter: (partitionkey ~~ '865617033605366.m.60.05482730'::text)

Is TimescaleDB unable to handle wildcard (%) queries, or am I missing an index?

Solution

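A B-tree index can't be used for a LIKE condition with a leading wildcard ('%...', a right-anchored pattern); it can only be used for a left-anchored pattern with a trailing wildcard ('...%'), and, unless the database uses the C collation, only if the index was built with a pattern operator class such as text_pattern_ops. The reason is that the index is sorted from the first character onward, so only a known prefix maps to a contiguous range of index entries. You would need a trigram index (from the pg_trgm extension) to improve that.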
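To see why the position of the wildcard matters, here is a minimal Python sketch in which a sorted list plus `bisect` stands in for a B-tree. It assumes plain code-point ordering of strings (comparable to the C collation or a pattern operator class, not necessarily your database's default collation); the function names and all sample keys except the one from the question are hypothetical. Keys sharing a prefix sit next to each other in sort order, so a binary search finds where the range starts; keys sharing a suffix are scattered, so every key has to be checked.

```python
import bisect

def prefix_scan(sorted_keys: list[str], prefix: str) -> list[str]:
    """Model of LIKE 'prefix%': binary-search to the first candidate, then read a contiguous run."""
    i = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    # All keys starting with `prefix` are adjacent in sorted order,
    # so the scan can stop at the first key that no longer matches.
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches

def suffix_scan(sorted_keys: list[str], suffix: str) -> list[str]:
    """Model of LIKE '%suffix': the sort order gives no starting point, so every key is examined."""
    return [k for k in sorted_keys if k.endswith(suffix)]

# Hypothetical sample keys (only the first comes from the question)
keys = sorted([
    "865617033605366.m.60.05482730",
    "865617033605366.m.60.99999999",
    "123456789012345.m.60.05482730",
    "865617033605366.x.10.00000001",
])
print(prefix_scan(keys, "865617033605366.m"))
print(suffix_scan(keys, "m.60.05482730"))
```

The prefix lookup touches only the matching run plus one extra key, while the suffix lookup reads the whole list, which is the same difference the two explain plans show at table scale.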

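If the suffix you search for always has the same length (13 characters for 'm.60.05482730'), you could create an index on that expression. Including the time column in that index would probably help as well, because the equality on the suffix and the range condition on time can then be served by a single index range scan: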

create index logs_partitionkey_right13_time_idx
    on logs ((right(partitionkey, 13)), "time");
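A rough Python model of why the composite index helps, assuming a B-tree behaves like a list of `(right(partitionkey, 13), time)` tuples kept in sorted order. `build_index` and `range_scan` are hypothetical names, and the rows and timestamps are made up. Because entries are ordered first by the 13-character suffix and then by time, the suffix equality and the time range together select one contiguous slice, located with a single binary search.

```python
import bisect
from datetime import datetime, timedelta

def build_index(rows):
    """rows: (partitionkey, time) pairs -> sorted (right(partitionkey, 13), time, partitionkey) entries."""
    # key[-13:] behaves like right(key, 13): the last 13 characters, or the whole key if shorter
    return sorted((key[-13:], ts, key) for key, ts in rows)

def range_scan(entries, suffix, since):
    """Keys of entries with right(partitionkey, 13) = suffix AND time > since."""
    # (suffix, since) sorts before every entry with that suffix and time >= since
    i = bisect.bisect_left(entries, (suffix, since))
    matches = []
    while i < len(entries) and entries[i][0] == suffix:
        if entries[i][1] > since:  # skip entries whose time is exactly `since`
            matches.append(entries[i][2])
        i += 1
    return matches

# Hypothetical rows and timestamps
now = datetime(2021, 3, 17, 12, 0)
rows = [
    ("865617033605366.m.60.05482730", now - timedelta(days=5)),
    ("865617033605366.m.60.05482730", now - timedelta(days=1)),
    ("123456789012345.m.60.05482730", now - timedelta(hours=2)),
    ("865617033605366.m.60.99999999", now - timedelta(hours=1)),
]
index = build_index(rows)
print(range_scan(index, "m.60.05482730", now - timedelta(days=3)))
```

With the time column only in a separate index, the same lookup would have to fetch every recent row and filter on the suffix, as the wildcard query's plan does.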

And change your query to:

SELECT * 
FROM data.logs 
WHERE right(partitionkey,13) = 'm.60.05482730' 
  AND "time" > NOW() - INTERVAL '3 days'
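The rewritten condition is only equivalent to the original LIKE when the searched suffix is exactly 13 characters long, which 'm.60.05482730' is. A small Python check of that condition, using a hypothetical `pg_right` helper that imitates PostgreSQL's `right(string, n)` for non-negative `n`, and assuming the suffix contains no `%` or `_` (the only LIKE wildcards):

```python
def pg_right(s: str, n: int) -> str:
    """Imitates PostgreSQL right(s, n) for n >= 0: the last n characters (all of s if shorter)."""
    return s[max(len(s) - n, 0):]

def like_suffix(key: str, suffix: str) -> bool:
    """key LIKE '%' || suffix, for a suffix without % or _ wildcards."""
    return key.endswith(suffix)

def right13_equals(key: str, suffix: str) -> bool:
    """right(key, 13) = suffix, the form the expression index can serve."""
    return pg_right(key, 13) == suffix

key = "865617033605366.m.60.05482730"
print(len("m.60.05482730"))  # 13
print(like_suffix(key, "m.60.05482730"), right13_equals(key, "m.60.05482730"))  # True True
# A shorter suffix still matches with LIKE but never with right(key, 13):
print(like_suffix(key, "60.05482730"), right13_equals(key, "60.05482730"))  # True False
```

If the suffix length varies between searches, each length would need its own expression index, which is where the reversed-string approach is more flexible.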

Alternatively, create an index on the reversed string:

create index logs_partitionkey_reverse_idx
    on logs ((reverse(partitionkey)) text_pattern_ops);

Then change your query to:

SELECT * 
FROM data.logs 
WHERE reverse(partitionkey) like reverse('m.60.05482730')||'%'
  AND "time" > NOW() - INTERVAL '3 days'
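The reversed-string index works because reversing turns a suffix match into a prefix match, and a prefix match is what a B-tree range scan can serve; unlike the right(partitionkey, 13) variant, it works for suffixes of any length. A minimal Python sketch, again with a sorted list and `bisect` standing in for the index, with hypothetical function names and sample keys. Python's `[::-1]` reverses code points, which matches PostgreSQL's reverse() for the ASCII keys used here.

```python
import bisect

def build_reverse_index(keys):
    """Sorted list of reversed keys, standing in for an index on reverse(partitionkey)."""
    return sorted(k[::-1] for k in keys)

def suffix_lookup(reverse_index, suffix):
    """Keys ending in `suffix`, found as reverse(key) LIKE reverse(suffix) || '%'."""
    pattern = suffix[::-1]  # reverse('m.60.05482730') == '03728450.06.m'
    i = bisect.bisect_left(reverse_index, pattern)
    matches = []
    while i < len(reverse_index) and reverse_index[i].startswith(pattern):
        matches.append(reverse_index[i][::-1])  # undo the reversal for display
        i += 1
    return matches

# Hypothetical sample keys
idx = build_reverse_index([
    "865617033605366.m.60.05482730",
    "123456789012345.m.60.05482730",
    "865617033605366.m.60.99999999",
])
print(suffix_lookup(idx, "m.60.05482730"))
print(suffix_lookup(idx, "05482730"))  # a shorter suffix uses the same index
```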

Licensed under: CC-BY-SA with attribution
