Question

I want to select all users who have at least one post, photo, or video. I came up with the following query:

select
  users.username
from
  users
where
  exists (select * from posts where users.id = posts.user_id)
  or exists (select * from photos where users.id = photos.user_id)
  or exists (select * from videos where users.id = videos.user_id)

Everything works, but I also need to determine whether each user has a record in each of the three tables, so I'm using:

select
  users.username,
  EXISTS (select * from posts where users.id = posts.user_id) as has_post,
  EXISTS (select * from photos where users.id = photos.user_id) as has_photo,
  EXISTS (select * from videos where users.id = videos.user_id) as has_video
from
  users
where
  exists (select * from posts where users.id = posts.user_id)
  or exists (select * from photos where users.id = photos.user_id)
  or exists (select * from videos where users.id = videos.user_id)

The query above works, but it's slow. How can I optimize it? I'm also open to other alternatives.

All id columns are primary keys, all user_id columns are foreign keys, and users.username is a unique column.


Update

The execution plans below were captured in a development environment with much less data but the same database schema. (I can feel the slowness in production too, but I don't have access to it.)

The first query

Seq Scan on users  (cost=0.00..79803.74 rows=3883 width=18) (actual time=0.799..7.069 rows=297 loops=1)
  Filter: ((alternatives: SubPlan 1 or hashed SubPlan 2) OR (alternatives: SubPlan 3 or hashed SubPlan 4) OR (alternatives: SubPlan 5 or hashed SubPlan 6))
  Rows Removed by Filter: 4141
  Buffers: shared hit=146
  SubPlan 1
    ->  Seq Scan on posts  (cost=0.00..17.11 rows=2 width=0) (never executed)
          Filter: (users.id = user_id)
  SubPlan 2
    ->  Seq Scan on posts posts_1  (cost=0.00..16.29 rows=329 width=4) (actual time=0.007..0.120 rows=329 loops=1)
          Buffers: shared hit=13
  SubPlan 3
    ->  Seq Scan on photos  (cost=0.00..5.49 rows=1 width=0) (never executed)
          Filter: (users.id = user_id)
  SubPlan 4
    ->  Seq Scan on photos photos_1  (cost=0.00..5.19 rows=119 width=4) (actual time=0.008..0.037 rows=119 loops=1)
          Buffers: shared hit=4
  SubPlan 5
    ->  Seq Scan on videos  (cost=0.00..7.80 rows=2 width=0) (never executed)
          Filter: (users.id = user_id)
  SubPlan 6
    ->  Seq Scan on videos videos_1  (cost=0.00..7.04 rows=304 width=4) (actual time=0.009..0.066 rows=304 loops=1)
          Buffers: shared hit=4
Planning Time: 1.229 ms
Execution Time: 7.296 ms

The second query

Seq Scan on users  (cost=0.00..149479.32 rows=3883 width=21) (actual time=354.809..368.271 rows=297 loops=1)
  Filter: ((alternatives: SubPlan 7 or hashed SubPlan 8) OR (alternatives: SubPlan 9 or hashed SubPlan 10) OR (alternatives: SubPlan 11 or hashed SubPlan 12))
  Rows Removed by Filter: 4141
  Buffers: shared hit=167
  SubPlan 1
    ->  Seq Scan on posts  (cost=0.00..17.11 rows=2 width=0) (never executed)
          Filter: (users.id = user_id)
  SubPlan 2
    ->  Seq Scan on posts posts_1  (cost=0.00..16.29 rows=329 width=4) (actual time=16.649..16.776 rows=329 loops=1)
          Buffers: shared hit=13
  SubPlan 3
    ->  Seq Scan on photos  (cost=0.00..5.49 rows=1 width=0) (never executed)
          Filter: (users.id = user_id)
  SubPlan 4
    ->  Seq Scan on photos photos_1  (cost=0.00..5.19 rows=119 width=4) (actual time=12.576..12.634 rows=119 loops=1)
          Buffers: shared hit=4
  SubPlan 5
    ->  Seq Scan on videos  (cost=0.00..7.80 rows=2 width=0) (never executed)
          Filter: (users.id = user_id)
  SubPlan 6
    ->  Seq Scan on videos videos_1  (cost=0.00..7.04 rows=304 width=4) (actual time=12.815..13.606 rows=304 loops=1)
          Buffers: shared hit=4
  SubPlan 7
    ->  Seq Scan on posts posts_2  (cost=0.00..17.11 rows=2 width=0) (never executed)
          Filter: (users.id = user_id)
  SubPlan 8
    ->  Seq Scan on posts posts_3  (cost=0.00..16.29 rows=329 width=4) (actual time=15.300..15.822 rows=329 loops=1)
          Buffers: shared hit=13
  SubPlan 9
    ->  Seq Scan on photos photos_2  (cost=0.00..5.49 rows=1 width=0) (never executed)
          Filter: (users.id = user_id)
  SubPlan 10
    ->  Seq Scan on photos photos_3  (cost=0.00..5.19 rows=119 width=4) (actual time=14.130..14.184 rows=119 loops=1)
          Buffers: shared hit=4
  SubPlan 11
    ->  Seq Scan on videos videos_2  (cost=0.00..7.80 rows=2 width=0) (never executed)
          Filter: (users.id = user_id)
  SubPlan 12
    ->  Seq Scan on videos videos_3  (cost=0.00..7.04 rows=304 width=4) (actual time=18.024..18.103 rows=304 loops=1)
          Buffers: shared hit=4
Planning Time: 1.567 ms
JIT:
  Functions: 106
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 49.689 ms, Inlining 0.000 ms, Optimization 15.154 ms, Emission 312.171 ms, Total 377.014 ms
Execution Time: 471.659 ms

Solution

The second query is triggering JIT (just-in-time compilation), which is very counterproductive for you: 377 ms of the 472 ms execution time is spent on JIT, 312 ms of it in emission alone. JIT kicks in because the second plan's estimated total cost (149479) exceeds the default jit_above_cost of 100000, while the first plan's (79803) does not. If you don't know that JIT benefits your other plans, I would just turn it off globally. I think it was a mistake to turn JIT on by default in version 12, as it seems to hurt at least as many cases as it helps. If you don't want to turn it off globally, you could tune the other JIT parameters instead, for example by raising jit_above_cost.
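A minimal sketch of those options, assuming PostgreSQL 12 or later and a role allowed to run ALTER SYSTEM (a superuser by default). The 500000 threshold is an arbitrary illustration, not a tuned recommendation.

```sql
-- Option 1: disable JIT for the current session only (handy to confirm the diagnosis)
SET jit = off;

-- Option 2: disable JIT server-wide; this writes to postgresql.auto.conf
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();  -- jit can be changed with a reload, no restart needed

-- Option 3: keep JIT, but only for much more expensive plans (session level here)
-- The default jit_above_cost is 100000; 500000 is just an example value.
SET jit_above_cost = 500000;
```

After `SET jit = off`, re-running EXPLAIN (ANALYZE, BUFFERS) on the second query should no longer show a "JIT:" section in the plan.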

Also, I don't think the plans you posted match the queries you posted. Did you simplify your queries before posting?

OTHER TIPS

You might try:

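select u.username
     -- each lateral subquery returns at most one row, so these counts are 0 or 1
     , count(x.user_id) as has_post
     , count(y.user_id) as has_photo
     , count(z.user_id) as has_video
from users u
left join lateral ( 
    select p.user_id 
    from posts p
    where u.id = p.user_id
    limit 1          -- stop at the first matching row
) as x on true       -- LEFT JOIN needs a join condition; ON true keeps every user
left join lateral ( 
    select ph.user_id 
    from photos ph
    where u.id = ph.user_id
    limit 1
) as y on true
left join lateral ( 
    select v.user_id 
    from videos v
    where u.id = v.user_id
    limit 1
) as z on true
-- keep only users with at least one post, photo or video, as in the original query
where x.user_id is not null
   or y.user_id is not null
   or z.user_id is not null
group by u.username;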
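The LIMIT 1 lateral subqueries only pay off if each one can find its first matching row with an index probe rather than a scan. PostgreSQL does not create indexes on foreign-key columns automatically, so if posts.user_id, photos.user_id and videos.user_id are not already indexed, something like the following may help. This is a sketch: the index names are hypothetical, and with data as small as in the development plans the planner may still prefer sequential scans.

```sql
-- Hypothetical index names; IF NOT EXISTS only checks the name, so skip
-- any table that already has an index on user_id.
-- CONCURRENTLY avoids blocking writes but cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY IF NOT EXISTS posts_user_id_idx  ON posts  (user_id);
CREATE INDEX CONCURRENTLY IF NOT EXISTS photos_user_id_idx ON photos (user_id);
CREATE INDEX CONCURRENTLY IF NOT EXISTS videos_user_id_idx ON videos (user_id);
```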
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange