PostgreSQL - Correlated Sub-Query Fail?

https://stackoverflow.com/questions/426221

06-07-2019
|

Question

I have a query like this:

SELECT t1.id,
    (SELECT COUNT(t2.id)
     FROM t2
     WHERE t2.id = t1.id
          ) as num_things
FROM t1
WHERE num_things = 5;

The goal is to get the id of all the elements that appear 5 times in the other table. However, I get this error:

ERROR: column "num_things" does not exist
SQL state: 42703

I'm probably doing something silly here, as I'm somewhat new to databases. Is there a way to fix this query so I can access num_things? Or, if not, is there any other way of achieving this result?

Solution

I think you could just rewrite your query like so:

SELECT t1.id
FROM t1
WHERE (SELECT COUNT(t2.id)
     FROM t2
     WHERE t2.id = t1.id
          ) = 5;

OTHER TIPS

A few important points about using SQL:

You cannot use column aliases in the WHERE clause, but you can in the HAVING clause. That's the cause of the error you got.
You can do your count better using a JOIN and GROUP BY than by using correlated subqueries. It'll be much faster.
Use the HAVING clause to filter groups.

Here's the way I'd write this query:

SELECT t1.id, COUNT(t2.id) AS num_things
FROM t1 JOIN t2 USING (id)
GROUP BY t1.id
HAVING num_things = 5;

I realize this query can skip the JOIN with t1, as in Charles Bretana's solution. But I assume you might want the query to include some other columns from t1.

Re: the question in the comment:

The difference is that the WHERE clause is evaluated on rows, before GROUP BY reduces groups to a single row per group. The HAVING clause is evaluated after groups are formed. So you can't, for example, change the COUNT() of a group by using HAVING; you can only exclude the group itself.

SELECT t1.id, COUNT(t2.id) as num
FROM t1 JOIN t2 USING (id)
WHERE t2.attribute = <value>
GROUP BY t1.id
HAVING num > 5;

In the above query, WHERE filters for rows matching a condition, and HAVING filters for groups that have at least five count.

The point that causes most people confusion is when they don't have a GROUP BY clause, so it seems like HAVING and WHERE are interchangeable.

WHERE is evaluated before expressions in the select-list. This may not be obvious because SQL syntax puts the select-list first. So you can save a lot of expensive computation by using WHERE to restrict rows.

SELECT <expensive expressions>
FROM t1
HAVING primaryKey = 1234;

If you use a query like the above, the expressions in the select-list are computed for every row, only to discard most of the results because of the HAVING condition. However, the query below computes the expression only for the single row matching the WHERE condition.

SELECT <expensive expressions>
FROM t1
WHERE primaryKey = 1234;

So to recap, queries are run by the database engine according to series of steps:

Generate set of rows from table(s), including any rows produced by JOIN.
Evaluate WHERE conditions against the set of rows, filtering out rows that don't match.
Compute expressions in select-list for each in the set of rows.
Apply column aliases (note this is a separate step, which means you can't use aliases in expressions in the select-list).
Condense groups to a single row per group, according to GROUP BY clause.
Evaluate HAVING conditions against groups, filtering out groups that don't match.
Sort result, according to ORDER BY clause.

All the other suggestions would work, but to answer your basic question it would be sufficient to write

  SELECT id  From T2
  Group By Id
  Having Count(*) = 5

I'd like to mention that in PostgreSQL there is no way to use aliased column in having clause.

i.e.

SELECT usr_id AS my_id FROM user HAVING my_id = 1

Wont work.

Another example that is not going to work:

SELECT su.usr_id AS my_id, COUNT(*) AS val FROM sys_user AS su GROUP BY su.usr_id HAVING val >= 1

There will be the same error: val column is not known.

Im highliting this because Bill Karwin wrote something not really true for Postgres:

"You cannot use column aliases in the WHERE clause, but you can in the HAVING clause. That's the cause of the error you got."

try this

SELECT t1.id,
    (SELECT COUNT(t2.id) as myCount
     FROM t2
     WHERE t2.id = t1.id and myCount=5
          ) as num_things
FROM t1

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow