Question

I want to get a list of email domains and the top user within each domain. My approach is to sum the questions per email grouped by the domain and then get the top user with a window function. However this does not work:

SELECT 
  domain,
  sum(questions_per_email) as questions_per_domain,
  first_value(email) OVER (PARTITION BY domain ORDER BY questions_per_email DESC) as top_user
FROM (
    SELECT email,
           lower(substring(u.email from position('@' in u.email)+1)) as domain,
           count(*) as questions_per_email
      FROM questions q
      JOIN identifiers i ON (q.owner_id = i.id)
      JOIN users u ON (u.identifier_id = i.id)
    GROUP BY email
  ) as per_user
GROUP BY domain, top_user

And Postgres gives the following message:

ERROR:  column "per_user.questions_per_email" must appear in the GROUP BY clause or be used in an aggregate function
LINE 5: ...t_value(email) OVER (PARTITION BY domain ORDER BY questions_...
                                                             ^

I can't really see why this is. I pretty sure that one should be able to use a window function on the aggregated result. Please advice!

Thanks, Kristoffer

Was it helpful?

Solution

you can change your query like this:

with cte1 as (
    SELECT email,
           lower(substring(u.email from position('@' in u.email)+1)) as domain
      FROM questions q
      JOIN identifiers i ON (q.owner_id = i.id)
      JOIN users u ON (u.identifier_id = i.id)
), cte2 as (
    select
        domain, email,
        count(*) as questions_per_email,
        first_value(email) over (partition by domain order by count(*) desc) as top_user
    from cte1
    group by email, domain
)
select domain, top_user, sum(questions_per_email) as questions_per_domain
from cte2
group by domain, top_user

sql fiddle demo

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top