How to use FIRST_VALUE window function in MySQL/MariaDB to get one row per group?

https://dba.stackexchange.com/questions/281538

12-03-2021

Question

I'm pleased to see window functions landed in MariaDB 10.2. I thought they would be good for first-of-group problems, but I'm struggling to see how they're efficient. I have something like this:

CREATE TABLE email (id INT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,  
                    contact_id INT UNSIGNED NOT NULL, /* FK */            
                    email VARCHAR(200),                                   
                    is_primary TINYINT(1) UNSIGNED NOT NULL DEFAULT 0)    
;                                                                         
                                                                          
INSERT INTO email (id, contact_id, email, is_primary)  VALUES                 
  (1, 1, 'oldwilma@example.com', 0),                                         
  (2, 1, 'currentwilma@example.com', 1),                                     
  (3, 1, 'otherwilma@example.com', 0),
  (4, 2, '', 1),
  (5, 2, 'betty@example.com', 0)
;

I want a list of each contact and the best email for them. "Best" being defined as: prefer their is_primary one, if it exists.

I want this:

Contact ID     Email ID             Email
-------------- -------------------- ------------------
1              2                    currentwilma@example.com
2              5                    betty@example.com

Using window functions, I can get the best email like this:

SELECT contact_id,
  FIRST_VALUE(email) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email,
  FIRST_VALUE(id) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email_id 
FROM email
WHERE email != ''
;                                                                                       

+------------+--------------------------+---------------+ 
| contact_id | best_email               | best_email_id | 
+------------+--------------------------+---------------+ 
|          1 | currentwilma@example.com |             2 |
|          1 | currentwilma@example.com |             2 |
|          1 | currentwilma@example.com |             2 | 
|          2 | betty@example.com        |             5 | 
+------------+--------------------------+---------------+

But I note that:

1. With n valid (or at least, non-empty) emails, I get n rows of output.
2. I had to copy the logic (partition by contact_id) for each column in the SELECT. This feels inefficient: if I wanted 12 other columns from the email table, I'd be running this 12 times, unless I just did it on the ID field and then INNER JOINed email again on that best ID.
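
The join-back idea mentioned in the second point could be sketched roughly as follows. This is only an illustration against the email table above: the alias names (best, e) are made up here, and DISTINCT is used simply to collapse the repeated FIRST_VALUE rows to one ID per contact.

```sql
-- Sketch: compute the best email ID once per contact,
-- then join back to email to fetch every other column.
SELECT e.*
FROM email AS e
INNER JOIN (
    -- One window function only; DISTINCT removes the per-row repeats.
    SELECT DISTINCT
           FIRST_VALUE(id) OVER (PARTITION BY contact_id
                                 ORDER BY is_primary DESC) AS best_email_id
    FROM email
    WHERE email != ''
) AS best ON best.best_email_id = e.id;
```

With the sample data this should select the rows with id 2 and id 5, without repeating the window definition per column.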

So to get what I need I end up like this:

SELECT contact_id, MIN(best_email) best_email, MIN(best_email_id) best_email_id
FROM (                                        
  SELECT  contact_id,                                                                 
    FIRST_VALUE(email) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email,
    FIRST_VALUE(id) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email_id 
  FROM email                                  
  WHERE email != ''                                                                        
) q                                                                                        
GROUP BY contact_id
;

But this feels really inefficient: MIN() will require every row to be examined, even though they're all the same. I could do this instead:

SELECT contact_id, best_email, best_email_id                             
FROM (                                                                   
  SELECT contact_id, row_number() OVER (PARTITION BY contact_id) r,      
    FIRST_VALUE(email) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email,
    FIRST_VALUE(id) OVER (PARTITION BY contact_id ORDER BY is_primary DESC) best_email_id 
  FROM email                                                                              
  WHERE email != ''                                                                       
) q                                                                                       
WHERE q.r=1;

But it still feels sub-optimal.

This seems likely to be more efficient:

SET @nth=0, @c=null;                                                          
SELECT id, email FROM (
  SELECT @nth := IF(@c = contact_id, @nth + 1, 1) r, id, email, @c:=contact_id dummy
   FROM email
  WHERE email != ''                                                              
  ORDER BY contact_id, is_primary DESC                                           
) sq
WHERE sq.r = 1;

Am I missing something? Maybe this just isn't the right place for window functions?

Solution

I'm not sure I understand the problem, but I'll give it a shot. This will satisfy your sample data and expected result:

select contact_id, email_id, email
from (
    select contact_id, id as email_id, email
     , row_number() over (partition by contact_id 
                          order by is_primary desc) as rn
    from email
    where email <> ''
) as t 
where rn = 1;

Since you enumerate the result set with row_number, I'm not sure why you need first_value.
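
The same row_number pattern extends to any number of columns, because the window is evaluated once and the outer query just filters on it. A minimal sketch, assuming the email table from the question; the extra tie-breaker on id (prefer the highest id when is_primary is equal) is an assumption added here to make the choice deterministic, not something the question specifies:

```sql
-- Sketch: one row per contact, carrying all columns of the chosen email row.
SELECT id, contact_id, email, is_primary
FROM (
    SELECT e.*,
           ROW_NUMBER() OVER (PARTITION BY contact_id
                              -- id DESC is an assumed tie-breaker
                              ORDER BY is_primary DESC, id DESC) AS rn
    FROM email AS e
    WHERE e.email <> ''
) AS ranked
WHERE rn = 1;
```

Without a tie-breaker, a contact with no primary email (or several) could get a different row from one run to the next.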

As you noted, FIRST_VALUE does not filter the result set in any way; it just extends each row with the first value of its partition.
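
If FIRST_VALUE is still preferred, the repeated rows have to be collapsed separately. One possible sketch uses SELECT DISTINCT together with a named window so the PARTITION BY logic is written only once. It assumes the server accepts the WINDOW clause (MySQL 8.0 documents it; check the documentation for your MariaDB version), and the window name w is arbitrary:

```sql
-- Sketch: FIRST_VALUE over a single named window, de-duplicated with DISTINCT.
SELECT DISTINCT
       contact_id,
       FIRST_VALUE(email) OVER w AS best_email,
       FIRST_VALUE(id)    OVER w AS best_email_id
FROM email
WHERE email <> ''
WINDOW w AS (PARTITION BY contact_id ORDER BY is_primary DESC);
```

DISTINCT is applied after the window functions are computed, so every row of a partition carries the same values and collapses to one. Whether this beats the row_number filter is something to check with EXPLAIN on real data.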

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange