Question

I want to calculate total and unique clickouts by country, partner and retailer. I get the desired result, but I don't think my solution is optimal, and it will be slow on larger data sets. How can I improve this query? Here are my test table, my current query and the expected output:

"country_id","partner","retailer","id_customer","id_clickout"
"1","A","B","100","XX"
"1","A","B","100","XX"
"2","A","B","100","XX"
"2","A","B","100","GG"
"2","A","B","100","XX"
"2","A","B","101","XX"

DROP TABLE IF EXISTS x;
CREATE TEMPORARY TABLE x AS
SELECT test1.country_id, test1.partner,test1.retailer, test1.id_customer, 
SUM(CASE WHEN test1.id_clickout IS NULL THEN 0 ELSE 1 END) AS clicks,
CASE WHEN test1.id_clickout IS NULL THEN 0 ELSE 1 END AS unique_clicks
FROM test1
GROUP BY 1,2,3,4
;
SELECT country_id,partner,retailer, SUM(clicks), SUM(unique_clicks)
FROM x
GROUP BY 1,2,3

Output:

"country_id","partner","retailer","SUM(clicks)","SUM(unique_clicks)"
"1","A","B","2","1"
"2","A","B","4","2"

And here is DDL and input data:

CREATE TABLE test1 (
 country_id INT(11) DEFAULT NULL,
 partner VARCHAR(256) CHARACTER SET utf8 DEFAULT NULL,
 retailer VARCHAR(256) CHARACTER SET utf8 DEFAULT NULL,
 id_customer BIGINT(20) DEFAULT NULL,
 id_clickout VARCHAR(256) CHARACTER SET utf8 DEFAULT NULL)
  ENGINE=InnoDB DEFAULT CHARSET=utf8;


INSERT INTO test1 VALUES(1,'A','B','100','XX'),(1,'A','B','100','XX'),
            (2,'A','B','100','XX'),(2,'A','B','100','GG'),
            (2,'A','B','100','XX'),(2,'A','B','101','xx');

Solution

SELECT
  country_id,
  partner,
  retailer,
  COUNT(id_clickout)   AS clicks,
  COUNT(DISTINCT CASE WHEN id_clickout IS NOT NULL THEN id_customer END) AS unique_clicks
FROM
  test1
GROUP BY
  1,2,3
;

COUNT(a_field) does not count NULL values.

So COUNT(id_clickout) only counts the rows where id_clickout is NOT NULL.

Similarly, the CASE expression in unique_clicks returns id_customer only for rows where a click occurred, and NULL otherwise. COUNT(DISTINCT CASE ...) therefore counts each customer once, and only if they clicked.
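
To make the NULL handling concrete, here is a minimal sketch on a throwaway table. The table count_demo and its rows are made up for illustration and are not part of the question's schema.

```sql
-- Hypothetical scratch table, used only to illustrate how COUNT treats NULL.
CREATE TEMPORARY TABLE count_demo (
  id_customer INT,
  id_clickout VARCHAR(16)
);

INSERT INTO count_demo VALUES
  (100, 'XX'),
  (100, 'GG'),
  (101, NULL),   -- customer 101 never clicked
  (102, 'XX');

SELECT
  COUNT(*)           AS all_rows,         -- counts every row: 4
  COUNT(id_clickout) AS non_null_clicks,  -- skips the NULL row: 3
  COUNT(DISTINCT CASE WHEN id_clickout IS NOT NULL
                      THEN id_customer END) AS clicking_customers  -- 100 and 102: 2
FROM count_demo;
```

Customer 101 contributes to all_rows only, because the CASE expression yields NULL for that row and COUNT(DISTINCT ...) ignores it.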

EDIT :

I just realised, it's potentially even simpler than that...

SELECT
  country_id,
  partner,
  retailer,
  COUNT(*)                    AS clicks,
  COUNT(DISTINCT id_customer) AS unique_clicks
FROM
  test1
WHERE
  id_clickout IS NOT NULL
GROUP BY
  1,2,3
;

The only material difference in the results is that any country_id, partner, retailer combination that previously showed up with 0 clicks will no longer appear in the results at all.

With an INDEX on (country_id, partner, retailer, id_clickout, id_customer) or on (country_id, partner, retailer, id_customer, id_clickout), however, this query should be significantly faster.
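
As a rough sketch, the first of those indexes could be created as follows. This assumes the table is named test1, as in the queries above; the index name idx_test1_clickouts is made up. Depending on the MySQL version and row format, three utf8 VARCHAR(256) columns may exceed the index key length limit, in which case prefix lengths or shorter columns would be needed.

```sql
-- Illustrative index name; column order follows the first suggestion above.
-- Grouping columns come first, followed by the columns read by the
-- WHERE clause and the aggregates, so the index can cover the query.
CREATE INDEX idx_test1_clickouts
  ON test1 (country_id, partner, retailer, id_clickout, id_customer);

-- Check how the optimizer plans the simplified query.
EXPLAIN
SELECT
  country_id,
  partner,
  retailer,
  COUNT(*)                    AS clicks,
  COUNT(DISTINCT id_customer) AS unique_clicks
FROM
  test1
WHERE
  id_clickout IS NOT NULL
GROUP BY
  country_id, partner, retailer;
```

If the index covers the query, the Extra column of the EXPLAIN output should include "Using index", meaning the rows are read from the index without touching the table.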

OTHER TIPS

I think this is what you are after:

SELECT country_id,partner,retailer,COUNT(retailer) as `sum(clicks)`,count(distinct id_clickout) as `SUM(unique_clicks)`
FROM test1
GROUP BY country_id,partner,retailer

Result:

COUNTRY_ID  PARTNER  RETAILER  SUM(CLICKS)  SUM(UNIQUE_CLICKS)
1           A        B         2            1
2           A        B         4            2

See result in SQL Fiddle.

Licensed under: CC-BY-SA with attribution