Your query looks really close. I think all that's needed is to add a GROUP BY clause.
The COUNT(DISTINCT foo) will effectively "collapse" identical values, so the count only gets incremented by 1 for each group of identical date values.
Based on the sample data, and the desired resultset, this should work:
SELECT ch.lead_source_id
, COUNT(DISTINCT ch.repurchased_date)
FROM customers_history ch
WHERE ch.repurchased_date >= '2014-04-01'
AND ch.repurchased_date < '2014-04-01' + INTERVAL 1 MONTH
AND ch.lead_source_id IS NOT NULL
GROUP
BY ch.lead_source_id
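To see the "collapse" in action, here is a minimal sketch that runs the same query against Python's built-in sqlite3 module, used here only as a stand-in for MySQL. The rows are hypothetical, the repurchased_date values are stored as TEXT, and SQLite's date(?, '+1 month') replaces MySQL's + INTERVAL 1 MONTH, which SQLite does not support. The ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical rows: (customer_id, lead_source_id, repurchased_date).
ROWS = [
    (420450, 4, "2014-04-14 09:16:48"),
    (420450, 4, "2014-04-14 09:16:48"),  # same timestamp: counted once by DISTINCT
    (420450, 4, "2014-04-15 22:22:22"),
    (420451, 7, "2014-04-20 10:00:00"),
    (420452, None, "2014-04-21 10:00:00"),  # dropped by IS NOT NULL
    (420453, 7, "2014-05-02 10:00:00"),  # outside April 2014
]


def repurchase_counts(month_start):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers_history ("
        " id INTEGER PRIMARY KEY, customer_id INTEGER,"
        " lead_source_id INTEGER, repurchased_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers_history"
        " (customer_id, lead_source_id, repurchased_date) VALUES (?, ?, ?)",
        ROWS,
    )
    # date(?, '+1 month') is SQLite's equivalent of MySQL's + INTERVAL 1 MONTH.
    sql = """
        SELECT ch.lead_source_id
             , COUNT(DISTINCT ch.repurchased_date)
          FROM customers_history ch
         WHERE ch.repurchased_date >= ?
           AND ch.repurchased_date < date(?, '+1 month')
           AND ch.lead_source_id IS NOT NULL
         GROUP BY ch.lead_source_id
         ORDER BY ch.lead_source_id
    """
    result = conn.execute(sql, (month_start, month_start)).fetchall()
    conn.close()
    return result


print(repurchase_counts("2014-04-01"))  # [(4, 2), (7, 1)]
```

Lead source 4 has three rows but only two distinct timestamps, so it gets a count of 2.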
In the example data, the customer_id and the lead_source_id correlate with each other. (That could be due to the small sample size...)
(See NOTES below for additional comments regarding indexes, index range scans, and GROUP BY optimization using a covering index.)
ANSWER BELOW WAS WRITTEN PRIOR TO THE QUESTION UPDATE
This is one way to return the specified result, except for the ordering; I wasn't able to discern a pattern in the ordering...
SELECT ch.lead_source_id
, COUNT(1) AS count_
FROM customers_history ch
WHERE ch.cust_updated_at >= '2014-04-01'
AND ch.cust_updated_at < '2014-04-01' + INTERVAL 1 MONTH
AND ch.lead_source_id IS NOT NULL
GROUP BY ch.lead_source_id
ORDER BY ?
UPDATE
If you want the "count" to also be broken out by cust_updated_at, include that column in the GROUP BY clause. For example, if for this sample data:
+--------+-------------+----------------+---------------------+
| id | customer_id | lead_source_id | cust_updated_at |
+--------+-------------+----------------+---------------------+
| 422924 | 420450 | 4 | 2014-04-14 09:16:48 |
| 422956 | 420450 | 4 | 2014-04-14 09:16:48 |
| ?????? | 420450 | 4 | 2014-04-15 22:22:22 |
+--------+-------------+----------------+---------------------+
You want to return:
+----------------+-------+
| lead_source_id | count |
+----------------+-------+
| 4 | 2 |
| 4 | 1 |
+----------------+-------+
Then add the cust_updated_at column to the GROUP BY clause, e.g.:
SELECT ch.lead_source_id
, COUNT(1) AS count_
FROM customers_history ch
WHERE ch.cust_updated_at >= '2014-04-01'
AND ch.cust_updated_at < '2014-04-01' + INTERVAL 1 MONTH
AND ch.lead_source_id IS NOT NULL
GROUP
BY ch.lead_source_id
, ch.cust_updated_at
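As a sanity check, this sketch loads the three sample rows shown above into an in-memory SQLite database (a stand-in for MySQL) and runs the query with cust_updated_at added to the GROUP BY. The id of the third row is unknown ("??????"), so NULL is inserted and SQLite assigns one. The month arithmetic uses SQLite's date(?, '+1 month') instead of INTERVAL, and the explicit ORDER BY is an addition so the result order does not depend on engine behavior.

```python
import sqlite3

# The sample rows from the table above: (id, customer_id, lead_source_id, cust_updated_at).
ROWS = [
    (422924, 420450, 4, "2014-04-14 09:16:48"),
    (422956, 420450, 4, "2014-04-14 09:16:48"),
    (None, 420450, 4, "2014-04-15 22:22:22"),  # unknown id: SQLite assigns one
]


def counts_by_lead_and_time(month_start):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers_history ("
        " id INTEGER PRIMARY KEY, customer_id INTEGER,"
        " lead_source_id INTEGER, cust_updated_at TEXT)"
    )
    conn.executemany("INSERT INTO customers_history VALUES (?, ?, ?, ?)", ROWS)
    sql = """
        SELECT ch.lead_source_id
             , COUNT(1) AS count_
          FROM customers_history ch
         WHERE ch.cust_updated_at >= ?
           AND ch.cust_updated_at < date(?, '+1 month')
           AND ch.lead_source_id IS NOT NULL
         GROUP BY ch.lead_source_id
                , ch.cust_updated_at
         ORDER BY ch.lead_source_id
                , ch.cust_updated_at
    """
    result = conn.execute(sql, (month_start, month_start)).fetchall()
    conn.close()
    return result


print(counts_by_lead_and_time("2014-04-01"))  # [(4, 2), (4, 1)]
```

Each distinct (lead_source_id, cust_updated_at) pair becomes its own group, which yields the two rows of the desired result.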
NOTES:
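(If we leave off the ORDER BY clause, the GROUP BY clause implicitly applies an ORDER BY on the same set of expressions, so we only need to specify an ORDER BY clause to get a different ordering. This implicit sort applies to MySQL versions before 8.0; MySQL 8.0 removed it, so there an explicit ORDER BY is needed to guarantee any particular order.)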
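When a different ordering is wanted, it has to be spelled out. This sketch (SQLite via Python's sqlite3 as a stand-in for MySQL, with hypothetical rows) orders the earlier per-lead-source counts by count, largest first, breaking ties by lead_source_id. SQLite, like MySQL 8.0, makes no ordering promise for a bare GROUP BY, so the explicit ORDER BY is what makes the result order reliable.

```python
import sqlite3

# Hypothetical rows: (lead_source_id, cust_updated_at), all in April 2014.
ROWS = [
    (4, "2014-04-02 08:00:00"),
    (7, "2014-04-03 08:00:00"),
    (7, "2014-04-04 08:00:00"),
    (7, "2014-04-05 08:00:00"),
    (9, "2014-04-06 08:00:00"),
    (9, "2014-04-07 08:00:00"),
]


def counts_ordered_by_count(month_start):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers_history ("
        " id INTEGER PRIMARY KEY, lead_source_id INTEGER, cust_updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers_history (lead_source_id, cust_updated_at)"
        " VALUES (?, ?)",
        ROWS,
    )
    sql = """
        SELECT ch.lead_source_id
             , COUNT(1) AS count_
          FROM customers_history ch
         WHERE ch.cust_updated_at >= ?
           AND ch.cust_updated_at < date(?, '+1 month')
           AND ch.lead_source_id IS NOT NULL
         GROUP BY ch.lead_source_id
         ORDER BY count_ DESC
                , ch.lead_source_id
    """
    result = conn.execute(sql, (month_start, month_start)).fetchall()
    conn.close()
    return result


print(counts_ordered_by_count("2014-04-01"))  # [(7, 3), (9, 2), (4, 1)]
```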
Also, wrapping date columns in functions in a predicate prevents MySQL from satisfying the predicate with an index range scan. We normally like to have "bare date columns" in the predicates, and do whatever manipulation is required on the constant side. For example, instead of YEAR(ch.cust_updated_at) = 2014, write ch.cust_updated_at >= '2014-01-01' AND ch.cust_updated_at < '2015-01-01'. (Wrapping the date column in a function like YEAR() forces MySQL to evaluate that function for EVERY row in the table, or every row that isn't filtered out by another predicate.)
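One way to see the difference is to compare query plans. This sketch uses Python's sqlite3 as a stand-in for MySQL, so the plan text comes from SQLite's EXPLAIN QUERY PLAN (a full "SCAN" versus an index "SEARCH") rather than MySQL's EXPLAIN (type ALL versus range). The index name ix_repurchased_date is hypothetical, strftime() plays the role of YEAR(), and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plans():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers_history ("
        " id INTEGER PRIMARY KEY, lead_source_id INTEGER, repurchased_date TEXT)"
    )
    # Hypothetical single-column index, just for this comparison.
    conn.execute(
        "CREATE INDEX ix_repurchased_date ON customers_history (repurchased_date)"
    )

    def plan(sql, params=()):
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
        # The last column of each plan row holds the human-readable detail.
        return " | ".join(str(r[-1]) for r in rows)

    # Column wrapped in a function: the index cannot drive a range search.
    wrapped = plan(
        "SELECT id, lead_source_id FROM customers_history"
        " WHERE strftime('%Y-%m', repurchased_date) = '2014-04'"
    )
    # Bare column, with the date arithmetic moved to the constant side.
    bare = plan(
        "SELECT id, lead_source_id FROM customers_history"
        " WHERE repurchased_date >= ? AND repurchased_date < date(?, '+1 month')",
        ("2014-04-01", "2014-04-01"),
    )
    conn.close()
    return wrapped, bare


wrapped, bare = query_plans()
print(wrapped)  # a full SCAN of customers_history
print(bare)     # a SEARCH using ix_repurchased_date on the date range
```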
For optimum performance, a suitable covering index for this query would be:
... ON customers_history (lead_source_id, repurchased_date)
MySQL can then satisfy the query entirely from the index; the EXPLAIN output will show "Using index" in the Extra column. If we leave off the ORDER BY clause, MySQL will avoid a "Using filesort" operation as well.
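The covering-index idea can be checked in the same stand-in environment. This sketch (Python's sqlite3, not MySQL) assumes a hypothetical index named ix_lead_repurchased that pairs lead_source_id with the repurchased_date column the query filters on. SQLite reports index-only access as "USING COVERING INDEX" in EXPLAIN QUERY PLAN, which is its rough analog of MySQL's "Using index"; the exact plan wording differs by SQLite version.

```python
import sqlite3


def grouped_plan():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers_history ("
        " id INTEGER PRIMARY KEY, customer_id INTEGER,"
        " lead_source_id INTEGER, repurchased_date TEXT)"
    )
    # Hypothetical covering index: every column the query touches is in it.
    conn.execute(
        "CREATE INDEX ix_lead_repurchased"
        " ON customers_history (lead_source_id, repurchased_date)"
    )
    sql = """
        SELECT ch.lead_source_id
             , COUNT(DISTINCT ch.repurchased_date)
          FROM customers_history ch
         WHERE ch.repurchased_date >= ?
           AND ch.repurchased_date < date(?, '+1 month')
           AND ch.lead_source_id IS NOT NULL
         GROUP BY ch.lead_source_id
    """
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("2014-04-01",) * 2).fetchall()
    conn.close()
    # Keep only the human-readable detail text of each plan row.
    return [str(r[-1]) for r in rows]


for line in grouped_plan():
    print(line)
```

Because the index leads with lead_source_id, the engine can also read the groups in index order rather than sorting them separately.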