SQL Help: How do I count by a group, when a date is updated?

Question 1

Your query looks real close. I'm thinking that all that's needed is to add a GROUP BY clause.

The COUNT(DISTINCT foo) will effectively "collapse" identical values, so that the count only gets incremented by 1 for each "group" of identical date values.

Based on the sample data, and the desired resultset, this should work:

 SELECT ch.lead_source_id
      , COUNT(DISTINCT ch.repurchased_date)
   FROM customers_history ch
  WHERE ch.repurchased_date >= '2014-04-01'
    AND ch.repurchased_date  < '2014-04-01' + INTERVAL 1 MONTH
    AND ch.lead_source_id IS NOT NULL
  GROUP
     BY ch.lead_source_id

In the example data, the customer_id and the lead_source_id correlate with each other. (Could be due to a small sample size...)

(See NOTES below for additional comments regarding indexes, index range scans, and GROUP BY optimization using a covering index.)
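
To see how COUNT(DISTINCT ...) collapses identical dates within each group, here is a small runnable sketch. It uses Python's sqlite3 module as a stand-in for MySQL, so the upper bound is written as the literal '2014-05-01' instead of '2014-04-01' + INTERVAL 1 MONTH. The rows are made up for illustration and are not the asker's data.

```python
import sqlite3

# Hypothetical rows: (customer_id, lead_source_id, repurchased_date).
rows = [
    (1, 4, "2014-04-14 09:16:48"),
    (1, 4, "2014-04-14 09:16:48"),     # same date again: collapsed by DISTINCT
    (1, 4, "2014-04-15 22:22:22"),
    (2, 7, "2014-04-20 10:00:00"),
    (3, 7, "2014-05-02 08:30:00"),     # outside April: filtered out
    (4, None, "2014-04-03 12:00:00"),  # NULL lead source: filtered out
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers_history ("
    "customer_id INTEGER, lead_source_id INTEGER, repurchased_date TEXT)"
)
conn.executemany("INSERT INTO customers_history VALUES (?, ?, ?)", rows)

# Same shape as the query above; SQLite has no INTERVAL, so the end of the
# half-open range is spelled out as a literal.
counts = conn.execute(
    """
    SELECT ch.lead_source_id, COUNT(DISTINCT ch.repurchased_date)
      FROM customers_history ch
     WHERE ch.repurchased_date >= '2014-04-01'
       AND ch.repurchased_date <  '2014-05-01'
       AND ch.lead_source_id IS NOT NULL
     GROUP BY ch.lead_source_id
     ORDER BY ch.lead_source_id
    """
).fetchall()
print(counts)
```

Lead source 4 has three rows in April but only two distinct dates, so it is counted as 2.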

ANSWER BELOW PRIOR TO QUESTION UPDATE

This is one way to return the specified result, except for the ordering; I wasn't able to discern a pattern there...

SELECT ch.lead_source_id
     , COUNT(1) AS count_
  FROM customers_history ch
 WHERE ch.cust_updated_at >= '2014-04-01' 
   AND ch.cust_updated_at <  '2014-04-01' + INTERVAL 1 MONTH
   AND ch.lead_source_id IS NOT NULL
 GROUP BY ch.lead_source_id
 ORDER BY ?

UPDATE

If you want the "count" to also be by cust_updated_at, include that column in the GROUP BY. For example, suppose you have this sample data:

+--------+-------------+----------------+---------------------+
| id     | customer_id | lead_source_id |   cust_updated_at   |
+--------+-------------+----------------+---------------------+
| 422924 |      420450 |              4 | 2014-04-14 09:16:48 |
| 422956 |      420450 |              4 | 2014-04-14 09:16:48 |
| ?????? |      420450 |              4 | 2014-04-15 22:22:22 |
+--------+-------------+----------------+---------------------+

You want to return:

+----------------+-------+
| lead_source_id | count |
+----------------+-------+
|              4 |     2 |
|              4 |     1 |
+----------------+-------+

Then, add the cust_updated_at column to the GROUP BY clause, e.g.

SELECT ch.lead_source_id
     , COUNT(1) AS count_
  FROM customers_history ch
 WHERE ch.cust_updated_at >= '2014-04-01' 
   AND ch.cust_updated_at <  '2014-04-01' + INTERVAL 1 MONTH
   AND ch.lead_source_id IS NOT NULL
 GROUP
    BY ch.lead_source_id
     , ch.cust_updated_at
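
As a quick check of the two-column GROUP BY, the sketch below loads the three sample rows into an in-memory SQLite database (standing in for MySQL) and runs the same grouping. The third id is unknown in the sample data, so 422957 is a made-up placeholder, and the explicit ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

# Sample rows from the table above; the third id is a hypothetical placeholder.
sample = [
    (422924, 420450, 4, "2014-04-14 09:16:48"),
    (422956, 420450, 4, "2014-04-14 09:16:48"),
    (422957, 420450, 4, "2014-04-15 22:22:22"),
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE customers_history ("
    "id INTEGER, customer_id INTEGER, lead_source_id INTEGER, cust_updated_at TEXT)"
)
db.executemany("INSERT INTO customers_history VALUES (?, ?, ?, ?)", sample)

# One output row per (lead_source_id, cust_updated_at) pair.
grouped = db.execute(
    """
    SELECT ch.lead_source_id, COUNT(1) AS count_
      FROM customers_history ch
     WHERE ch.cust_updated_at >= '2014-04-01'
       AND ch.cust_updated_at <  '2014-05-01'
       AND ch.lead_source_id IS NOT NULL
     GROUP BY ch.lead_source_id, ch.cust_updated_at
     ORDER BY ch.lead_source_id, ch.cust_updated_at
    """
).fetchall()
print(grouped)
```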

NOTES:

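(If we leave off the ORDER BY clause, the GROUP BY clause implicitly applies an ORDER BY on the same set of expressions, at least in MySQL 5.7 and earlier; MySQL 8.0 dropped that implicit sort. We only need to specify an ORDER BY clause to get a different ordering, or to guarantee an ordering on 8.0.)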

Also, wrapping date columns in functions in a predicate prevents MySQL from satisfying the predicate by using an index range scan. We normally like to have "bare date columns" in the predicates, and do whatever manipulation is required on the constant side. Wrapping the date column in a function, like YEAR(), forces MySQL to evaluate that function for EVERY row in the table (or every row that isn't filtered out by another predicate).
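
One way to keep the manipulation on the constant side is to compute the month boundaries in application code and pass them in as parameters. The sketch below is only an illustration: month_bounds is a hypothetical helper, and the %s placeholders assume a MySQL driver that uses that parameter style.

```python
from datetime import date


def month_bounds(year, month):
    """Return (start, end) ISO date strings for a half-open month range."""
    start = date(year, month, 1)
    # First day of the following month; roll over to January after December.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()


# The date column stays bare in the predicate; only the constants are computed.
query = (
    "SELECT ch.lead_source_id, COUNT(1) AS count_ "
    "FROM customers_history ch "
    "WHERE ch.cust_updated_at >= %s "
    "AND ch.cust_updated_at < %s "
    "AND ch.lead_source_id IS NOT NULL "
    "GROUP BY ch.lead_source_id"
)
params = month_bounds(2014, 4)
print(params)
```

The half-open range (>= start, < end) also picks up timestamps late on the last day of the month, which a BETWEEN on date-only literals would miss.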

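For optimum performance, a suitable covering index for this query would be (with the second column being whichever date column the query actually filters on, e.g. cust_updated_at in place of created_at):

... ON customers_history (lead_source_id, created_at)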

MySQL can satisfy the query entirely from the index; the explain output will show "Using index". If we leave off the ORDER BY clause, MySQL will avoid a "Using filesort" operation as well.

Question 2

I'm not sure I understood what you're asking. Do you mean this?

SELECT ch.lead_source_id, count(*)
FROM customers_history ch
WHERE
     Year(ch.created_at) = 2014 AND
     Month(ch.created_at) = 4 AND ch.lead_source_id IS NOT NULL
GROUP BY ch.lead_source_id;

EDIT