Question

This query does a group by on lead_source_id:

SELECT ch.lead_source_id,
       Count(DISTINCT ch.repurchased_date)
FROM   customers_history ch
WHERE  ch.repurchased_date >= '2014-04-01'
   AND ch.repurchased_date < '2014-05-01'
   AND ch.lead_source_id IS NOT NULL
GROUP  BY ch.lead_source_id;

And this query computes the overall total across all lead sources:

SELECT Count(DISTINCT( repurchased_date ))
FROM   customers_history
       INNER JOIN (SELECT DISTINCT( customer_id ) AS xcid
                   FROM   customers_history
                   WHERE  repurchased_date >= '2014-04-01'
                      AND repurchased_date < '2014-05-01'
                      AND lead_source_id IS NOT NULL) AS Temp
               ON Temp.xcid = customer_id
WHERE  repurchased_date >= '2014-04-01'
   AND repurchased_date < '2014-05-01'
   AND lead_source_id IS NOT NULL;

On our production data, the per-group counts from the first query add up to 7963, but the second query returns 7905. Why do they differ, and how can we fix our queries?

Here's our table layout:

+--------+-------------+----------------+---------------------+--------+
| id     | customer_id | lead_source_id |   repurchased_date  | Rating |
+--------+-------------+----------------+---------------------+--------+
| 422923 |      420450 |              4 | 2014-04-14 09:16:48 |   Warm |
| 422924 |      420450 |              4 | 2014-04-14 09:16:48 |   Cold |
| 422956 |      420450 |              4 | 2014-04-14 09:16:49 |    Hot |
| 422933 |      420451 |             37 | 2014-04-14 09:18:41 |    Hot |
| 422938 |      420452 |              1 | 2014-04-10 20:50:30 |    Hot |
| 422984 |      420452 |              1 | 2014-04-12 20:50:30 |   Warm |
| 422940 |      420453 |             47 | 2014-04-14 09:20:27 |    Hot |
+--------+-------------+----------------+---------------------+--------+

EDIT

To answer some of the possibilities about nulls:

select count(id) from customers_history where customer_id is null: 0

select count(id) from customers_history where lead_source_id is null: 5103

select count(id) from customers_history where repurchased_date is null: 0


Solution

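The most obvious conclusion is that some lead_source_ids share values of repurchased_date. The first query counts distinct dates within each group, so a timestamp that occurs under two different lead_source_ids is counted once per group. The second query counts it only once overall, so the sum of the per-group counts comes out higher than the single total.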

Another possibility is that some rows have a NULL customer_id. The inner join in the second query would filter these out, because a NULL never satisfies the equality in the join condition.

A third possibility is that rows with a NULL lead_source_id contribute extra values to the first query, although the lead_source_id IS NOT NULL filter shown in both queries should already exclude them.
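To check the first possibility, something like the following could be run against the customers_history table from the question. This is only a sketch: the column aliases (sources, total_pairs) and the derived-table alias (pairs) are made up for illustration, and the syntax assumes a MySQL-style database. The first query lists every repurchased_date that appears under more than one lead_source_id. If shared timestamps are the only cause, the values of sources - 1 over those rows should add up to 7963 - 7905 = 58. The second query counts distinct (lead_source_id, repurchased_date) pairs, which by construction equals the sum of the per-group counts from the first query in the question.

```sql
-- Diagnostic: timestamps that occur under more than one lead_source_id.
-- Each row returned is counted once per lead source by the grouped query,
-- but only once by the overall COUNT(DISTINCT repurchased_date).
SELECT ch.repurchased_date,
       COUNT(DISTINCT ch.lead_source_id) AS sources
FROM   customers_history ch
WHERE  ch.repurchased_date >= '2014-04-01'
   AND ch.repurchased_date < '2014-05-01'
   AND ch.lead_source_id IS NOT NULL
GROUP  BY ch.repurchased_date
HAVING COUNT(DISTINCT ch.lead_source_id) > 1;

-- Total that is consistent with the grouped query:
-- count distinct (lead_source_id, repurchased_date) pairs.
SELECT COUNT(*) AS total_pairs
FROM   (SELECT DISTINCT lead_source_id,
                        repurchased_date
        FROM   customers_history
        WHERE  repurchased_date >= '2014-04-01'
           AND repurchased_date < '2014-05-01'
           AND lead_source_id IS NOT NULL) AS pairs;
```

Which total is the right one depends on what is being measured: distinct timestamps overall (the second query in the question) or distinct timestamps per lead source added together (the pair count above).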

License: CC-BY-SA with attribution
Not affiliated with StackOverflow