Find the min and max date for a record across multiple tables
Question
I have 3 tables:
file_list => list of all files on web server
stats_web => number of pageviews from public web
stats_reg => number of pageviews from "registered users"
I have the schema and some sample data on http://sqlfiddle.com/#!2/98c6a/1/0
I am trying to calculate the date that a file was first hit either by the public or by a registered user.
Current SQL looks like this:
SELECT
list.path,
list.mod_date,
IF(MIN(ifnull(web.hit_date, 0000-00-00)) < MIN(ifnull(reg.hit_date, 0000-00-00)), MIN(ifnull(web.hit_date, 0000-00-00)), MIN(ifnull(reg.hit_date, 0000-00-00))) AS 'min_date',
IF(MAX(ifnull(web.hit_date, 0000-00-00)) > MAX(ifnull(reg.hit_date, 0000-00-00)), MAX(ifnull(web.hit_date, 0000-00-00)), MAX(ifnull(reg.hit_date, 0000-00-00))) AS 'max_date',
SUM(ifnull(web.pages, 0)) + SUM(ifnull(reg.pages, 0)) AS 'page_views'
FROM
file_list list
LEFT JOIN
stats_web web ON list.path = web.path
LEFT JOIN
stats_reg reg ON list.path = reg.path
WHERE
list.path LIKE '/web/htdocs/%'
GROUP BY list.path;
The problem is that if a record appears in only one of the stats tables the min date is always 0. This is because of the ifnull() on the MIN and MAX but if I don't use ifnull() then both min and max dates return NULL.
Solution
You should use a date in the distant future for the second argument of ifnull
when it is used to compare the minimum. Otherwise, the first null
would make 0000-00-00
the minimum, and since it's the earliest possible date, no other date would be able to beat it to become the minimum.
If you change the forth line of your query to
IF(MIN(ifnull(web.hit_date, 9999-12-31)) < MIN(ifnull(reg.hit_date, 9999-12-31)), MIN(ifnull(web.hit_date, 9999-12-31)), MIN(ifnull(reg.hit_date, 9999-12-31))) AS 'min_date'
you should get correct results.