Question

There is a timestamp column I use to indicate whether a row was created during the day or at night. My code is the following, but for some reason I only get 'Day' as the outcome. Am I not formatting the values right?

select record_id, rec_date,
       case when date_part('hour', rec_date) between 20 and 07 then 'Night'
            else 'Day' end as Indicator
from records;      

The rec_date column is a timestamp where I can see values such as 2019-11-20 21:34:02.000000 - which should get a 'Night' indicator.

Solution

SELECT record_id, rec_date
     , CASE WHEN rec_date::time <  '08:00' THEN 'Night'
            WHEN rec_date::time >= '20:00' THEN 'Night'
            ELSE 'Day' END AS indicator
FROM  records;
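
To check the boundaries, here is a minimal sketch that runs the same expression against a few made-up rows. The CTE named records and its sample values are assumptions for illustration only; they stand in for the real table.

```sql
-- Hypothetical sample rows standing in for the real records table
WITH records (record_id, rec_date) AS (
   VALUES (1, timestamp '2019-11-20 21:34:02')
        , (2, timestamp '2019-11-20 07:59:59')
        , (3, timestamp '2019-11-20 08:00:00')
        , (4, timestamp '2019-11-20 19:59:59')
        , (5, timestamp '2019-11-20 20:00:00')
   )
SELECT record_id, rec_date
     , CASE WHEN rec_date::time <  '08:00' THEN 'Night'  -- 00:00:00 up to, excluding, 08:00
            WHEN rec_date::time >= '20:00' THEN 'Night'  -- 20:00:00 and later
            ELSE 'Day' END AS indicator
FROM   records;
```

Rows 1, 2 and 5 should come back as 'Night' and rows 3 and 4 as 'Day', since 08:00 is the first 'Day' instant and 20:00 the first 'Night' instant.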

Why?

It's a matter of logic rather than formatting. You want to do the math correctly and efficiently. The format of 'Day' and 'Night' is not in question.

McNets already shed some light on BETWEEN vs. BETWEEN SYMMETRIC. But BETWEEN SYMMETRIC 20 AND 07 would still be dead ugly, slow, and incorrect, ultimately.

  1. SYMMETRIC only makes sense with parameterized bounds where you don't know ahead of time which will be greater. That is not the case here: 20 and 07 are constants.

  2. Applied to your case naively, you would get day and night inverted, because BETWEEN SYMMETRIC 20 AND 07 ends up being evaluated as BETWEEN 07 AND 20 (just more expensively).

  3. OK, easily fixed by switching 'Day' and 'Night'. But now, the times 20:** and 07:** would be tagged 'Day'. BETWEEN, with or without SYMMETRIC, includes upper and lower bound. That's why it is almost always the wrong tool to use with timestamps. In this particular case, date_part() happens to counter the built-in issue with including both bounds to some extent. Either way, to match your original intent, it would have to be:

    CASE WHEN date_part('hour', rec_date) BETWEEN 8 AND 19  -- adjusted
         THEN 'Day' ELSE 'Night' END AS indicator
    
  4. The query with the expression date_part('hour', rec_date) BETWEEN SYMMETRIC 20 AND 07 is roughly twice as expensive as my suggestion. The larger part is due to the pointless SYMMETRIC, the smaller part to the function being more expensive than the cast.

  5. The suggested expression is much less likely to be misunderstood than the devious BETWEEN, as it makes clear which bounds are included.
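
The points above can be checked hour by hour. The following is only a sketch: it uses generate_series() to produce the integers 0 to 23 in place of date_part('hour', rec_date), and the column aliases are made up for illustration.

```sql
-- h stands in for date_part('hour', rec_date); one row per hour of the day
SELECT h AS hour
     , h BETWEEN 20 AND 7            AS plain_between      -- false for every hour: empty range
     , h BETWEEN SYMMETRIC 20 AND 7  AS symmetric_between  -- true for 7 through 20: day and night inverted
     , h NOT BETWEEN 8 AND 19        AS night_by_hour      -- true for 0 through 7 and 20 through 23
FROM   generate_series(0, 23) AS h;
```

The first column explains why the original query only ever returned 'Day'; the second shows that SYMMETRIC selects the wrong hours, including both 7 and 20.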

Asides

I would not call a timestamp column "rec_date", as date is a different basic data type than timestamp. More potential for confusion.

If your actual data type happens to be timestamptz (or, possibly, in any case), you may have to define where in the world it's supposed to be "day" or "night".
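
One way to pin that down, assuming rec_date is timestamptz, is to convert to the local time of a chosen zone with AT TIME ZONE before casting to time. The zone names and the sample row below are placeholders, not anything taken from the question.

```sql
-- Hypothetical single row; the zones are arbitrary examples
WITH records (record_id, rec_date) AS (
   VALUES (1, timestamptz '2019-11-20 21:34:02+00')
   )
SELECT record_id
     , (rec_date AT TIME ZONE 'Europe/Vienna')::time    AS local_time_vienna
     , (rec_date AT TIME ZONE 'America/New_York')::time AS local_time_new_york
     , CASE WHEN (rec_date AT TIME ZONE 'Europe/Vienna')::time <  '08:00' THEN 'Night'
            WHEN (rec_date AT TIME ZONE 'Europe/Vienna')::time >= '20:00' THEN 'Night'
            ELSE 'Day' END AS indicator_vienna
FROM   records;
```

The same instant is 22:34 in Vienna (night) but 16:34 in New York (day), so the indicator depends on which zone you pick.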

OTHER TIPS

According to docs BETWEEN transforms into:

a BETWEEN x AND y

is equivalent to

a >= x AND a <= y

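That means x must be less than or equal to y. With BETWEEN 20 AND 07 no value can be both >= 20 and <= 7, so the condition is never true and every row falls through to 'Day'.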

The docs also describe BETWEEN SYMMETRIC, which may help with your question.

BETWEEN SYMMETRIC is like BETWEEN except there is no requirement that the argument to the left of AND be less than or equal to the argument on the right. If it is not, those two arguments are automatically swapped, so that a nonempty range is always implied.

create table test (id int);
insert into test values (1),(2),(3),(4),(5),(6);

select id from test where id between symmetric 4 and 2;
| id |
| -: |
|  2 |
|  3 |
|  4 |

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange