Is there a way to avoid having chained/nested “replace()” items when using an ISO 8601 timezone offset value in a generated column?

https://dba.stackexchange.com/questions/207891

01-01-2021

Question

I am using MySQL 5.7 and have a table with the following columns, which store data sourced from an Apache Common Log Format access log. The details are extracted from a MySQL schema export:

`timestamp` timestamp NULL DEFAULT NULL,
`offset` varchar(5) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`date` date GENERATED ALWAYS AS ((`timestamp` + interval replace(replace(replace(`offset`,'-0','-'),'+0','+'),'00','') hour)) VIRTUAL,
`time` time GENERATED ALWAYS AS ((`timestamp` + interval replace(replace(replace(`offset`,'-0','-'),'+0','+'),'00','') hour)) VIRTUAL,
`hour` int(2) GENERATED ALWAYS AS (hour((`timestamp` + interval replace(replace(replace(`offset`,'-0','-'),'+0','+'),'00','') hour))) VIRTUAL

As you can see, I am storing timestamp (0000-00-00 00:00:00) and offset (+00:00) and then using generated columns to calculate date (0000-00-00), time (00:00:00) and hour (0) values. This is working well: I get to store the timestamp as UTC and then, by storing the offset value, I can dynamically get other info in non-UTC form.

But I don’t think the chained/nested REPLACE items are too hot. The goal is to be able to take an ISO 8601 timezone offset value like -0400 or +1000 as part of an interval [some value] hour calculation.

So is there a better way to approach this? I might be able to adjust the way the timezone offset is stored initially so that it is, essentially, exactly what I need for column calculations, but that seems messy and non-intuitive, so I would rather use the +/-[four digit] format if possible.
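
To see what the chained expression actually produces, the three nested REPLACE calls can be modeled outside MySQL. This is only a sketch in Python, assuming that Python's str.replace behaves like MySQL's REPLACE() for these inputs (both replace every non-overlapping occurrence, left to right); the function name chained_replace is hypothetical.

```python
def chained_replace(offset: str) -> str:
    """Model replace(replace(replace(offset,'-0','-'),'+0','+'),'00','')."""
    return offset.replace("-0", "-").replace("+0", "+").replace("00", "")


# Print the string that would be handed to "interval ... hour".
for sample in ("-0400", "+1000", "+0530"):
    print(sample, "->", chained_replace(sample))
```

Whole-hour offsets such as -0400 and +1000 reduce to a plain hour count (-4 and +10), while an offset with minutes such as +0530 comes out as +530, which is not a meaningful number of hours.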

Solution

After reading up on CONVERT_TZ, it seems like the cleaner way to handle a case like this, since I can cut three (!!!) REPLACE calls down to one like this:

CONVERT_TZ(`timestamp`,'+00:00',REPLACE(`offset`,'00',':00'));

It still seems a bit sloppy—since I am still dealing with having to convert offset values like -0400 to -04:00—but does work and is cleaner/easier to read and understand.

That said, the above method will fail for offset values such as +0530, +1000 and +0000. So instead of that, the following method uses the INSERT() string function, with the colon (:) inserted at a position counted from the right and calculated from the length of the string itself. It is cleaner and works for a variety of different offsets:

CONVERT_TZ(`timestamp`,'+00:00',INSERT(`offset`,LENGTH(`offset`)-1,0,':'));

These are test queries using that expression with the following sample timestamps and related offsets:

SELECT CONVERT_TZ('2018-05-28 02:34:58','+00:00',INSERT('+0300',LENGTH('+0300')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 07:50:12','+00:00',INSERT('+0400',LENGTH('+0400')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 09:23:34','+00:00',INSERT('+0530',LENGTH('+0530')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 12:16:56','+00:00',INSERT('+1000',LENGTH('+1000')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 16:07:17','+00:00',INSERT('-0200',LENGTH('-0200')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 20:02:05','+00:00',INSERT('-0700',LENGTH('-0700')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 23:33:03','+00:00',INSERT('-1000',LENGTH('-1000')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 23:33:03','+00:00',INSERT('-0000',LENGTH('-0000')-1,0,':'));

And here are the results:

2018-05-28 02:34:58 with an offset of +0300 becomes: 2018-05-28 05:34:58
2018-05-28 07:50:12 with an offset of +0400 becomes: 2018-05-28 11:50:12
2018-05-28 09:23:34 with an offset of +0530 becomes: 2018-05-28 14:53:34
2018-05-28 12:16:56 with an offset of +1000 becomes: 2018-05-28 22:16:56
2018-05-28 16:07:17 with an offset of -0200 becomes: 2018-05-28 14:07:17
2018-05-28 20:02:05 with an offset of -0700 becomes: 2018-05-28 13:02:05
2018-05-28 23:33:03 with an offset of -1000 becomes: 2018-05-28 13:33:03
2018-05-28 23:33:03 with an offset of -0000 becomes: 2018-05-28 23:33:03
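
The colon insertion and the resulting shift can also be checked outside the database. The following is a minimal sketch in Python, not MySQL itself: insert_colon mimics INSERT(offset, LENGTH(offset)-1, 0, ':') using 1-based positions, and apply_offset is a hypothetical stand-in for CONVERT_TZ with '+00:00' as the source zone.

```python
from datetime import datetime, timedelta


def insert_colon(offset: str) -> str:
    """Mimic MySQL INSERT(offset, LENGTH(offset)-1, 0, ':')."""
    pos = len(offset) - 1  # 1-based position, as MySQL counts it
    return offset[:pos - 1] + ":" + offset[pos - 1:]


def apply_offset(utc: str, offset: str) -> str:
    """Shift a UTC 'YYYY-MM-DD HH:MM:SS' string by an offset such as '+0530'."""
    sign = -1 if offset[0] == "-" else 1
    hours, minutes = insert_colon(offset)[1:].split(":")
    delta = timedelta(hours=int(hours), minutes=int(minutes))
    local = datetime.strptime(utc, "%Y-%m-%d %H:%M:%S") + sign * delta
    return local.strftime("%Y-%m-%d %H:%M:%S")


print(insert_colon("+0530"))
print(apply_offset("2018-05-28 09:23:34", "+0530"))
```

Because the position is derived from the string length, the colon always lands before the last two characters, regardless of which digits the offset contains.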

Other tips

I have completely revised my long (and mostly irrelevant - thanks for pointing that out!) answer.

I did however come to one valid conclusion in that previous draft of this answer: that the best approach was to make use of CONVERT_TZ and you've made use of it!

However, your own answer (with REPLACE):

CONVERT_TZ(`timestamp`,'+00:00',REPLACE(`offset`,'00',':00')) -- S1/Statement_1

which you have marked as correct is, in fact, incorrect! It won't work in all circumstances, some of which are very important!

My answer now is:

CONVERT_TZ  -- multi-line here for legibility, single line in SQL
(
  time_utc, 
  '+00:00', 
  CONCAT
  (
    SUBSTRING(offset, 1, 3), 
    ':', 
    SUBSTRING(offset, 4, 5)
  )
) AS "Vérace's d_time", -- S2/Statement_2

And it does work under all circumstances.

The OP has come up with what I think is the optimal solution to this problem, which makes use of the INSERT function (kicking myself for not having spotted it!). In my notation, this would give:

CONVERT_TZ(time_utc, '+00:00', INSERT(offset, LENGTH(offset) - 1, 0, ':'));

which also gives the correct result under all circumstances. To illustrate this, I did the following:

Created a table:

CREATE TABLE time_test
(
  time_utc    TIMESTAMP      NOT NULL,
  offset VARCHAR(5) NOT NULL
);

Added sample data:

INSERT INTO time_test
VALUES
('2018-05-28 02:34:58', '+0300'),
('2018-05-28 07:50:12', '+0400'),
('2018-05-28 09:23:34', '+0530'), -- S1 fails because of 0530
('2018-05-28 12:16:56', '+1000'), -- S1 fails because of +1000
('2018-05-28 16:07:17', '-0200'),
('2018-05-28 20:02:05', '-0700'),
('2018-05-28 23:33:03', '-1000'), -- S1 fails because of -1000
('2018-05-28 23:33:03', '-0000'); -- S1 fails because of 0000

Then ran a query incorporating both the REPLACE answer (S1) and the CONCAT/SUBSTRING answer (S2, which works for every offset). The two answers are compared in the query result.

SELECT 
  time_utc,
  REPLACE(offset,'00',':00') AS "OP's offset", -- from S1
  CONVERT_TZ(time_utc, '+00:00', REPLACE( offset, '00', ':00')) AS "OP's d_time",
  CONCAT(SUBSTRING(offset, 1, 3), ':', SUBSTRING(offset, 4, 5)) AS "Vérace's offset",  -- from S2
  CONVERT_TZ(time_utc, '+00:00', CONCAT(SUBSTRING(offset, 1, 3), ':', SUBSTRING(offset, 4, 5))) AS "Vérace's d_time",
  TIMESTAMPDIFF(HOUR, CONVERT_TZ(time_utc, '+00:00', REPLACE( offset, '00', ':00')), CONVERT_TZ(time_utc, '+00:00', CONCAT(SUBSTRING(offset, 1, 3), ':', SUBSTRING(offset, 4, 5)))) AS "OP diff Vérace"  

  -- `TIMESTAMPDIFF` is the comparison. Anything other than 0 is a fail!
FROM
  time_test;

And the result is (see the fiddle here):

           time_utc  OP's offset          OP's d_time  Vérace's offset      Vérace's d_time  OP diff Vérace
2018-05-28 02:34:58       +03:00  2018-05-28 05:34:58           +03:00  2018-05-28 05:34:58               0
2018-05-28 07:50:12       +04:00  2018-05-28 11:50:12           +04:00  2018-05-28 11:50:12               0
2018-05-28 09:23:34        +0530                 null           +05:30  2018-05-28 14:53:34            null
2018-05-28 12:16:56       +1:000  2018-05-28 13:16:56           +10:00  2018-05-28 22:16:56               9
2018-05-28 16:07:17       -02:00  2018-05-28 14:07:17           -02:00  2018-05-28 14:07:17               0
2018-05-28 20:02:05       -07:00  2018-05-28 13:02:05           -07:00  2018-05-28 13:02:05               0
2018-05-28 23:33:03       -1:000  2018-05-28 22:33:03           -10:00  2018-05-28 13:33:03              -9
2018-05-28 23:33:03      -:00:00                 null           -00:00  2018-05-28 23:33:03            null

Failing records:

no. 3 - offset = '+0530' - there is no '00' string for REPLACE to work on, so REPLACE just returns '+0530', which in turn causes CONVERT_TZ to return NULL (as per the MySQL documentation, referenced below; see below also for countries with timezones not on the hour),
no. 4 - offset = '+1000' - the first '00' string is replaced by ':00', so the REPLACE offset = '+1:000', which is read by CONVERT_TZ as '+01:00' (and not '+10:00'), hence a discrepancy of 9 hours,
no. 7 - offset = '-1000' - same as for record no. 4 except for the sign,
no. 8 - offset = '-0000' - both '00' strings are replaced with ':00', which gives the REPLACE answer an offset of '-:00:00', which causes CONVERT_TZ to return NULL.
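
The string side of these failures can be reproduced without a database. This is a sketch in Python, assuming that str.replace matches MySQL's REPLACE() for these inputs (every non-overlapping occurrence is replaced, left to right); s1_offset and s2_offset are hypothetical names for the offset expressions in S1 and S2.

```python
def s1_offset(offset: str) -> str:
    """Model REPLACE(offset, '00', ':00') from S1."""
    return offset.replace("00", ":00")


def s2_offset(offset: str) -> str:
    """Model CONCAT(SUBSTRING(offset,1,3), ':', SUBSTRING(offset,4,5)) from S2."""
    return offset[0:3] + ":" + offset[3:8]


# Compare the two offset strings for each sample value.
for sample in ("+0300", "+0530", "+1000", "-1000", "-0000"):
    s1, s2 = s1_offset(sample), s2_offset(sample)
    print(sample, s1, s2, "same" if s1 == s2 else "DIFFERENT")
```

Only the sample whose sole '00' sits in the minutes position (+0300) produces the same string from both expressions; the other four diverge exactly as in the result table.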

With respect to the zero offset ('-0000', and the same applies to '+0000'), the following countries use UTC (also erroneously called GMT) for part of the year (from here):

Greenland, Ireland, Iceland (all year), Britain.

A number of West African countries use UTC all year round:

Burkina Faso, Cote d'Ivoire (Ivory Coast), Gambia, Ghana, Guinea, Guinea-Bissau, Liberia, Mali, Mauritania, Saint Helena, Sao Tome and Principe, Senegal, Sierra Leone, Togo.

Furthermore, there are a number of time zones that have an offset of UTC+0 (i.e. = UTC):

AZOST – Azores Summer Time, EGST – Eastern Greenland Summer Time, WET – Western European Time, WT – Western Sahara Standard Time, Z – Zulu Time Zone

One would imagine that any app worth its salt should work for Western Europe?

MySQL CONVERT_TZ documentation (from here):

CONVERT_TZ(dt,from_tz,to_tz)

CONVERT_TZ() converts a datetime value dt from the time zone given by from_tz to the time zone given by to_tz and returns the resulting value. Time zones are specified as described in Section 5.1.12, “MySQL Server Time Zone Support”. This function returns NULL if the arguments are invalid.

Places that have non-integer time zone offsets (from here).

Some governments make local time zones decisions that deviate from the norm. Note that India is +5½ from UTC, while Myanmar (Burma) is +6½, Iran is +4½, Iraq is +3½, Nepal is +5¾ and Central Australia is +9½. Venezuela is -4½, the Canadian island of Newfoundland is -3½ hours from UTC, and some smaller islands in French Polynesia are -9½, while the Pitcairn Islands are -8½.

Finally, I would strongly urge you not to use SQL keyword identifiers like DATE, HOUR, TIME and TIMESTAMP for database and/or table and/or field names. There are two reasons why doing this is a bad idea:

It makes your SQL difficult to read and debug. Say you get a message about a problem with a TIMESTAMP - is that the data type or your variable?
Your application will be difficult to port should you ever decide to do so!

Use lower_case identifiers with_words_separated_by_underscores, keep your SQL keywords and functions in UPPER CASE, and always use singular names. This makes your SQL easy to read and debug (caveat: this is my opinion - others may not concur!).

License: CC-BY-SA with attribution

Not affiliated with dba.stackexchange