Is there a way to avoid having chained/nested “replace()” items when using an ISO 8601 timezone offset value in a generated column?
01-01-2021
Question
I am using MySQL 5.7 and have a table with the following columns, which store data sourced from an Apache Common Log Format access log. The details below are extracted from a MySQL schema export:
`timestamp` timestamp NULL DEFAULT NULL,
`offset` varchar(5) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL
`date` date GENERATED ALWAYS AS ((`timestamp` + interval replace(replace(replace(`offset`,'-0','-'),'+0','+'),'00','') hour)) VIRTUAL
`time` time GENERATED ALWAYS AS ((`timestamp` + interval replace(replace(replace(`offset`,'-0','-'),'+0','+'),'00','') hour)) VIRTUAL
`hour` int(2) GENERATED ALWAYS AS (hour((`timestamp` + interval replace(replace(replace(`offset`,'-0','-'),'+0','+'),'00','') hour))) VIRTUAL
As you can see, I am storing timestamp (0000-00-00 00:00:00) and offset (+00:00), and then using generated columns to calculate date (0000-00-00), time (00:00:00) and hour (0) values. This is working well: I get to store the timestamp as UTC and then, by storing the offset value, I can dynamically get other info in non-UTC form.
But I don't think the chained/nested REPLACE items are too hot. The goal is to be able to take an ISO 8601 timezone offset value like -0400 or +1000 as part of an interval [some value] hour calculation.
So is there a better way to approach this? I might be able to adjust the way that the timezone offset is stored initially so it is, essentially, exactly what I need for column calculations, but that seems messy and non-intuitive, so I would rather use that +/-[four digit] format if possible.
Solution
After reading up on CONVERT_TZ, it seems like the cleaner method of handling a case like this, since I can cut three (!!!) REPLACE calls down to one, like this:
CONVERT_TZ(`timestamp`,'+00:00',REPLACE(`offset`,'00',':00'));
It still seems a bit sloppy, since I still have to convert offset values like -0400 to -04:00, but it does work and is cleaner/easier to read and understand.
That said, the above method will fail for offset values such as +0530, +1000 and +0000. So instead of that, the following method is cleaner and works for a variety of different offsets. It uses the INSERT() string function, where the position of the colon (:) is counted from the right and calculated from the length of the string itself:
CONVERT_TZ(`timestamp`,'+00:00',INSERT(`offset`,LENGTH(`offset`)-1,0,':'));
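To see what the INSERT() call contributes on its own, the string step can be run in isolation. This is only a minimal sketch using literal offsets instead of the table column; the column aliases are made up for illustration.

```sql
-- INSERT(str, pos, len, newstr) with len = 0 inserts newstr before position pos
-- without removing anything. For a 5-character offset, LENGTH(...) - 1 = 4,
-- so the colon lands just before the two minute digits.
SELECT
  INSERT('+0530', LENGTH('+0530') - 1, 0, ':') AS offset_0530,  -- '+05:30'
  INSERT('-1000', LENGTH('-1000') - 1, 0, ':') AS offset_1000,  -- '-10:00'
  INSERT('+0000', LENGTH('+0000') - 1, 0, ':') AS offset_0000;  -- '+00:00'
```

Because the position is derived from the string length rather than from matching '00', the digits themselves never influence where the colon goes.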
Here it is applied to the following sample timestamps and related offsets:
SELECT CONVERT_TZ('2018-05-28 02:34:58','+00:00',INSERT('+0300',LENGTH('+0300')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 07:50:12','+00:00',INSERT('+0400',LENGTH('+0400')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 09:23:34','+00:00',INSERT('+0530',LENGTH('+0530')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 12:16:56','+00:00',INSERT('+1000',LENGTH('+1000')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 16:07:17','+00:00',INSERT('-0200',LENGTH('-0200')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 20:02:05','+00:00',INSERT('-0700',LENGTH('-0700')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 23:33:03','+00:00',INSERT('-1000',LENGTH('-1000')-1,0,':'));
SELECT CONVERT_TZ('2018-05-28 23:33:03','+00:00',INSERT('-0000',LENGTH('-0000')-1,0,':'));
And here are the results:
2018-05-28 02:34:58 with an offset of +0300 becomes: 2018-05-28 05:34:58
2018-05-28 07:50:12 with an offset of +0400 becomes: 2018-05-28 11:50:12
2018-05-28 09:23:34 with an offset of +0530 becomes: 2018-05-28 14:53:34
2018-05-28 12:16:56 with an offset of +1000 becomes: 2018-05-28 22:16:56
2018-05-28 16:07:17 with an offset of -0200 becomes: 2018-05-28 14:07:17
2018-05-28 20:02:05 with an offset of -0700 becomes: 2018-05-28 13:02:05
2018-05-28 23:33:03 with an offset of -1000 becomes: 2018-05-28 13:33:03
2018-05-28 23:33:03 with an offset of -0000 becomes: 2018-05-28 23:33:03
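Coming back to the date, time and hour values from the question, one way to derive all three from a single conversion might look like the sketch below. The table name access_log and the aliases are hypothetical, and the query assumes that offset always holds a sign followed by four digits and that the session time_zone is UTC, so that the TIMESTAMP column reads back as UTC.

```sql
-- Assumptions: table name access_log (hypothetical), `offset` is always
-- sign + four digits, and the session time_zone is UTC.
SELECT
  local_ts,
  DATE(local_ts) AS local_date,
  TIME(local_ts) AS local_time,
  HOUR(local_ts) AS local_hour
FROM (
  -- Convert once, then reuse the converted value in the outer query.
  SELECT CONVERT_TZ(`timestamp`, '+00:00',
                    INSERT(`offset`, LENGTH(`offset`) - 1, 0, ':')) AS local_ts
  FROM access_log
) AS converted;
```

Doing the conversion in a derived table keeps the offset handling in one place instead of repeating it for every derived value.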
Other tips
I have completely revised my long (and mostly irrelevant - thanks for pointing that out!) answer.
I did however come to one valid conclusion in that previous draft of this answer: that the best approach was to make use of CONVERT_TZ, and you've made use of it!
However, your own answer (with REPLACE):
CONVERT_TZ(`timestamp`,'+00:00',REPLACE(`offset`,'00',':00')) -- S1/Statement_1
which you have marked as correct is, in fact, incorrect! It won't work in all circumstances, some of which are very important!
My answer now is:
CONVERT_TZ -- multi-line here for legibility, single line in SQL
(
time_utc,
'+00:00',
CONCAT
(
SUBSTRING(offset, 1, 3),
':',
SUBSTRING(offset, 4, 5)
)
) AS "Vérace's d_time", -- S2/Statement_2
And it does work under all circumstances.
The OP has come up with what I think is the optimal solution to this problem, which makes use of the INSERT function (kicking myself for not having spotted it!). In my notation, this would give:
CONVERT_TZ(time_utc, '+00:00', INSERT(offset, LENGTH(offset) - 1, 0, ':'));
which also gives the correct result under all circumstances. To illustrate this, I did the following:
Created a table:
CREATE TABLE time_test
(
time_utc TIMESTAMP NOT NULL,
offset VARCHAR(5) NOT NULL
);
Added sample data:
INSERT INTO time_test
VALUES
('2018-05-28 02:34:58', '+0300'),
('2018-05-28 07:50:12', '+0400'),
('2018-05-28 09:23:34', '+0530'), -- S1 fails because of 0530
('2018-05-28 12:16:56', '+1000'), -- S1 fails because of +1000
('2018-05-28 16:07:17', '-0200'),
('2018-05-28 20:02:05', '-0700'),
('2018-05-28 23:33:03', '-1000'), -- S1 fails because of -1000
('2018-05-28 23:33:03', '-0000'); -- S1 fails because of 0000
Then ran a query incorporating the REPLACE answer and the CONCAT(SUBSTRING( answer (which works for every offset). Both answers are compared in the query result.
SELECT
time_utc,
REPLACE(offset,'00',':00') AS "OP's offset", -- from S1
CONVERT_TZ(time_utc, '+00:00', REPLACE( offset, '00', ':00')) AS "OP's d_time",
CONCAT(SUBSTRING(offset, 1, 3), ':', SUBSTRING(offset, 4, 5)) AS "Vérace's offset", -- from S2
CONVERT_TZ(time_utc, '+00:00', CONCAT(SUBSTRING(offset, 1, 3), ':', SUBSTRING(offset, 4, 5))) AS "Vérace's d_time",
TIMESTAMPDIFF(HOUR, CONVERT_TZ(time_utc, '+00:00', REPLACE( offset, '00', ':00')), CONVERT_TZ(time_utc, '+00:00', CONCAT(SUBSTRING(offset, 1, 3), ':', SUBSTRING(offset, 4, 5)))) AS "OP diff Vérace"
-- `TIMESTAMPDIFF` is the comparison. Anything other than 0 is a fail!
FROM
time_test;
And the result is (see the fiddle here):
time_utc            | OP's offset | OP's d_time         | Vérace's offset | Vérace's d_time     | OP diff Vérace
2018-05-28 02:34:58 | +03:00      | 2018-05-28 05:34:58 | +03:00          | 2018-05-28 05:34:58 | 0
2018-05-28 07:50:12 | +04:00      | 2018-05-28 11:50:12 | +04:00          | 2018-05-28 11:50:12 | 0
2018-05-28 09:23:34 | +0530       | null                | +05:30          | 2018-05-28 14:53:34 | null
2018-05-28 12:16:56 | +1:000      | 2018-05-28 13:16:56 | +10:00          | 2018-05-28 22:16:56 | 9
2018-05-28 16:07:17 | -02:00      | 2018-05-28 14:07:17 | -02:00          | 2018-05-28 14:07:17 | 0
2018-05-28 20:02:05 | -07:00      | 2018-05-28 13:02:05 | -07:00          | 2018-05-28 13:02:05 | 0
2018-05-28 23:33:03 | -1:000      | 2018-05-28 22:33:03 | -10:00          | 2018-05-28 13:33:03 | -9
2018-05-28 23:33:03 | -:00:00     | null                | -00:00          | 2018-05-28 23:33:03 | null
Failing records:
no. 3 - offset = '+0530' - there is no '00' string for the REPLACE to work on, so the REPLACE just returns '+0530', which in turn causes CONVERT_TZ to return NULL (as per the MySQL documentation, referenced below; see below also for countries with timezones not on the hour),
no. 4 - offset = '+1000' - the first '00' string is replaced by ':00', so the REPLACE offset = '+1:000', which is read by CONVERT_TZ as '+01:00' (and not '+10:00'), hence a discrepancy of 9 hours,
no. 7 - offset = '-1000' - same as for record no. 4 except for the sign,
no. 8 - offset = '-0000' - each '00' is replaced with ':00', which gives the REPLACE answer an offset of '-:00:00', which causes CONVERT_TZ to return NULL.
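The same failures can be isolated without reading the whole comparison table. The following is a sketch against the time_test table created above; the aliases replace_result and insert_result are made up for illustration. It should list only the failing records described above, since those are the rows where the two conversions disagree or where one of them is NULL.

```sql
-- <=> is MySQL's NULL-safe equality operator, so rows where the REPLACE-based
-- conversion returns NULL are reported as mismatches rather than silently dropped.
SELECT
  time_utc,
  `offset`,
  CONVERT_TZ(time_utc, '+00:00', REPLACE(`offset`, '00', ':00')) AS replace_result,
  CONVERT_TZ(time_utc, '+00:00', INSERT(`offset`, LENGTH(`offset`) - 1, 0, ':')) AS insert_result
FROM
  time_test
WHERE
  NOT (CONVERT_TZ(time_utc, '+00:00', REPLACE(`offset`, '00', ':00'))
       <=> CONVERT_TZ(time_utc, '+00:00', INSERT(`offset`, LENGTH(`offset`) - 1, 0, ':')));
```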
With respect to the zero offset '-0000' (the same applies with +), the following countries use UTC (also erroneously called GMT) for part of the year:
Greenland, Ireland, Iceland (all year), Britain.
A number of West African countries use UTC all year round:
Burkina Faso, Cote d'Ivoire (Ivory Coast), Gambia, Ghana, Guinea, Guinea-Bissau, Liberia, Mali, Mauritania, Saint Helena, Sao Tome and Principe, Senegal, Sierra Leone, Togo.
Furthermore, there are a number of time zones that have an offset of UTC+0 (i.e. = UTC):
AZOST – Azores Summer Time, EGST – Eastern Greenland Summer Time, WET – Western European Time, WT – Western Sahara Standard Time, Z – Zulu Time Zone
One would imagine that any app worth its salt should work for Western Europe?
MySQL CONVERT_TZ documentation:
CONVERT_TZ(dt,from_tz,to_tz)
CONVERT_TZ() converts a datetime value dt from the time zone given by from_tz to the time zone given by to_tz and returns the resulting value. Time zones are specified as described in Section 5.1.12, “MySQL Server Time Zone Support”. This function returns NULL if the arguments are invalid.
Places that have non-integer time zone offsets:
Some governments make local time zones decisions that deviate from the norm. Note that India is +5½ from UTC, while Myanmar (Burma) is +6½, Iran is +4½, Iraq is +3½, Nepal is +5¾ and Central Australia is +9½. Venezuela is -4 1/2, the Canadian island of Newfoundland is -3½ hours from UTC, and some smaller islands in French Polynesia are -9½, while the Pitcairn Islands are -8½.
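As a quick check that offsets which are not on the hour survive the INSERT-based conversion, here is a sketch using two of the offsets mentioned above: Nepal at +5¾ written as '+0545' and Newfoundland at -3½ written as '-0330'. The sample timestamp is reused from the test data and the aliases are made up for illustration.

```sql
-- 09:23:34 UTC plus 5h45m is 15:08:34; 09:23:34 UTC minus 3h30m is 05:53:34.
SELECT
  CONVERT_TZ('2018-05-28 09:23:34', '+00:00',
             INSERT('+0545', LENGTH('+0545') - 1, 0, ':')) AS nepal_time,         -- 2018-05-28 15:08:34
  CONVERT_TZ('2018-05-28 09:23:34', '+00:00',
             INSERT('-0330', LENGTH('-0330') - 1, 0, ':')) AS newfoundland_time;  -- 2018-05-28 05:53:34
```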
Finally, I would strongly urge you not to use SQL keyword identifiers like DATE, HOUR, TIME and TIMESTAMP for database and/or table and/or field names. There are two reasons why doing this is a bad idea:
It makes your SQL difficult to read and debug. Say you get a message about a problem with a TIMESTAMP - is that the data type or your variable?
Your application will be difficult to port should you ever decide to do so!
Use lower_case identifiers with_words_separated_by_underscores and keep your SQL keywords and functions in UPPER CASE and always use singular names. This makes your SQL easy to read and debug (caveat: this is my opinion - others may not concur!).
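As an illustration of that naming advice, the two stored columns from the question might be declared along these lines. This is only a sketch: the table and column names are hypothetical suggestions, not something taken from the original schema.

```sql
-- Hypothetical names chosen to avoid SQL keywords; types follow the question.
CREATE TABLE access_log_entry
(
  logged_at_utc TIMESTAMP  NULL DEFAULT NULL,  -- was `timestamp`
  utc_offset    VARCHAR(5) DEFAULT NULL        -- was `offset`, e.g. '+0530'
);
```

With names like these, an error message mentioning TIMESTAMP can only be referring to the data type.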