자가 요인, 크로스 조인 및 그룹화

https://stackoverflow.com/questions/1717766

19-09-2019
|

문제

여러 소스에서 시간이 지남에 따라 온도 샘플 테이블이 있으며 정해진 시간 간격으로 모든 소스에서 최소, 최대 및 평균 온도를 찾고 싶습니다. 언뜻보기에 이것은 그렇게 쉽게 이루어집니다.

SELECT MIN(temp), MAX(temp), AVG(temp) FROM samples GROUP BY time;

그러나 소스가 떨어지고 나가면 문제가 발생하는 동안 누락 된 소스를 무시하지 않고 사물이 훨씬 더 복잡해집니다. 샘플. 시간이 지남에 따라 샘플에 걸쳐 DateTimes를 사용하고 구간을 구조화하면 시간이 지남에 따라 고르지 않게 분산되면 더욱 복잡해집니다.

첫 번째 테이블의 시간이 두 번째 테이블의 시간보다 크거나 동일 한 다음 그룹화 된 행의 집계 값을 계산하는 샘플 테이블에서 자체 합의를 수행하여 원하는 결과를 만들 수 있다고 생각합니다. 원천. 그러나 나는 실제로 이것을하는 방법에 대해 혼란스러워합니다.

내 테스트 테이블은 다음과 같습니다.

+------+------+------+
| time   | source  | temp |
+------+------+------+
|    1 | a    |   20 | 
|    1 | b    |   18 | 
|    1 | c    |   23 | 
|    2 | b    |   21 | 
|    2 | c    |   20 | 
|    2 | a    |   18 | 
|    3 | a    |   16 | 
|    3 | c    |   13 | 
|    4 | c    |   15 | 
|    4 | a    |    4 | 
|    4 | b    |   31 | 
|    5 | b    |   10 | 
|    5 | c    |   16 | 
|    5 | a    |   22 | 
|    6 | a    |   18 | 
|    6 | b    |   17 | 
|    7 | a    |   20 | 
|    7 | b    |   19 | 
+------+------+------+
INSERT INTO samples (time, source, temp) VALUES (1, 'a', 20), (1, 'b', 18), (1, 'c', 23), (2, 'b', 21), (2, 'c', 20), (2, 'a', 18), (3, 'a', 16), (3, 'c', 13), (4, 'c', 15), (4, 'a', 4), (4, 'b', 31), (5, 'b', 10), (5, 'c', 16), (5, 'a', 22), (6, 'a', 18), (6, 'b', 17), (7, 'a', 20), (7, 'b', 19);

최소, Max 및 AVG 계산을 수행하려면 다음과 같이 보이는 중간 테이블을 원합니다.

+------+------+------+
| time   | source  | temp |
+------+------+------+
|    1 | a    |   20 | 
|    1 | b    |   18 | 
|    1 | c    |   23 | 
|    2 | b    |   21 | 
|    2 | c    |   20 | 
|    2 | a    |   18 | 
|    3 | a    |   16 | 
|    3 | b    |   21 | 
|    3 | c    |   13 | 
|    4 | c    |   15 | 
|    4 | a    |    4 | 
|    4 | b    |   31 | 
|    5 | b    |   10 | 
|    5 | c    |   16 | 
|    5 | a    |   22 | 
|    6 | a    |   18 | 
|    6 | b    |   17 | 
|    6 | c    |   16 | 
|    7 | a    |   20 | 
|    7 | b    |   19 | 
|    7 | c    |   16 | 
+------+------+------+

다음 쿼리는 내가 원하는 것에 가까워지는 것이지만 주어진 시간 간격에서 가장 최근의 결과보다는 소스의 첫 번째 결과의 온도 값을 취합니다.

SELECT s.dt as sdt, s.mac, ss.temp, MAX(ss.dt) as maxdt FROM (SELECT DISTINCT dt FROM samples) AS s CROSS JOIN samples AS ss WHERE s.dt >= ss.dt GROUP BY sdt, mac HAVING maxdt <= s.dt ORDER BY sdt ASC, maxdt ASC;

+------+------+------+-------+
| sdt  | mac  | temp | maxdt |
+------+------+------+-------+
|    1 | a    |   20 |     1 | 
|    1 | c    |   23 |     1 | 
|    1 | b    |   18 |     1 | 
|    2 | a    |   20 |     2 | 
|    2 | c    |   23 |     2 | 
|    2 | b    |   18 |     2 | 
|    3 | b    |   18 |     2 | 
|    3 | a    |   20 |     3 | 
|    3 | c    |   23 |     3 | 
|    4 | a    |   20 |     4 | 
|    4 | c    |   23 |     4 | 
|    4 | b    |   18 |     4 | 
|    5 | a    |   20 |     5 | 
|    5 | c    |   23 |     5 | 
|    5 | b    |   18 |     5 | 
|    6 | c    |   23 |     5 | 
|    6 | a    |   20 |     6 | 
|    6 | b    |   18 |     6 | 
|    7 | c    |   23 |     5 | 
|    7 | b    |   18 |     7 | 
|    7 | a    |   20 |     7 | 
+------+------+------+-------+

업데이트: Chadhoc (좋은 이름, 그건 그렇고!)은 불행히도 MySQL에서 작동하지 않는 멋진 솔루션을 제공합니다. FULL JOIN 그는 사용한다. 운 좋게도, 나는 단순하다고 믿는다 UNION 효과적인 교체품입니다.

-- Unify the original samples with the missing values that we've calculated
(
  SELECT time, source, temp
  FROM samples
)
UNION
( -- Pull all the time/source combinations that we are missing from the sample set, along with the temp
  -- from the last sampled interval for the same time/source combination if we do not have one
  SELECT  a.time, a.source, (SELECT t2.temp FROM samples AS t2 WHERE t2.time < a.time AND t2.source = a.source ORDER BY t2.time DESC LIMIT 1) AS temp
  FROM    
  ( -- All values we want to get should be a cross of time/temp
    SELECT t1.time, s1.source
    FROM
    (SELECT DISTINCT time FROM samples) AS t1
    CROSS JOIN
    (SELECT DISTINCT source FROM samples) AS s1
  ) AS a
  LEFT JOIN samples s
  ON a.time = s.time
  AND a.source = s.source
  WHERE s.source IS NULL
)
ORDER BY time, source;

Update 2: MySQL은 다음을 제공합니다 EXPLAIN Chadhoc 코드 출력 :

+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+
| id | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows | Extra                       |
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+
|  1 | PRIMARY            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 |                             | 
|  2 | UNION              | <derived4> | ALL  | NULL          | NULL | NULL    | NULL |   21 |                             | 
|  2 | UNION              | s          | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where                 | 
|  4 | DERIVED            | <derived6> | ALL  | NULL          | NULL | NULL    | NULL |    3 |                             | 
|  4 | DERIVED            | <derived5> | ALL  | NULL          | NULL | NULL    | NULL |    7 |                             | 
|  6 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary             | 
|  5 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary             | 
|  3 | DEPENDENT SUBQUERY | t2         | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where; Using filesort | 
| NULL | UNION RESULT       | <union1,2> | ALL  | NULL          | NULL | NULL    | NULL | NULL | Using filesort              | 
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------------------+

Charles의 코드가 그렇게 작동 할 수있었습니다.

SELECT T.time, S.source,
  COALESCE(
    D.temp,
    (
      SELECT temp FROM samples
      WHERE source = S.source AND time = (
        SELECT MAX(time)
        FROM samples
        WHERE
          source = S.source
          AND time < T.time
      )
    )
  ) AS temp
FROM (SELECT DISTINCT time FROM samples) AS T
CROSS JOIN (SELECT DISTINCT source FROM samples) AS S
  LEFT JOIN samples AS D
ON D.source = S.source AND D.time = T.time

그것의 설명은 다음과 같습니다.

+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+
| id | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows | Extra           |
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+
|  1 | PRIMARY            | <derived5> | ALL  | NULL          | NULL | NULL    | NULL |    3 |                 | 
|  1 | PRIMARY            | <derived4> | ALL  | NULL          | NULL | NULL    | NULL |    7 |                 | 
|  1 | PRIMARY            | D          | ALL  | NULL          | NULL | NULL    | NULL |   18 |                 | 
|  5 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary | 
|  4 | DERIVED            | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using temporary | 
|  2 | DEPENDENT SUBQUERY | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where     | 
|  3 | DEPENDENT SUBQUERY | temp       | ALL  | NULL          | NULL | NULL    | NULL |   18 | Using where     | 
+----+--------------------+------------+------+---------------+------+---------+------+------+-----------------+

해결책

MySQL에서 순위/창 함수를 사용하면 더 나은 성능을 얻을 수 있다고 생각하지만 불행히도 TSQL 구현뿐만 아니라 이러한 것도 모릅니다. 다음은 작동하는 ANSI 호환 솔루션입니다.

-- Full join across the sample set and anything missing from the sample set, pulling the missing temp first if we do not have one
select  coalesce(c1.[time], c2.[time]) as dt, coalesce(c1.source, c2.source) as source, coalesce(c2.temp, c1.temp) as temp
from    samples c1
full join ( -- Pull all the time/source combinations that we are missing from the sample set, along with the temp
            -- from the last sampled interval for the same time/source combination if we do not have one
            select  a.time, a.source,
                    (select top 1 t2.temp from samples t2 where t2.time < a.time and t2.source = a.source order by t2.time desc) as temp
            from    
                (   -- All values we want to get should be a cross of time/samples
                    select t1.[time], s1.source
                    from
                    (select distinct [time] from samples) as t1
                    cross join
                    (select distinct source from samples) as s1
                ) a
            left join samples s
            on  a.[time] = s.time
            and a.source = s.source
            where s.source is null
        ) c2
on c1.time = c2.time
and c1.source = c2.source
order by dt, source

다른 팁

나는 이것이 복잡해 보인다는 것을 알고 있지만, 스스로 설명하기 위해 형식이 지정되어 있습니다 ... 작동해야합니다 ... 세 가지 소스 만 있기를 바랍니다 ...이 경우보다 임의의 소스가 작동하지 않으면 ... 두 번째 쿼리 참조 ... 편집 : 첫 번째 시도가 제거되었습니다.

편집 : 소스를 미리 알지 못하면 실송 값을 "채우는"중간 결과 세트를 만드는 곳에서 무언가를해야합니다.

제 2 편 편집 : 로직을 움직여서 선택 조건에서 조정 조건으로 각 소스에 대한 가장 최근의 온도 판독 값을 검색함으로써 Coalesce가 제거되었습니다.

Select T.Time, Max(Temp) MaxTemp,
  Min(Temp) MinTemp, Avg(Temp) AvgTemp
From
  (Select T.TIme, S.Source, D.Temp
   From (Select Distinct Time From Samples) T
     Cross Join 
        (Select Distinct Source From Samples) S
     Left Join Samples D
        On D.Source = S.Source
           And D.Time = 
               (Select Max(Time)
                From Samples
                Where Source = S.Source
                   And Time <= T.Time)) Z
Group By T.Time

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow