I am trying to find continuous ranges of numeric values in a MySQL dataset. However, "gaps" of 5 or fewer values in a range should be ignored. Below is my current code (which works up to a point), split into smaller parts for convenience.
`dataset` contains a "thetime" and a "number" column (both numeric). The final goal is to get all the ranges of "thetime" associated with number > 200.
(1) First, I select the "gaps" in my dataset by selecting every "thetime" whose "number" is <= 200.
```sql
drop temporary table if exists tmp_gaps;
create temporary table tmp_gaps as
(select thetime
 from `dataset`
 where number <= 200);
```
(2) I partition the gaps found into ranges, using the method explained here.
```sql
drop temporary table if exists tmp_gaps_withdelta;
create temporary table tmp_gaps_withdelta as
(select min(thetime) as start, max(thetime) as theend,
        max(thetime) - min(thetime) + 1 as delta
 from (select thetime, @curRow := @curRow + 1 as row_number
       from tmp_gaps v
       join (select @curRow := 0) w) v
 group by thetime - row_number);
```
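Step (2) relies on the usual "gaps and islands" property: if distinct integers are numbered 1, 2, 3, ... in ascending order, every value in a run of consecutive integers has the same value minus row number. Below is a minimal Python sketch of that rule. Python stands in for MySQL here, and the function name `consecutive_ranges` is hypothetical. Note that the sketch sorts and de-duplicates explicitly, which the `@curRow` numbering in the query does not do.

```python
from itertools import groupby


def consecutive_ranges(values):
    """Group integers into runs of consecutive values.

    Mirrors the `thetime - row_number` trick: once the distinct values are
    sorted and numbered 1, 2, 3, ..., each run of consecutive integers shares
    one (value - row_number) key. Returns (start, theend, delta) tuples.
    """
    ordered = sorted(set(values))  # ascending order, duplicates removed
    ranges = []
    for _, group in groupby(enumerate(ordered, start=1), key=lambda p: p[1] - p[0]):
        run = [v for _, v in group]
        ranges.append((run[0], run[-1], run[-1] - run[0] + 1))
    return ranges


print(consecutive_ranges([3, 4, 5, 9, 10, 14]))
# [(3, 5, 3), (9, 10, 2), (14, 14, 1)]
```

Because the values are sorted, the key never decreases, so `groupby` sees each island as one contiguous group, just as `group by thetime - row_number` would.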
(3) Now I'm trying to filter out the gaps <= 5 by joining the original `dataset` table with `tmp_gaps_withdelta`. If delta <= 5 or delta is null (meaning there is no entry in `tmp_gaps_withdelta` corresponding to the original "thetime" in `dataset`), I consider "thetime" part of a range, and it gets accepted into `db_tmp_ranges`.
```sql
drop temporary table if exists db_tmp_ranges;
create temporary table db_tmp_ranges as
(select
   case
     when gaps.delta is null
       or gaps.delta <= 5 then edm.thetime
     else null
   end as thetime
 from `dataset` edm
 left join tmp_gaps_withdelta gaps on edm.thetime >= gaps.start
   and edm.thetime < gaps.start + gaps.delta);
```
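Step (3) can be checked in isolation. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL and runs the same LEFT JOIN / CASE on hypothetical sample rows. It fills `tmp_gaps_withdelta` by hand with the ranges step (2) would produce, rather than computing them. Rows inside a long gap come back as NULL instead of being dropped, which is why step (4) needs `where thetime is not null`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical sample: thetime 1..20; number is 100 at 4-5 (a short gap of 2)
# and at 10-15 (a long gap of 6), and 300 everywhere else.
low = {4, 5, 10, 11, 12, 13, 14, 15}
cur.execute("create table dataset (thetime integer, number integer)")
cur.executemany("insert into dataset values (?, ?)",
                [(t, 100 if t in low else 300) for t in range(1, 21)])

# Gap ranges entered by hand as (start, theend, delta).
cur.execute("create table tmp_gaps_withdelta (start integer, theend integer, delta integer)")
cur.executemany("insert into tmp_gaps_withdelta values (?, ?, ?)",
                [(4, 5, 2), (10, 15, 6)])

# Same logic as step (3): keep thetime if it is in no gap (delta is null)
# or in a gap of 5 or fewer; otherwise emit NULL.
rows = cur.execute("""
    select case
             when gaps.delta is null or gaps.delta <= 5 then edm.thetime
             else null
           end as thetime
    from dataset edm
    left join tmp_gaps_withdelta gaps
      on edm.thetime >= gaps.start
     and edm.thetime < gaps.start + gaps.delta
""").fetchall()

kept = sorted(t for (t,) in rows if t is not None)
print(kept)
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 16, 17, 18, 19, 20]
```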
Up to this point, everything works as expected. I now have a large set of "thetime" values where "number" in the original table is > 200, plus the values inside gaps of 5 or fewer. The data can be divided into ranges separated only by gaps longer than 5. When I select some data from `db_tmp_ranges`, I get what I expect.
(4) The plan now is to partition these values into ranges the same way as in (2).
```sql
select *
from
  (select min(thetime) as start, max(thetime) as theend,
          max(thetime) - min(thetime) + 1 as delta
   from (select thetime, @curRow := @curRow + 1 as row_number
         from db_tmp_ranges p
         join (select @curRow := 0) r
         where thetime is not null) p
   group by thetime - row_number) q
```
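The `thetime - row_number` key only lines up when row numbers follow ascending "thetime" order. The query above has no ORDER BY, so `@curRow` is incremented in whatever order MySQL reads `db_tmp_ranges`. For comparison, here is a sketch of step (4) in which the row number does not depend on read order. It uses Python's `sqlite3` as a stand-in for MySQL, with a hypothetical `db_tmp_ranges` filled out of order and containing NULLs. A correlated `count(distinct ...)` (aliased `rn`) replaces `@curRow`. That subquery is quadratic, so it only suits small tables. On MySQL 8.0 or later, `DENSE_RANK() OVER (ORDER BY thetime)` would give the same numbering.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical db_tmp_ranges contents, inserted out of order and with NULLs
# (standing in for rows that fell inside long gaps).
values = [16, 3, None, 1, 9, 2, 17, None, 8, 4, 18, 5, 6, 7]
cur.execute("create table db_tmp_ranges (thetime integer)")
cur.executemany("insert into db_tmp_ranges values (?)", [(v,) for v in values])

# rn is each value's rank among the distinct non-NULL values, so it does not
# depend on the physical order in which rows are read.
ranges = cur.execute("""
    select min(thetime) as start, max(thetime) as theend,
           max(thetime) - min(thetime) + 1 as delta
    from (select p.thetime,
                 (select count(distinct p2.thetime)
                  from db_tmp_ranges p2
                  where p2.thetime <= p.thetime) as rn
          from db_tmp_ranges p
          where p.thetime is not null) q
    group by thetime - rn
    order by start
""").fetchall()
print(ranges)
# [(1, 9, 9), (16, 18, 3)]
```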
However, the results of this query are completely wrong. I honestly don't know where the fault lies, since this way of partitioning into intervals has always worked for me until now. Any help is greatly appreciated.
EDIT: a specific example of how the query behaves.
`db_tmp_ranges`:
```
...
1393001313
1393001315
1393001316
...
1393001596
1393001597
1393001598
...
```
Result from the last query (columns: start, theend, delta):
```
...
1393001316 1393001319 4
1393001320 1393001591 272
1393001592 1393001595 4
1393001596 1393001881 286
...
```
As you can see, these numbers should form a single interval instead of four or more. Testing in SQL Fiddle suggests the query itself isn't the problem.
I really don't get it. When executing...
```sql
select *
from db_tmp_ranges
where thetime >= 1393001313
  and thetime <= 1393001350
order by thetime;
```
... I get a normal-looking list of numeric "thetime" values. But somehow the last query doesn't use db_tmp_ranges as it should.
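The check query above sorts its output with `order by thetime`, but the numbering in step (4) has no ORDER BY. The Python sketch below shows that the same set of values can produce one interval or several adjacent intervals depending only on the order in which rows are numbered. The function name and both row orders are hypothetical. They illustrate the mechanism and are not a record of what MySQL actually did here.

```python
def ranges_in_read_order(values):
    """Number rows 1, 2, 3, ... in the order given (as @curRow does without
    an ORDER BY), group by thetime - row_number, and report each group as
    (start, theend, delta) like step (4)."""
    groups = {}
    for row_number, thetime in enumerate(values, start=1):
        groups.setdefault(thetime - row_number, []).append(thetime)
    return sorted((min(g), max(g), max(g) - min(g) + 1) for g in groups.values())


# Hypothetical values 1..9, where 4-6 stand for rows from a short gap.
sorted_order = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# The same values, but with the short-gap rows read last.
gap_rows_last = [1, 2, 3, 7, 8, 9, 4, 5, 6]

print(ranges_in_read_order(sorted_order))   # [(1, 9, 9)]
print(ranges_in_read_order(gap_rows_last))  # [(1, 3, 3), (4, 6, 3), (7, 9, 3)]
```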