Question

I have a table (let's call it Data) with a set of object IDs, numeric values and dates. I would like to identify the objects whose values had a positive trend over the last X minutes (say, an hour).

Example data:

entity_id | value | date

1234      | 15    | 2014-01-02 11:30:00

5689      | 21    | 2014-01-02 11:31:00

1234      | 16    | 2014-01-02 11:31:00

I tried looking at similar questions, but didnt find anything that helps unfortunately...

Was it helpful?

Solution

You inspired me to go and implement linear regression in SQL Server. This could be modified for MySQL/Oracle/Whatever without too much trouble. It's the mathematically best way of determining the trend over the hour for each entity_id and it will select out only the ones with a positive trend.

It implements the formula for calculating B1hat listed here: https://en.wikipedia.org/wiki/Regression_analysis#Linear_regression

create table #temp
(
    entity_id int,
    value int,
    [date] datetime
)

insert into #temp (entity_id, value, [date])
values
(1,10,'20140102 07:00:00 AM'),
(1,20,'20140102 07:15:00 AM'),
(1,30,'20140102 07:30:00 AM'),
(2,50,'20140102 07:00:00 AM'),
(2,20,'20140102 07:47:00 AM'),
(3,40,'20140102 07:00:00 AM'),
(3,40,'20140102 07:52:00 AM')

select entity_id, 1.0*sum((x-xbar)*(y-ybar))/sum((x-xbar)*(x-xbar)) as Beta
from
(
    select entity_id,
        avg(value) over(partition by entity_id) as ybar,
        value as y,
        avg(datediff(second,'20140102 07:00:00 AM',[date])) over(partition by entity_id) as xbar,
        datediff(second,'20140102 07:00:00 AM',[date]) as x
    from #temp
    where [date]>='20140102 07:00:00 AM' and [date]<'20140102 08:00:00 AM'
) as Calcs
group by entity_id
having 1.0*sum((x-xbar)*(y-ybar))/sum((x-xbar)*(x-xbar))>0

OTHER TIPS

If someone needs this in Mysql, this is the code that works for me.

datapoint | plays | status_time 

1234      | 15    | 2014-01-02 11:30:00

5689      | 21    | 2014-01-02 11:31:00

1234      | 16    | 2014-01-02 11:31:00

select datapoint, 1.0*sum((x-xbar)*(y-ybar))/sum((x-xbar)*(x-xbar)) as Beta
from
(
     select datapoint,
        avg(plays) over(partition by datapoint) as ybar,
        plays as y,
        avg(TIME_TO_SEC(TIMEDIFF('2021-03-22 21:00:00', status_time))) over(partition by datapoint) as xbar,
        TIME_TO_SEC(TIMEDIFF('2021-03-22 21:00:00', status_time)) as x
    from aggregate_datapoints
    where status_time BETWEEN'2021-03-22 21:00:00' and '2021-03-22 22:00:00'
and type = 'topContent') as calcs
group by datapoint
having 1.0*sum((x-xbar)*(y-ybar))/sum((x-xbar)*(x-xbar))>0
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top