Get variance and standard deviation of two numbers in two different rows/columns with sqlite / PHP

StackOverflow https://stackoverflow.com/questions/14428095

  •  16-01-2022

Question

I have a SQLite Database with the following structure:

rowid       ID                  startTimestamp   endTimestamp   subject
1           00:50:c2:63:10:1a   1000             1090           entrance
2           00:50:c2:63:10:1a   1100             1270           entrance
3           00:50:c2:63:10:1a   1300             1310           door1
4           00:50:c2:63:10:1a   1370             1400           entrance
.
.
.

I have prepared a sqlfiddle here: http://sqlfiddle.com/#!2/fe8c6/2

With this SQL query I can get the average difference between the endTimestamp of one row and the startTimestamp of the following row, for a given subject and ID:

SELECT
    id,
    ( MAX(endtimestamp) - MIN(startTimestamp)
    - SUM(endtimestamp-startTimestamp)
    ) / (COUNT(*)-1) AS averageDifference
FROM
    table1
WHERE ID = '00:50:c2:63:10:1a'
AND subject = 'entrance'
GROUP BY id;

My problem: calculating the average value is no problem, since this query does that. But how can I get the standard deviation and the variance of these values?
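To make the starting point concrete, here is a minimal sketch in Python's sqlite3 module that runs the query above on the four sample rows and compares it with the gaps listed one by one. The helper names (make_db, average_gap_sql, gaps_by_hand) are made up for this illustration. The aggregate works because the gaps telescope: total span minus total duration equals the sum of all gaps. It only ever yields that sum, though, which is why variance needs the individual gap values.

```python
import sqlite3

# Sample rows copied from the question's table (rowid is implicit).
ROWS = [
    ("00:50:c2:63:10:1a", 1000, 1090, "entrance"),
    ("00:50:c2:63:10:1a", 1100, 1270, "entrance"),
    ("00:50:c2:63:10:1a", 1300, 1310, "door1"),
    ("00:50:c2:63:10:1a", 1370, 1400, "entrance"),
]


def make_db():
    """Build an in-memory copy of table1 holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (ID TEXT, startTimestamp INTEGER, "
        "endTimestamp INTEGER, subject TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    return conn


def average_gap_sql(conn, device_id, subject):
    """Run the question's aggregate query and return averageDifference."""
    row = conn.execute(
        """
        SELECT (MAX(endTimestamp) - MIN(startTimestamp)
                - SUM(endTimestamp - startTimestamp)) / (COUNT(*) - 1)
        FROM table1
        WHERE ID = ? AND subject = ?
        GROUP BY ID
        """,
        (device_id, subject),
    ).fetchone()
    return row[0]


def gaps_by_hand(device_id, subject):
    """List each gap: the next row's start minus this row's end."""
    rows = sorted(r[1:3] for r in ROWS if r[0] == device_id and r[3] == subject)
    return [nxt[0] - cur[1] for cur, nxt in zip(rows, rows[1:])]


gaps = gaps_by_hand("00:50:c2:63:10:1a", "entrance")
print(gaps, sum(gaps) / len(gaps))  # [10, 100] 55.0
print(average_gap_sql(make_db(), "00:50:c2:63:10:1a", "entrance"))  # 55
```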


Solution

First find the time differences of interest by joining the table to itself and grouping by ID. Then compute the average, the variance as V(x) = E(x^2) - (E(x))^2, and the standard deviation as sqrt(V). This gives:

SELECT ID, AVG(diff) AS average,
       AVG(diff*diff) - AVG(diff)*AVG(diff) AS variance,
       SQRT(AVG(diff*diff) - AVG(diff)*AVG(diff)) AS stdev
FROM
    (SELECT t1.id, t1.endTimestamp,
            min(t2.startTimeStamp) - t1.endTimestamp AS diff
    FROM table1 t1
    INNER JOIN table1 t2
    ON t2.ID = t1.ID AND t2.subject = t1.subject
    AND t2.startTimestamp > t1.startTimestamp  -- consider only later startTimestamps
    WHERE t1.subject = 'entrance'
    GROUP BY t1.id, t1.endTimestamp) AS diffs
GROUP BY ID
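The query above can be tried out with a short sketch like the following, which loads the question's four sample rows into an in-memory database. The function names are hypothetical. One assumption worth flagging: SQLite's math functions are an optional build feature, so the sketch registers its own SQRT through create_function rather than relying on a built-in one.

```python
import math
import sqlite3


def make_sample_db():
    """In-memory table1 with the four rows shown in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (ID TEXT, startTimestamp INTEGER, "
        "endTimestamp INTEGER, subject TEXT)"
    )
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?, ?)",
        [
            ("00:50:c2:63:10:1a", 1000, 1090, "entrance"),
            ("00:50:c2:63:10:1a", 1100, 1270, "entrance"),
            ("00:50:c2:63:10:1a", 1300, 1310, "door1"),
            ("00:50:c2:63:10:1a", 1370, 1400, "entrance"),
        ],
    )
    return conn


def _safe_sqrt(value):
    """SQRT replacement: passes NULL through, clamps tiny negative rounding."""
    return None if value is None else math.sqrt(max(value, 0.0))


def gap_statistics(conn, subject):
    """Return {ID: (average, variance, stdev)} of the gaps for one subject."""
    # Assumption: the SQLite build may lack SQRT, so register one ourselves.
    conn.create_function("SQRT", 1, _safe_sqrt)
    query = """
        SELECT ID, AVG(diff),
               AVG(diff * diff) - AVG(diff) * AVG(diff),
               SQRT(AVG(diff * diff) - AVG(diff) * AVG(diff))
        FROM (SELECT t1.ID AS ID,
                     MIN(t2.startTimestamp) - t1.endTimestamp AS diff
              FROM table1 t1
              INNER JOIN table1 t2
                ON t2.ID = t1.ID AND t2.subject = t1.subject
               AND t2.startTimestamp > t1.startTimestamp
              WHERE t1.subject = ?
              GROUP BY t1.ID, t1.endTimestamp) AS diffs
        GROUP BY ID
    """
    return {row[0]: row[1:] for row in conn.execute(query, (subject,))}


stats = gap_statistics(make_sample_db(), "entrance")
print(stats["00:50:c2:63:10:1a"])  # gaps are 10 and 100
```

For the sample data the two gaps are 10 and 100, so the average is 55, the variance is 5050 - 3025 = 2025 and the standard deviation is 45.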

Other answers

For formulas that are more complex than simple summation, you have to compute the actual difference value for each record by looking up the corresponding next start time, like this:

SELECT timeDifference
FROM (SELECT (SELECT MIN(startTimestamp)   -- next start time after this row
              FROM table1 AS next
              WHERE next.startTimestamp > table1.startTimestamp
                AND next.ID = '...'
             ) - endTimestamp AS timeDifference
      FROM table1
      WHERE ID = '...')
WHERE timeDifference IS NOT NULL  -- the last row has no successor

Then you can use all the difference values to do the calculations:

SELECT 1.0 * SUM(timeDifference) / COUNT(*) AS average,  -- 1.0 * avoids integer division
       AVG(timeDifference)                  AS moreEfficientAverage,
       1.0 * SUM(timeDifference * timeDifference) / COUNT(*) -
       AVG(timeDifference) * AVG(timeDifference) AS variance
FROM (SELECT (SELECT MIN(startTimestamp)
              FROM table1 AS next
              WHERE next.startTimestamp > table1.startTimestamp
                AND next.ID = '...'
             ) - endTimestamp AS timeDifference
      FROM table1
      WHERE ID = '...')
WHERE timeDifference IS NOT NULL  -- drop the last row, which has no successor
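Since the question also mentions PHP, another option is to fetch only the difference values with SQL and do the statistics in application code. Below is a minimal sketch of that split, written in Python rather than PHP and using the question's sample rows. The function names are invented for the example. Note that statistics.pvariance divides by n, like the SQL formula above, whereas statistics.variance would divide by n - 1.

```python
import sqlite3
import statistics


def make_sample_db():
    """In-memory table1 with the four rows shown in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (ID TEXT, startTimestamp INTEGER, "
        "endTimestamp INTEGER, subject TEXT)"
    )
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?, ?)",
        [
            ("00:50:c2:63:10:1a", 1000, 1090, "entrance"),
            ("00:50:c2:63:10:1a", 1100, 1270, "entrance"),
            ("00:50:c2:63:10:1a", 1300, 1310, "door1"),
            ("00:50:c2:63:10:1a", 1370, 1400, "entrance"),
        ],
    )
    return conn


def fetch_gaps(conn, device_id):
    """Fetch each row's gap to the next start time of the same device."""
    query = """
        SELECT timeDifference
        FROM (SELECT (SELECT MIN(startTimestamp)
                      FROM table1 AS next
                      WHERE next.startTimestamp > table1.startTimestamp
                        AND next.ID = table1.ID
                     ) - endTimestamp AS timeDifference
              FROM table1
              WHERE ID = ?)
        WHERE timeDifference IS NOT NULL
    """
    return [row[0] for row in conn.execute(query, (device_id,))]


def summarize(gaps):
    """Population mean, variance and standard deviation of the gaps."""
    return (
        statistics.mean(gaps),
        statistics.pvariance(gaps),
        statistics.pstdev(gaps),
    )


gaps = fetch_gaps(make_sample_db(), "00:50:c2:63:10:1a")
print(sorted(gaps))  # [10, 30, 60]: no subject filter, so door1 is included
print(summarize(gaps))
```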

A number of points:

  1. Your formula for the mean is wrong; the correct formula is SUM(endtimestamp-starttimestamp)/COUNT(endtimestamp). I have no idea why you have the MIN/MAX terms. COUNT(*) also counts rows in which endtimestamp is NULL, which gives the wrong result.
  2. SQLite has an AVG function, which computes the mean.
  3. The formula for the variance is AVG((endtimestamp-starttimestamp)*(endtimestamp-starttimestamp)) - AVG(endtimestamp-starttimestamp)*AVG(endtimestamp-starttimestamp)
  4. The standard deviation is the square root of the variance.
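
Written out, the population variance and standard deviation referred to in these points are as follows, with x_i standing for the individual difference values and n for their count (symbols chosen here for illustration). The sum of squares has to be divided by n, which is what AVG(x*x) does in SQL:

```latex
\begin{align}
\operatorname{Var}(x) &= \frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2
  = \frac{1}{n}\sum_{i=1}^{n}x_i^2-\bar{x}^2,
  \qquad \bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i \\
\sigma &= \sqrt{\operatorname{Var}(x)}
\end{align}
```

As a quick check with two values, 10 and 100: the mean is 55, the mean of the squares is (100 + 10000) / 2 = 5050, so the variance is 5050 - 55^2 = 2025 and the standard deviation is 45.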

In response to the question author's comment: in order to compute the variance, the start and end times must be paired with each other through a self join.

SQLite versions before 3.25.0 have no row_number function, so this is a little inelegant.

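-- SQLite has no ^ power operator, so squares are written as products.
-- SQRT is only available when SQLite is built with its math functions.
SELECT id,
       AVG(startTimestamp - endTimestamp) AS mean,
       AVG((startTimestamp - endTimestamp) * (startTimestamp - endTimestamp))
         - AVG(startTimestamp - endTimestamp) * AVG(startTimestamp - endTimestamp) AS variance,
       SQRT(AVG((startTimestamp - endTimestamp) * (startTimestamp - endTimestamp))
         - AVG(startTimestamp - endTimestamp) * AVG(startTimestamp - endTimestamp)) AS stDev
FROM
    (SELECT
        t1.id,
        t1.endTimestamp,
        MIN(t2.startTimestamp) AS startTimestamp  -- next start at or after this end
    FROM table1 t1
         INNER JOIN
         table1 t2 ON t2.id = t1.id               -- pair rows of the same device only
                  AND t1.endTimestamp <= t2.startTimestamp
    GROUP BY t1.id, t1.endTimestamp) t
GROUP BY id;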
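
On SQLite 3.25.0 or newer, window functions are available, so the pairing could also be sketched with LEAD instead of a self join. This is an alternative to the query above, not part of the original answer, and the helper names are hypothetical. The sample rows are the four from the question.

```python
import sqlite3


def make_sample_db():
    """In-memory table1 with the four rows shown in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (ID TEXT, startTimestamp INTEGER, "
        "endTimestamp INTEGER, subject TEXT)"
    )
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?, ?, ?)",
        [
            ("00:50:c2:63:10:1a", 1000, 1090, "entrance"),
            ("00:50:c2:63:10:1a", 1100, 1270, "entrance"),
            ("00:50:c2:63:10:1a", 1300, 1310, "door1"),
            ("00:50:c2:63:10:1a", 1370, 1400, "entrance"),
        ],
    )
    return conn


def gaps_with_lead(conn, device_id, subject):
    """Gap between each row's end and the next row's start, via LEAD."""
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25.0 or newer")
    query = """
        SELECT gap
        FROM (SELECT LEAD(startTimestamp) OVER (
                         PARTITION BY ID, subject ORDER BY startTimestamp
                     ) - endTimestamp AS gap
              FROM table1
              WHERE ID = ? AND subject = ?)
        WHERE gap IS NOT NULL   -- the last row of each partition has no successor
        ORDER BY gap
    """
    return [row[0] for row in conn.execute(query, (device_id, subject))]


print(gaps_with_lead(make_sample_db(), "00:50:c2:63:10:1a", "entrance"))  # [10, 100]
```

The resulting gap values can then be fed into the same AVG-based variance expression as before.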

See SQL Fiddle

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow