Comparison of solutions:
Time in seconds on my server to get the 99th percentile of 1.3 million rows:
- LIMIT x,y with index and no WHERE: 0.01 seconds
- LIMIT x,y with no WHERE: 0.7 seconds
- LIMIT x,y with WHERE: 2.3 seconds
- Full scan with no WHERE: 1.6 seconds
- Full scan with WHERE: 5.7 seconds
Fastest solution for large tables, using LIMIT x,y:
- Get the count of values: SELECT COUNT(*) AS cnt FROM t
- Get the nth value, where n = (cnt - 1) * (1 - 0.95) rounded to the nearest integer (here for the 95th percentile): SELECT k FROM t ORDER BY k DESC LIMIT n,1
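The offset arithmetic in the two steps above can be sketched in plain PHP. This is only an illustration, assuming PHP 7 or later: `percentile_offset` and `percentile_of_array` are hypothetical helper names that do not appear in the original answer, and the array version merely emulates `ORDER BY k DESC LIMIT n,1` in memory without touching MySQL. It rounds n to the nearest integer, as the PHP implementation further down does.

```php
<?php
// Hypothetical helper: zero-based offset n for "ORDER BY k DESC LIMIT n,1".
// $percentile is given on a 0..100 scale, e.g. 95 for the 95th percentile.
function percentile_offset(int $cnt, float $percentile): int {
    if ($cnt < 1) {
        return 0; // empty input: the caller has to handle the missing row
    }
    return (int) round(($cnt - 1) * (100 - $percentile) / 100.0);
}

// Hypothetical helper: the same lookup on a PHP array instead of a table.
function percentile_of_array(array $values, float $percentile) {
    if (!$values) {
        return null;
    }
    rsort($values); // descending order, like ORDER BY k DESC
    return $values[percentile_offset(count($values), $percentile)];
}

echo percentile_offset(1300000, 95), "\n";         // prints 65000
echo percentile_of_array(range(1, 101), 95), "\n"; // prints 96
```

For 1.3 million rows and the 95th percentile the second query would therefore be `LIMIT 65000,1`. Because the rows are sorted in descending order, a higher percentile gives a smaller offset.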
This solution requires two queries because MySQL does not allow variables in a LIMIT clause, except inside stored procedures (so it can be reduced to a single call by wrapping it in a stored procedure). The overhead of the additional query is usually very low.
This solution can be optimized further by adding an index on the k column and avoiding complex WHERE clauses: with the index no sorting is needed, and the query takes about 0.01 seconds on a table with 1 million rows.
Example implementation in PHP (it can compute the percentile of an expression, not just of a column):
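// Uses mysqli, since the old mysql_* functions were removed in PHP 7.
// $db is an open mysqli connection. $table, $where and $expr are inserted
// into the SQL as-is, so they must never come from user input.
function get_percentile($db, $table, $where, $expr, $percentile) {
    $subq = $where ? "WHERE $where" : "";
    // First query: number of rows that match the filter
    $r = $db->query("SELECT COUNT(*) AS cnt FROM $table $subq");
    $w = $r->fetch_assoc();
    // Zero-based offset of the percentile row in descending order
    $num = (int) abs(round(($w['cnt'] - 1) * (100 - $percentile) / 100.0));
    // Second query: fetch the single row at that offset
    $q = "SELECT ($expr) AS prcres FROM $table $subq ORDER BY ($expr) DESC LIMIT $num,1";
    $r = $db->query($q);
    if (!$r->num_rows) return null; // no matching rows
    $w = $r->fetch_assoc();
    return $w['prcres'];
}
// Usage example
$time = get_percentile(
    $db, // open mysqli connection
    "state", // table
    "service='Time' AND cnt>0 AND total>0", // some filter
    "total/cnt", // expression to evaluate
    80); // percentile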