Question

I have a table which contains thousands of rows and I would like to calculate the 90th percentile for one of the fields, called 'round'.

For example, select the value of round which is at the 90th percentile.

I don't see a straightforward way to do this in MySQL.

Can somebody provide some suggestions as to how I may start this sort of calculation?

Thank you!

Solution

First, let's assume that you have a table with a value column and want the row holding the 95th percentile value, that is, a value larger than 95 percent of all values. For the 90th percentile asked about in the question, use .90 instead of .95.
Here is a simple answer:

SELECT *
FROM (
    SELECT t.*, @row_num := @row_num + 1 AS row_num
    FROM YOUR_TABLE t, (SELECT @row_num := 0) counter
    ORDER BY YOUR_VALUE_COLUMN
) temp
WHERE temp.row_num = ROUND(.95 * @row_num);
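
To see which row the query above selects, the same rule can be sketched outside the database. This Python snippet is only an illustration: nearest_rank is a hypothetical helper, and it assumes that ROUND rounds halves up.

```python
def nearest_rank(values, percent):
    """Return the value whose 1-based position in sorted order is round(percent/100 * n)."""
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic for round-half-up of percent * n / 100 (assumed rounding rule).
    row_num = (percent * n + 50) // 100
    if row_num < 1:
        return None  # the SQL query would match no row in this case
    return ordered[row_num - 1]


# 20 values: position round(0.95 * 20) = 19, so the 19th smallest value is picked.
print(nearest_rank(range(1, 21), 95))
```

With only a handful of rows the selected position can round down to 0, in which case the query returns nothing.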

OTHER TIPS

Compare solutions:

Time in seconds to get the 99th percentile of 1.3 million rows on my server:

  • LIMIT x,y with index and no where: 0.01 seconds
  • LIMIT x,y with no where: 0.7 seconds
  • LIMIT x,y with where: 2.3 seconds
  • Full scan with no where: 1.6 seconds
  • Full scan with where: 5.7 seconds

Fastest solution for large tables, using LIMIT x,y:

  1. Get the count of values: SELECT COUNT(*) AS cnt FROM t
  2. Get the nth value, where n = (cnt - 1) * (1 - 0.95) rounded to an integer: SELECT k FROM t ORDER BY k DESC LIMIT n,1

This solution requires two queries because MySQL does not accept variables in the LIMIT clause of a plain query. It does accept them inside stored procedures (and as placeholders in prepared statements), so the two steps can be combined there. The overhead of the additional query is usually very low.

This solution can be further optimized by adding an index on column k and avoiding complex WHERE clauses (about 0.01 seconds for a table with 1 million rows, because no sorting is needed).
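
The two steps above can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption of this example and not part of the original answer; SQLite accepts the same LIMIT offset,count form. The table t and column k follow the steps above, and percentile_via_limit is a hypothetical name.

```python
import sqlite3


def percentile_via_limit(conn, percentile):
    """Two-query percentile on column k of table t, mirroring the steps above."""
    (cnt,) = conn.execute("SELECT COUNT(*) FROM t").fetchone()
    if cnt == 0:
        return None
    # Zero-based offset counted from the top of the descending order.
    n = round((cnt - 1) * (100 - percentile) / 100)
    row = conn.execute(
        "SELECT k FROM t ORDER BY k DESC LIMIT ?, 1", (n,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER)")
conn.execute("CREATE INDEX t_k ON t (k)")  # lets the engine read k in sorted order
conn.executemany("INSERT INTO t (k) VALUES (?)", [(i,) for i in range(1, 101)])

# 100 rows: offset round(99 * 0.05) = 5, so the sixth-largest value is returned.
print(percentile_via_limit(conn, 95))
```

Here the offset is bound as a query parameter, which sidesteps the restriction on variables in LIMIT that applies to plain MySQL queries.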

Implementation example in PHP (can calculate percentile not only of columns, but also of expressions):

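// Uses mysqli, because the mysql_* functions were removed in PHP 7.
// $table, $where and $expr are interpolated into the SQL: pass trusted strings only.
function get_percentile(mysqli $db, $table, $where, $expr, $percentile) {
  $subq = $where ? "WHERE $where" : "";

  // Step 1: count the rows that match the filter.
  $r = $db->query("SELECT COUNT(*) AS cnt FROM $table $subq");
  $w = $r->fetch_assoc();
  // Zero-based offset counted from the top of the descending order.
  $num = (int) abs(round(($w['cnt'] - 1) * (100 - $percentile) / 100.0));

  // Step 2: fetch the single row at that offset.
  $q = "SELECT ($expr) AS prcres FROM $table $subq ORDER BY ($expr) DESC LIMIT $num,1";
  $r = $db->query($q);
  if (!$r->num_rows) return null;
  $w = $r->fetch_assoc();
  return $w['prcres'];
}

// Usage example ($db is an open mysqli connection)
$time = get_percentile(
  $db,
  "state", // table
  "service='Time' AND cnt>0 AND total>0", // some filter
  "total/cnt", // expression to evaluate
  80); // percentile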

The SQL standard supports the PERCENTILE_DISC and PERCENTILE_CONT inverse distribution functions for precisely this job. Implementations are available in at least Oracle, PostgreSQL, SQL Server, Teradata. Unfortunately not in MySQL. But you can emulate PERCENTILE_DISC in MySQL 8 as follows:

SELECT DISTINCT first_value(my_column) OVER (
  ORDER BY CASE WHEN p <= 0.9 THEN p END DESC /* NULLS LAST */
) x
FROM (
  SELECT
    my_column,
    percent_rank() OVER (ORDER BY my_column) p
  FROM my_table
) t;

This calculates the PERCENT_RANK of each row under the my_column ordering, and then finds the last row whose percent rank is less than or equal to 0.9.

This only works on MySQL 8+, which has window function support.
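
To make the selection rule concrete, here is a small Python sketch of the same idea. It is an illustration under assumptions, not the database's implementation: percentile_disc is a hypothetical helper, and it follows the query above by using (rank - 1) / (n - 1) as the percent rank, with tied values sharing the lowest rank.

```python
from fractions import Fraction


def percentile_disc(values, fraction):
    """Largest value whose percent rank is <= fraction, as in the query above."""
    ordered = sorted(values)
    n = len(ordered)
    best = None
    for value in ordered:
        rank = ordered.index(value) + 1  # ties share the lowest rank
        p = Fraction(rank - 1, n - 1) if n > 1 else Fraction(0)
        if p <= fraction:
            best = value
    return best


# 11 values have percent ranks 0, 0.1, ..., 1.0; the value at 0.9 is picked.
print(percentile_disc(range(1, 12), Fraction(9, 10)))
```

Fractions are used so that the comparison with 0.9 is exact. The linear search for the rank keeps the sketch short but is not meant for large inputs.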

http://www.artfulsoftware.com/infotree/queries.php#68

SELECT  
  a.film_id , 
  ROUND( 100.0 * ( SELECT COUNT(*) FROM film AS b WHERE b.length <= a.length ) / total.cnt, 1 )  
  AS percentile 
FROM film a  
CROSS JOIN (  
  SELECT COUNT(*) AS cnt  
  FROM film  
) AS total 
ORDER BY percentile DESC; 

This can be slow for very large tables, because the correlated subquery counts rows once for every row of film.
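
For comparison, the same percentile figures can be produced from a single sort. The Python sketch below only illustrates the arithmetic of the query above; percentile_ranks is a hypothetical name and the sample lengths are made up.

```python
from bisect import bisect_right


def percentile_ranks(lengths):
    """Percentage of rows whose length is <= each row's length, rounded to 1 decimal."""
    ordered = sorted(lengths)
    total = len(ordered)
    # bisect_right returns how many entries in the sorted list are <= value.
    return [
        round(100.0 * bisect_right(ordered, value) / total, 1)
        for value in lengths
    ]


print(percentile_ranks([10, 20, 20, 30]))  # [25.0, 75.0, 75.0, 100.0]
```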

I was trying to solve this for quite some time and then I found the following answer. Honestly brilliant. Also quite fast even for big tables (the table where I used it contained approx 5 mil records and needed a couple of seconds).

SELECT 
    CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(field_name ORDER BY 
    field_name SEPARATOR ','), ',', 95/100 * COUNT(*) + 1), ',', -1) AS DECIMAL) 
    AS `95th Per` 
FROM table_name;

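As you can imagine, just replace table_name and field_name with your table's and column's names. Note that GROUP_CONCAT truncates its result at group_concat_max_len (1024 by default), so raise that setting for large tables.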

For further information check Roland Bouman's original post

As per Tony_Pets's answer, but as I noted on a similar question, I had to change the calculation slightly: for the 90th percentile, for example, I use "90/100 * COUNT(*) + 0.5" instead of "90/100 * COUNT(*) + 1". With + 1 it sometimes skipped two values past the percentile point in the ordered list instead of picking the next higher value. This may be due to the way MySQL rounds the fractional count to an integer.

i.e.:

.... SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(fieldValue ORDER BY fieldValue SEPARATOR ','), ',', 90/100 * COUNT(*) + 0.5), ',', -1) as 90thPercentile ....
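
The difference between the two offsets can be reproduced with a small Python sketch. It rests on an assumption that I have not verified against every MySQL version: that a fractional count passed to SUBSTRING_INDEX is rounded to the nearest integer with halves rounded up. The helper name pick is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def pick(values, percent, offset):
    """Emulate the nested SUBSTRING_INDEX calls for k = percent/100 * n + offset."""
    ordered = sorted(values)
    k = Decimal(percent) / 100 * len(ordered) + Decimal(offset)
    # Assumption: the fractional count is rounded to the nearest integer, halves up.
    k = int(k.quantize(Decimal(1), rounding=ROUND_HALF_UP))
    csv = ",".join(str(v) for v in ordered)
    head = csv.split(",")[:k]  # inner SUBSTRING_INDEX: the first k items
    return head[-1] if head else None  # outer SUBSTRING_INDEX: the last of those


values = range(1, 16)  # 15 rows, so 90/100 * COUNT(*) = 13.5
print(pick(values, 90, "1"))    # position round(14.5) = 15
print(pick(values, 90, "0.5"))  # position round(14.0) = 14
```

Under that assumption, + 1 lands on the 15th of 15 values while + 0.5 lands on the 14th, which matches the skipping described above.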

In MySQL 8 there is the ntile window function you can use:

SELECT SomeTable.ID, SomeTable.Round
FROM SomeTable
JOIN (
    SELECT ID, NTILE(100) OVER w AS Percentile
    FROM SomeTable
    WINDOW w AS (ORDER BY Round)
) AS SomeTablePercentile ON SomeTable.ID = SomeTablePercentile.ID
WHERE Percentile = 90
LIMIT 1

https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_ntile
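
To see which rows end up in bucket 90, the bucket assignment can be sketched in Python. This is an illustration of the usual NTILE rule (rows are split as evenly as possible and the earlier buckets take any extra rows), not MySQL's own code; ntile here is a hypothetical helper.

```python
def ntile(values, buckets=100):
    """Pair each value with a 1-based bucket number; earlier buckets get the extra rows."""
    ordered = sorted(values)
    base, extra = divmod(len(ordered), buckets)
    result = []
    bucket = 1
    start = 0
    while start < len(ordered):
        size = base + (1 if bucket <= extra else 0)
        result.extend((v, bucket) for v in ordered[start:start + size])
        start += size
        bucket += 1
    return result


rows = ntile(range(1, 1001))
print(min(v for v, b in rows if b == 90))  # 891
```

With 1000 rows, bucket 90 holds the values 891 to 900. With fewer than 90 rows there is no bucket 90 at all, so the query above would return nothing.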

The most common definition of a percentile is a number where a certain percentage of scores fall below that number. You might know that you scored 67 out of 90 on a test. But that figure has no real meaning unless you know what percentile you fall into. If you know that your score is in the 95th percentile, that means you scored better than 95% of people who took the test.

This solution also works with the older MySQL 5.7.

SELECT *, @row_num as numRows, 100 - (row_num * 100/(@row_num + 1)) as percentile
FROM (
    select *, @row_num := @row_num + 1 AS row_num 
    from (
      SELECT t.subject, pt.score, p.name
      FROM test t, person_test pt, person p, (
        SELECT @row_num := 0
      ) counter 
      where t.id=pt.test_id
      and p.id=pt.person_id
      ORDER BY score desc
    ) temp
) temp2
-- optional: filter on a minimal percentile (uncomment below)
-- having percentile >= 80
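
The percentile formula in the query above, 100 - (row_num * 100 / (n + 1)), can be checked by hand with a short Python sketch. The function name and the sample scores are made up for illustration.

```python
def percentile_by_rank(scores):
    """Pair each score with 100 - row_num * 100 / (n + 1), highest score first."""
    ordered = sorted(scores, reverse=True)
    n = len(ordered)
    return [
        (score, 100 - row_num * 100 / (n + 1))
        for row_num, score in enumerate(ordered, start=1)
    ]


print(percentile_by_rank([70, 90, 80]))  # [(90, 75.0), (80, 50.0), (70, 25.0)]
```

Because the divisor is n + 1, the top score never reaches 100 and the lowest never reaches 0.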

[Images: contents/records; database design and relationships; result of example percentile query]

An alternative solution that works in MySQL 8: generate a histogram of your data:

ANALYZE TABLE my_table UPDATE HISTOGRAM ON my_column WITH 100 BUCKETS;

And then just select the 95th record from information_schema.column_statistics:

SELECT v,c FROM information_schema.column_statistics, JSON_TABLE(histogram->'$.buckets', 
     '$[*]' COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist 
     WHERE column_name='my_column' LIMIT 95,1

And voila! You will still need to decide whether you take the lower or upper limit of the percentile, or perhaps take an average - but that is a small task now. Most importantly - this is very quick, once the histogram object is built.

Credit for this solution: lefred's blog.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow