
I have a large MySQL table. Even when properly indexed, each query can take 1 second (that doesn't sound like much, but it is run for thousands of servers). Right now I run four queries to get the 95th percentile inbound, the 95th percentile outbound, and the sum of both.

Query 1: Get the row offset of the 95th percentile row

SELECT round(count(*)*.95) FROM traffic WHERE server_id = 1;

Queries 2 & 3: Get the 95th percentile values

SELECT inbound FROM traffic WHERE server_id = 1 ORDER BY inbound ASC LIMIT {95th},1;
SELECT outbound FROM traffic WHERE server_id = 1 ORDER BY outbound ASC LIMIT {95th},1;

Query 4: Get the sum of traffic

SELECT sum(inbound+outbound) FROM traffic WHERE server_id = 1; 

Can you think of any way I could combine these? I'm struggling to find one, since the 95th percentile is calculated by selecting a specific row based on the count. For example, if there are 10000 rows, you order them ascending and select the 9500th row.


Solution 2

As noted in http://planet.mysql.com/entry/?id=13588 :

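SELECT  SUBSTRING_INDEX(
            SUBSTRING_INDEX(
                GROUP_CONCAT(
                    t.inbound
                    ORDER BY t.inbound
                    SEPARATOR ','
                )
            ,   ','
            ,   95/100 * COUNT(*) + 1
            )
        ,   ','
        ,   -1
        )                 AS `Inbound95`
,       SUBSTRING_INDEX(
            SUBSTRING_INDEX(
                GROUP_CONCAT(
                    t.outbound
                    ORDER BY t.outbound
                    SEPARATOR ','
                )
            ,   ','
            ,   95/100 * COUNT(*) + 1
            )
        ,   ','
        ,   -1
        )                 AS `Outbound95`
FROM   traffic AS t WHERE t.server_id = 1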

This will give you both percentiles in a single query.

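NOTE: you may need to increase group_concat_max_len. Its default is 1024 bytes, and GROUP_CONCAT truncates anything longer, which would cut the list off before the 95th percentile position. You can raise it for the current session with, for example, SET SESSION group_concat_max_len = 1000000;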
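
To see why the nested SUBSTRING_INDEX calls pick out the percentile, here is a rough Python emulation of the string trick. It is only a sketch: substring_index and percentile_via_concat are hypothetical helper names, the sample data is made up, and the position is computed with integer division, so MySQL's own handling of a fractional position is not reproduced.

```python
def substring_index(text, delim, count):
    """Rough stand-in for MySQL's SUBSTRING_INDEX (hypothetical helper)."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything right of the |count|-th delimiter from the end
    return ""


def percentile_via_concat(values, pct=95):
    """Mimic the GROUP_CONCAT + SUBSTRING_INDEX query on an in-memory list."""
    ordered = sorted(values)
    # GROUP_CONCAT(col ORDER BY col SEPARATOR ',')
    concatenated = ",".join(str(v) for v in ordered)
    # pct/100 * COUNT(*) + 1; integer division is an assumption of this sketch
    position = pct * len(ordered) // 100 + 1
    head = substring_index(concatenated, ",", position)  # keep the first `position` values
    return substring_index(head, ",", -1)                # then keep only the last of those


inbound = list(range(1, 101))            # 100 made-up samples: 1..100
print(percentile_via_concat(inbound))    # prints 96
```

With 100 rows the position is 96, which is the same row that LIMIT 95,1 returns in the original queries, since the LIMIT offset is zero-based. As in MySQL, the result comes back as a string.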


If you are willing to give up some precision, you can use an estimate of the row count rather than the exact row count. If your table uses InnoDB, SELECT count(*) can be very slow. Going through your four queries in order:

  1. To get an estimate, you could use the SHOW TABLE STATUS command. It will be lightning fast, but not necessarily 100% accurate: for InnoDB the Rows value is only an approximation, and it covers the whole table rather than a single server_id.

  2. Replace your statement:

    SELECT inbound FROM traffic WHERE server_id = 1 ORDER BY inbound ASC LIMIT {95th},1

     with:

    SELECT inbound FROM traffic WHERE server_id = 1 ORDER BY inbound DESC LIMIT {5th},1

    where {5th} is the total row count minus {95th} minus 1. The result should be identical, but about 20x faster, because the query skips roughly 5% of the rows instead of 95%. Just make sure to create a compound index on (server_id, inbound).

  3. Same as 2, applied to outbound, with a compound index on (server_id, outbound).

  4. Leave this one as it is.

I expect that the total time to get the necessary numbers will be reduced to a few milliseconds.
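
The offset arithmetic behind step 2 can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it only illustrates that the ascending and descending queries land on the same row and says nothing about MySQL timings or SHOW TABLE STATUS. The data, the index name idx_server_inbound, and the helper nth_inbound are made up for illustration, and LIMIT 1 OFFSET n is used in place of LIMIT n,1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (server_id INTEGER, inbound INTEGER, outbound INTEGER)")
# Compound index as suggested in step 2 (the index name is an assumption).
conn.execute("CREATE INDEX idx_server_inbound ON traffic (server_id, inbound)")

# 1000 made-up rows for server 1; inbound is a shuffled permutation of 0..999.
rows = [(1, (i * 37) % 1000, (i * 91) % 1000) for i in range(1000)]
conn.executemany("INSERT INTO traffic VALUES (?, ?, ?)", rows)


def nth_inbound(conn, server_id, direction, offset):
    """Return the inbound value at a zero-based offset in the given sort order."""
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be ASC or DESC")
    sql = (
        "SELECT inbound FROM traffic WHERE server_id = ? "
        f"ORDER BY inbound {direction} LIMIT 1 OFFSET ?"
    )
    return conn.execute(sql, (server_id, offset)).fetchone()[0]


total = conn.execute(
    "SELECT COUNT(*) FROM traffic WHERE server_id = ?", (1,)
).fetchone()[0]

asc_offset = (total * 95 + 50) // 100    # {95th}: round(count * .95), rounding half up
desc_offset = total - asc_offset - 1     # {5th}: the same row counted from the top

from_bottom = nth_inbound(conn, 1, "ASC", asc_offset)
from_top = nth_inbound(conn, 1, "DESC", desc_offset)
print(asc_offset, desc_offset, from_bottom == from_top)
```

Here the ascending query skips 950 rows while the descending one skips only 49, which is where the expected speedup comes from. If the row count is an estimate rather than an exact count, both offsets shift and the two queries may no longer agree exactly.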

License: CC-BY-SA with attribution
Not affiliated with StackOverflow