Question

I have a large MySQL table. Even when it is properly indexed, each query can take 1 second. That doesn't sound like much, but the queries run for thousands of servers. Right now I run four queries to get the 95th percentile of inbound traffic, the 95th percentile of outbound traffic, and the sum of both.

Query 1: get the number of rows, to find the position of the 95th-percentile row

SELECT round(count(*) * .95) FROM traffic WHERE server_id = 1;

Queries 2 and 3: get the 95th percentile

SELECT inbound FROM traffic WHERE server_id = 1 ORDER BY inbound ASC LIMIT {95th},1
SELECT outbound FROM traffic WHERE server_id = 1 ORDER BY outbound ASC LIMIT {95th},1

Query 4: get the sum of traffic

SELECT sum(inbound+outbound) FROM traffic WHERE server_id = 1; 

Can you think of any way I could combine these? I'm struggling to find one, because the 95th percentile is calculated by selecting a specific row based on the count. For example, if there are 10000 rows, I order them ascending and select the 9500th row.
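Under the nearest-rank definition (an assumption; the question does not name a percentile definition), the 95th percentile of n rows is the ceil(0.95 * n)-th smallest value. Because MySQL's LIMIT offset is 0-based, the 9500th of 10000 rows is `LIMIT 9499,1`, not `LIMIT 9500,1`. Below is a small Python sketch of that offset calculation; `percentile_offset` is a hypothetical helper name, and it uses integer arithmetic so that floating-point rounding cannot shift the result.

```python
def percentile_offset(row_count: int, percent: int = 95) -> int:
    """Return the 0-based LIMIT offset of the nearest-rank percentile row.

    Hypothetical helper: nearest-rank picks the ceil(percent/100 * n)-th
    smallest value (1-based). Integer arithmetic avoids float rounding.
    """
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    rank = (percent * row_count + 99) // 100  # ceil(percent * n / 100), 1-based
    return rank - 1  # LIMIT offsets start at 0


if __name__ == "__main__":
    n = 10000
    # prints: SELECT inbound FROM traffic WHERE server_id = 1 ORDER BY inbound ASC LIMIT 9499,1
    print(f"SELECT inbound FROM traffic WHERE server_id = 1 "
          f"ORDER BY inbound ASC LIMIT {percentile_offset(n)},1")
```

Whatever definition you settle on, the point is the same: compute the 1-based rank first, then subtract one before using it as a LIMIT offset.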


Solution 2

As noted in http://planet.mysql.com/entry/?id=13588 :

SELECT
    SUBSTRING_INDEX(
            SUBSTRING_INDEX(
                GROUP_CONCAT( 
                    t.inbound
                    ORDER BY t.inbound
                    SEPARATOR ','
                )
            ,   ','
            ,   95/100 * COUNT(*) + 1
            )
        ,   ','  
        ,   -1  
        )                 AS `Inbound95`
    ,
    SUBSTRING_INDEX(
            SUBSTRING_INDEX(
                GROUP_CONCAT(  
                    t.outbound
                    ORDER BY t.outbound
                    SEPARATOR ','
                )
            ,   ','         
            ,   95/100 * COUNT(*) + 1 
            )
        ,   ','                       
        ,   -1                          
        )                 AS `Outbound95`
FROM   traffic AS t WHERE t.server_id = 1

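This returns both percentiles in a single query. GROUP_CONCAT builds a comma-separated string of the sorted values; the inner SUBSTRING_INDEX keeps the first 95/100 * COUNT(*) + 1 items, and the outer one (with a count of -1) takes the last of those. The results come back as strings.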

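NOTE: GROUP_CONCAT truncates its result to group_concat_max_len bytes (1024 by default) and only raises a warning, so with many rows you will need to increase group_concat_max_len for the session.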
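To see why the nested SUBSTRING_INDEX calls pick out a single value, and how truncation at group_concat_max_len can quietly produce a wrong answer, here is a pure-Python model. `substring_index` mimics MySQL's SUBSTRING_INDEX(str, delim, count) for positive, negative, and zero counts; `kth_via_group_concat` and its `max_len` parameter are hypothetical names. The model truncates by characters, whereas MySQL truncates by bytes, which is the same thing for strings of digits and commas.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Model of MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])  # left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])  # right of the count-th delimiter from the end
    return ""  # MySQL returns '' for count = 0


def kth_via_group_concat(values, k: int, max_len=None) -> str:
    """Return the k-th smallest value (1-based) the way the query does.

    max_len (hypothetical parameter) models group_concat_max_len truncation.
    """
    csv = ",".join(str(v) for v in sorted(values))  # GROUP_CONCAT(... ORDER BY ...)
    if max_len is not None:
        csv = csv[:max_len]  # MySQL cuts the string here and only emits a warning
    first_k = substring_index(csv, ",", k)  # inner call: first k items
    return substring_index(first_k, ",", -1)  # outer call: last of those


if __name__ == "__main__":
    print(kth_via_group_concat([7, 3, 9, 1, 5], 4))  # 7
    big = list(range(1000))
    print(kth_via_group_concat(big, 950))  # 949
    # With the default 1024-byte limit the string is cut long before item 950,
    # so this prints a fragment of a much smaller value instead of 949.
    print(kth_via_group_concat(big, 950, max_len=1024))
```

The last call shows why the note about group_concat_max_len matters: the query does not fail, it just returns the wrong row.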

Other suggestions

If you are willing to give up some precision, you can use an estimated row count instead of the exact one; if your database uses InnoDB, SELECT count(*) can be very slow. Taking your four queries in turn:

  1. To get an estimate, you could use the SHOW TABLE STATUS command. It is very fast, but not necessarily 100% accurate, and its Rows value counts the whole table rather than the rows for a single server_id.

  2. Replace your statement:

    SELECT inbound FROM traffic WHERE server_id = 1 ORDER BY inbound ASC LIMIT {95th},1
    

    with

    SELECT inbound FROM traffic WHERE server_id = 1 ORDER BY inbound DESC LIMIT {5th},1
    

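    The result should be identical, since for n rows the ascending offset k and the descending offset n - 1 - k point at the same value, but the query should be about 20x faster: with the index, MySQL only has to skip 5% of the matching rows instead of 95%. Just make sure to create a compound index on (server_id, inbound).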

  3. Do the same for the outbound query, with a compound index on (server_id, outbound).

  4. Leave the sum query as it is.
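The claim in step 2 that the mirrored DESC query returns the same value can be checked with SQLite from Python's standard library, since SQLite also accepts MySQL's `LIMIT offset, count` form. For n matching rows, ascending offset k and descending offset n - 1 - k land on the same value even when there are duplicates. This is only a correctness sketch under those assumptions; it does not measure the speed-up on MySQL, and `asc_and_desc_pick` and `idx_server_inbound` are hypothetical names.

```python
import random
import sqlite3


def asc_and_desc_pick(values, k: int):
    """Return (ascending pick at offset k, descending pick at offset n-1-k)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE traffic (server_id INTEGER, inbound INTEGER)")
        # Compound index suggested in step 2 (index name is hypothetical).
        conn.execute("CREATE INDEX idx_server_inbound ON traffic (server_id, inbound)")
        conn.executemany(
            "INSERT INTO traffic (server_id, inbound) VALUES (1, ?)",
            [(v,) for v in values],
        )
        (n,) = conn.execute(
            "SELECT COUNT(*) FROM traffic WHERE server_id = 1"
        ).fetchone()
        (asc,) = conn.execute(
            "SELECT inbound FROM traffic WHERE server_id = 1 "
            "ORDER BY inbound ASC LIMIT ?, 1",
            (k,),
        ).fetchone()
        (desc,) = conn.execute(
            "SELECT inbound FROM traffic WHERE server_id = 1 "
            "ORDER BY inbound DESC LIMIT ?, 1",
            (n - 1 - k,),  # mirrored offset
        ).fetchone()
        return asc, desc
    finally:
        conn.close()


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.randint(0, 50) for _ in range(200)]  # duplicates on purpose
    k = (95 * len(data) + 99) // 100 - 1  # 0-based nearest-rank offset: 189
    print(asc_and_desc_pick(data, k))  # both elements of the tuple are equal
```

In practice you would compute n - 1 - k from the row count, so the DESC offset corresponds to the `{5th}` placeholder in step 2.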

I expect the total time to get the numbers you need to drop to a few milliseconds.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow