Question

Let's say I have a log file from a web server with response times per request:

_1st_request 1334
_2nd_request 345
_3rd_request 244
_4th_request 648
......... etc

Is there an easy way with bash scripting to find the top decile (10-quantile)? In other words, how slow was the slowest request if I exclude the slowest 10% of requests?

Solution

awk '{print $2}' | sort -rn | perl -e '$d=.1;@l=<>;print $l[int($d*$#l)]'

It would indeed be more elegant to do the whole thing in Perl. If you don't mind using a temporary file, you can instead use wc plus head/tail to select the quantile from the sorted list of numbers.
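The wc + head/tail variant could look roughly like the sketch below. The function name top_decile is hypothetical, and the rounding is an assumption: it discards the slowest 10% rounded down and prints the next value, so it can differ by one line from the Perl one-liner on some inputs.

```shell
#!/usr/bin/env bash
# Sketch only: top_decile is a hypothetical helper, not from the original answer.
# Expects lines of the form "<label> <response_time>".
top_decile() {
    local file=$1 tmp count skip
    tmp=$(mktemp) || return 1
    # Keep only the response time (second field), slowest first.
    awk '{print $2}' "$file" | sort -rn > "$tmp"
    # Reading from stdin makes wc print the count without a file name.
    count=$(wc -l < "$tmp")
    # Assumed convention: discard the slowest 10% of requests, rounded down.
    skip=$((count / 10))
    # The first remaining line is the slowest request after the exclusion.
    tail -n +"$((skip + 1))" "$tmp" | head -n 1
    rm -f "$tmp"
}
```

Called as `top_decile responseTimes.log`, this prints a single response time, or nothing if the file is empty.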

OTHER TIPS

I would probably sort numerically by the response-time field, count the lines, and grab the line that's 10% from the end.

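FILE=responseTimes.log
TMPFILE=tmpfile
# Sort ascending by the response time (second field)
sort -k 2 -n "$FILE" > "$TMPFILE"
# Reading from stdin makes wc print only the count, with no file name to strip
LINECOUNT=$(wc -l < "$TMPFILE")
# Line number 90% of the way through the sorted file (bc truncates to an integer)
TARGETLINE=$(echo "$LINECOUNT * 9 / 10" | bc)
# Print that line and stop reading
sed -n "${TARGETLINE}{p;q;}" "$TMPFILE"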
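The same sort, count and pick idea can also be written without a temporary file and with the percentile as a parameter. This is only a sketch: the function name percentile is hypothetical, and it assumes the nearest-rank definition (line number = ceil(N * p / 100)), which can select a different line than the truncating arithmetic above. It also avoids asking for line 0 on very small inputs.

```shell
#!/usr/bin/env bash
# Sketch only: percentile is a hypothetical helper using the nearest-rank rule.
# Usage: percentile <p as an integer 1..100> <file with "<label> <time>" lines>
percentile() {
    local p=$1 file=$2
    # Sort ascending by response time, then let awk buffer the sorted values.
    sort -k 2 -n "$file" | awk -v p="$p" '
        { t[NR] = $2 }
        END {
            if (NR == 0) exit 1               # no data: print nothing, fail
            idx = int((NR * p + 99) / 100)    # ceil(NR * p / 100) for integer p
            if (idx < 1) idx = 1
            print t[idx]
        }'
}
```

For example, `percentile 90 responseTimes.log` prints only the response time rather than the whole log line.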

Hope that's what you were looking for.

Licensed under: CC-BY-SA with attribution