grep with wildcard and count the matched lines

https://stackoverflow.com/questions/23453900

15-07-2023

Question

I want to count production API usage from the backend access log. The access log looks like this:

"GET /service1/api1?querystr1=11.."
"GET /service1/api2?querystr2=22.."
"GET /service1/api2?querystrx=xx.."
"GET /service1/api3?querystry=zz.."
"GET /service1/api3?querystr1=33.."
"GET /service1/api3?querystr3=55.."

So the expected result is:

/service1/api1  - 1
/service1/api2  - 2
/service1/api3  - 3

I know the following command gets all service1 URLs, but I do not know how to count the matches per API.

egrep '"GET /service1/.*' myaccesslogs

Your help will be appreciated, thanks.
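A minimal sketch that prints the layout requested above, assuming every request line looks like the sample, so the path ends at the `?` of the query string or at the closing quote. The helper name `count_apis` is hypothetical, and the sample lines are fed on stdin only for illustration. The bracket expression `[^?"]*` accepts API names of any length.

```shell
#!/bin/sh
# Sample lines copied from the question (illustration only).
sample='"GET /service1/api1?querystr1=11.."
"GET /service1/api2?querystr2=22.."
"GET /service1/api2?querystrx=xx.."
"GET /service1/api3?querystry=zz.."
"GET /service1/api3?querystr1=33.."
"GET /service1/api3?querystr3=55.."'

# Hypothetical helper: reads access-log lines on stdin and prints
# "<path>  - <count>" for each distinct /service1 path.
count_apis() {
  grep -o '"GET /service1/[^?"]*' |  # keep only "GET /service1/<name>, dropping the query string
    cut -d' ' -f2 |                   # keep the path that follows the space
    sort |                            # make identical paths adjacent for uniq
    uniq -c |                         # prefix each distinct path with its count
    awk '{ print $2 "  - " $1 }'      # reorder into the requested "path  - count" layout
}

printf '%s\n' "$sample" | count_apis
```

On a real log the same function can be used as `count_apis < myaccesslogs`.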

Let me add a follow-up question:

Log:

/service1/first.do?action=doTask&type=taskA&xx=yy&zz=dd&

Command:

grep -o '/service1/first.do?action=doTask&.*&'

Expected: /service1/first.do?action=doTask&type=taskA

Actual: /service1/first.do?action=doTask&type=taskA&xx=yy&zz=dd&

I tried:

egrep -o '/service1/first.do?action=doTask&.*?&'

But nothing matched; it looks like non-greedy (lazy) matching does not work for me. What is the correct pattern?

Answer: Adding -P (Perl-compatible regular expressions in GNU grep) works perfectly:

grep -Po '/service1/first\.do\?action=doTask&.*?&'
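Two points may be worth adding here. In ERE (egrep) and in PCRE (-P), an unescaped `?` is a quantifier, so `do?action` does not match the literal text `do?action`; it has to be written `do\?action`. This may be part of why the egrep attempt matched nothing. Also, with -P the lazy `.*?&` stops at the first `&` after `doTask&`, but since that `&` is part of the pattern it is printed as well, giving `/service1/first.do?action=doTask&type=taskA&`. The sketch below shows two ways to get exactly the expected output: a portable negated bracket expression, and a PCRE lookahead (needs GNU grep built with PCRE support). The function names are hypothetical.

```shell
#!/bin/sh
# Log line from the follow-up question.
line='/service1/first.do?action=doTask&type=taskA&xx=yy&zz=dd&'

# Portable (no -P): [^&]* cannot run past the next '&', so no lazy quantifier is needed.
# In a basic regex '?' is an ordinary character and '\.' is a literal dot.
extract_portable() {
  grep -o '/service1/first\.do?action=doTask&[^&]*'
}

# GNU grep with PCRE: lazy .*? plus the lookahead (?=&), so the terminating
# '&' must be present but is not included in the printed match.
extract_pcre() {
  grep -Po '/service1/first\.do\?action=doTask&.*?(?=&)'
}

printf '%s\n' "$line" | extract_portable
printf '%s\n' "$line" | extract_pcre
```

Both commands should print `/service1/first.do?action=doTask&type=taskA`. The bracket-expression form is usually preferable when -P is unavailable, for example with BSD grep on macOS.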

Solution

The -o option makes grep print only the matched part of each line, not the whole line. The matches are then sorted so that identical APIs are adjacent, because uniq only collapses consecutive duplicate lines. uniq -c then prints each distinct line prefixed with the number of times it occurred.

grep -o "GET /service1/api." my.log | sort | uniq -c

Output

1 GET /service1/api1
2 GET /service1/api2
3 GET /service1/api3
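To see why the sort step matters, here is a small sketch on hypothetical grep -o output in which the same API appears on non-adjacent lines. Without sort, uniq -c reports api2 twice with a count of 1 each; after sort it is reported once with a count of 2. The last pipeline adds `sort -rn`, a common extra step that orders the result by count, most-used API first.

```shell
#!/bin/sh
# Hypothetical grep -o output where the same API is not on adjacent lines.
matches='GET /service1/api2
GET /service1/api1
GET /service1/api2'

# uniq only merges *adjacent* duplicates, so api2 shows up twice with count 1.
printf '%s\n' "$matches" | uniq -c

# After sort, identical lines are adjacent and api2 shows up once with count 2.
printf '%s\n' "$matches" | sort | uniq -c

# Optional: numeric reverse sort on the count column, most-used API first.
printf '%s\n' "$matches" | sort | uniq -c | sort -rn
```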

OTHER TIPS

Try the command below:

$ sed 's/"\(.*\)?.*/\1/g' file | awk '{count[$2]++} END{ for (ct in count) { print ct," - ",count[ct]}}' 
/service1/api1  -  1
/service1/api2  -  2
/service1/api3  -  3
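One caveat with the command above: awk does not specify the iteration order of `for (ct in count)`, so the lines may not come out sorted. A minimal sketch that does the extraction and counting in a single awk pass and then sorts the result, assuming the same log layout as the question; `count_paths` is a hypothetical name. With the field separator set to space, `?` or `"`, the path is field 3, because `$1` is the empty string before the opening quote and `$2` is the HTTP method.

```shell
#!/bin/sh
# Sample lines copied from the question (illustration only).
sample='"GET /service1/api1?querystr1=11.."
"GET /service1/api2?querystr2=22.."
"GET /service1/api2?querystrx=xx.."
"GET /service1/api3?querystry=zz.."
"GET /service1/api3?querystr1=33.."
"GET /service1/api3?querystr3=55.."'

# Hypothetical helper: count GET requests per path in one awk pass.
count_paths() {
  awk -F'[ ?"]' '$2 == "GET" { count[$3]++ }
    END { for (p in count) print p "  -  " count[p] }' |
    sort   # for-in order is unspecified in awk, so sort the output
}

printf '%s\n' "$sample" | count_paths
```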

Try using the wc command like this (note that it gives the total number of matching lines, not a per-API count):

egrep '"GET /service1/.*' myaccesslogs|wc -l
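grep can also do the counting itself with -c, which prints the number of matching lines. The sketch below, using the sample lines from the question, shows the total count and then the count for a single API; including the `?` that starts the query string keeps, for example, `api2` from also matching a hypothetical `api20`. This counts one API per command, so for a full breakdown a sort/uniq or awk pipeline is still needed.

```shell
#!/bin/sh
# Sample lines copied from the question (illustration only).
sample='"GET /service1/api1?querystr1=11.."
"GET /service1/api2?querystr2=22.."
"GET /service1/api2?querystrx=xx.."
"GET /service1/api3?querystry=zz.."
"GET /service1/api3?querystr1=33.."
"GET /service1/api3?querystr3=55.."'

# Total number of matching lines, equivalent to piping into wc -l.
printf '%s\n' "$sample" | grep -c '"GET /service1/'

# Count for one API only; in a basic regex the '?' is a literal character.
printf '%s\n' "$sample" | grep -c '"GET /service1/api2?'
```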

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow