Question

I need to count the number of entries containing certain characters in gzipped (.gz) files stored in an S3 bucket. How could I do it?

Specifically, my S3 bucket is s3://mys3.com/. Under it there are thousands of "buckets" (strictly speaking, key prefixes, S3's equivalent of folders) like the following:

s3://mys3.com/bucket1/
s3://mys3.com/bucket2/
s3://mys3.com/bucket3/
           ...
s3://mys3.com/bucket2000/

Each of these buckets holds a few hundred gzipped (.gz) files of JSON objects, like the following:

s3://mys3.com/bucket1/file1.gz
s3://mys3.com/bucket1/file2.gz
s3://mys3.com/bucket1/file3.gz
           ...
s3://mys3.com/bucket1/file100.gz

Each gzipped file contains about 20,000 JSON objects, one object per line. Some fields in these objects contain the word "request". I want to count how many JSON objects in bucket1 contain the word "request". I tried this, but it did not work:

zcat s3cmd --recursive ls s3://mys3.com/bucket1/ | grep "request" | wc -l

I do not have much shell experience, so could anyone help me with that? Thanks!
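
The attempt above does not work because `zcat` never runs `s3cmd`: it treats `s3cmd`, `ls` and the S3 URI as names of local files to decompress. Even run on its own, `s3cmd ls` only prints a listing of the objects, not their contents. The counting idea itself is sound: since every JSON object is one line, the answer is the number of decompressed lines that match. A minimal sketch on local sample files (the file names and JSON lines are made up for illustration) shows just that step:

```shell
# Build two small sample files in a temporary directory (made-up data).
dir=$(mktemp -d)
printf '%s\n' '{"type":"request","id":1}' '{"type":"response","id":2}' | gzip > "$dir/file1.gz"
printf '%s\n' '{"type":"request","id":3,"note":"request retried"}' | gzip > "$dir/file2.gz"

# Decompress every file to stdout and count the lines (one JSON object each)
# that contain "request" at least once. grep -c counts matching lines, so an
# object that mentions the word twice is still counted once.
count=$(gzip -dc "$dir"/*.gz | grep -c 'request')
echo "$count"   # prints 2

rm -rf "$dir"
```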


Solution

In case anyone is interested:

s3cmd ls --recursive s3://mys3.com/bucket1/ | awk '{print $4}' | grep '\.gz$' | xargs -I@ s3cmd get @ - | zgrep 'request' | wc -l
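
If this has to be run for many buckets, the same pipeline can be wrapped in a shell function. The sketch below is only one way to do it: `count_gz_matches` is a hypothetical name, and it assumes s3cmd is configured and that `s3cmd get URI -` streams the object to stdout, as in the command above. Unlike the one-liner, it decompresses each object separately and sums the per-object counts with awk, so it does not rely on gzip handling one long concatenated stream. Like the one-liner, taking field 4 of the listing breaks on keys that contain spaces.

```shell
# Hypothetical helper: count lines containing a pattern across all .gz
# objects under an S3 prefix. Usage: count_gz_matches PREFIX PATTERN
# The final awk stage sums the per-object counts and prints 0 if none match.
count_gz_matches() {
    # Field 4 of "s3cmd ls --recursive" output is the object URI.
    s3cmd ls --recursive "$1" |
        awk '{print $4}' |
        grep '\.gz$' |
        while read -r uri; do
            # Stream one object, decompress it, and print its number of
            # matching lines; </dev/null keeps s3cmd off the loop's stdin.
            s3cmd get "$uri" - </dev/null | gzip -dc | grep -c -- "$2"
        done |
        awk '{ total += $1 } END { print total + 0 }'
}
```

For example, `count_gz_matches s3://mys3.com/bucket1/ request` should print the same total as the one-liner for bucket1.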
Licensed under: CC-BY-SA with attribution