Question

I would like to use Amazon Elastic MapReduce to process the access logs that Amazon CloudFront creates.

I just need some simple stats on how many times different files have been loaded from CloudFront, so I thought I should just write a simple Pig script for this.

The first problem I have is that CloudFront writes the logs gzipped, and as far as I know I can't read .gz files in Pig?

Any suggestions on how I should do this? I'm very new to Elastic MapReduce, so any hints on how to structure this kind of job are welcome.

Solution

Sorry, this works by default: Pig reads .gz input files directly, so there is no need to unzip the logs before processing them. My bad.
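
For anyone landing here, a rough sketch of what such a script might look like. The bucket name and paths are made up. The field list assumes the tab-separated layout of standard CloudFront web distribution logs, so check it against the #Fields header in your own files before relying on it.

```pig
-- Sketch only: 'my-bucket' and both paths are hypothetical placeholders.
-- The .gz files are loaded as is; no separate unzip step is used.
-- Field order is an assumption; verify it against the #Fields header line.
raw = LOAD 's3://my-bucket/cloudfront-logs/*.gz' USING PigStorage('\t')
      AS (log_date:chararray, log_time:chararray, edge_location:chararray,
          sc_bytes:long, c_ip:chararray, cs_method:chararray,
          cs_host:chararray, cs_uri_stem:chararray);

-- The '#Version' and '#Fields' header lines contain no tabs, so every
-- column after the first is null for them; this filter drops those lines.
requests = FILTER raw BY cs_uri_stem IS NOT NULL;

-- Count how many times each file (URI stem) was requested.
grouped = GROUP requests BY cs_uri_stem;
counts  = FOREACH grouped GENERATE group AS uri, COUNT(requests) AS hits;
sorted  = ORDER counts BY hits DESC;

STORE sorted INTO 's3://my-bucket/cloudfront-stats';
```

The output should be one tab-separated line per file with its request count, most requested first. Only the leading columns are named in the schema because the count needs nothing beyond the URI stem.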

OTHER TIPS

You might be interested in Qloudstat. We will offer statistics of CloudFront access logs as a service.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow