Configure Hadoop to use S3 requester-pays-enabled

https://stackoverflow.com/questions/23553663

18-07-2023

Question

I'm using Hadoop (via Spark) and need to access S3N content in a requester-pays bucket. Normally, this is done by setting httpclient.requester-pays-buckets-enabled = true in jets3t.properties. I've set this, but Spark / Hadoop ignore it. Perhaps I'm putting jets3t.properties in the wrong place (/usr/share/spark/conf/). How can I get Hadoop / Spark / JetS3t to access requester-pays buckets?

UPDATE: This is only needed if you are outside Amazon EC2. Within EC2, Amazon doesn't require requester-pays. So a crude workaround is to run the job from inside EC2.

Solution 2

Environment variables and config files didn't work, but setting the value manually in code did: sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "PUTTHEKEYHERE")
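
As a rough sketch of the same idea, the s3n settings can be collected in one place and pushed through whatever setter is available. The object and method names below (S3nCredentials, settings, applyTo) are hypothetical helpers, and fs.s3n.awsSecretAccessKey is the companion Hadoop property for the secret key, which the answer above does not mention. This only covers credentials; it does not by itself enable requester-pays.

```scala
object S3nCredentials {
  // Hadoop property names the s3n filesystem reads for credentials.
  // The secret-key entry is an addition not present in the original answer.
  def settings(accessKey: String, secretKey: String): Map[String, String] = Map(
    "fs.s3n.awsAccessKeyId"     -> accessKey,
    "fs.s3n.awsSecretAccessKey" -> secretKey
  )

  // Applies every key/value pair through the supplied setter function,
  // so the same code works with any configuration object.
  def applyTo(set: (String, String) => Unit, kv: Map[String, String]): Unit =
    kv.foreach { case (k, v) => set(k, v) }
}
```

In a Spark shell this would presumably be called as S3nCredentials.applyTo(sc.hadoopConfiguration.set(_, _), S3nCredentials.settings("PUTTHEKEYHERE", "PUTTHESECRETHERE")) before the first read from an s3n:// path.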

OTHER TIPS

The Spark system is made up of several JVMs (application, master, workers, executors), so setting properties can be tricky. You could use System.getProperty() before the file operation to check if the JVM where the code runs has loaded the right config. You could even use System.setProperty() to directly set it at that point instead of figuring out the config files.
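
A minimal sketch of that check, using only the JVM's standard system-property calls. The property name is taken from the question; whether JetS3t actually picks it up from JVM system properties (rather than only from jets3t.properties) is an assumption here, and JvmPropertyCheck is a hypothetical helper name.

```scala
object JvmPropertyCheck {
  // Property name from the question; treating it as a JVM system
  // property that JetS3t will honor is an assumption.
  val RequesterPaysKey = "httpclient.requester-pays-buckets-enabled"

  // Returns the value currently visible in this JVM, or None if unset.
  def current(key: String): Option[String] = Option(System.getProperty(key))

  // Sets the property only when it is missing; returns the effective value.
  def ensure(key: String, value: String): String =
    current(key).getOrElse {
      System.setProperty(key, value)
      value
    }

  def main(args: Array[String]): Unit = {
    println(s"before: ${current(RequesterPaysKey)}")
    println(s"effective: ${ensure(RequesterPaysKey, "true")}")
  }
}
```

Note that this only inspects the JVM it runs in. Called from the driver, it says nothing about the executors; to check those, the same call would have to run inside a task.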

Licensed under: CC-BY-SA with attribution
