Question

I changed the lifecycle for a bunch of my buckets on Amazon S3 so their storage class was set to Glacier. I did this using the online AWS Console. I now need those files again.

I know how to restore them back to S3 per file. But my buckets have thousands of files. I wanted to see if there was a way to restore the entire bucket back to S3, just like there was a way to send the entire bucket to Glacier?

I'm guessing there's a way to program a solution. But I wanted to see if there was a way to do it in the Console. Or with another program? Or something else I might be missing?


Solution

There isn't a built-in tool for this. "Folders" in S3 are an illusion for human convenience, based on forward slashes in the object key (path/filename), and every object that migrated to Glacier has to be restored individually.

You could, of course, write a script that iterates through the hierarchy and sends those restore requests using the SDKs or the REST API in your programming language of choice.
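As a concrete sketch of such a script, here is a minimal Python/boto3 version. It is an illustration, not a tested tool: the bucket name is a placeholder, and boto3 is imported lazily so the key-filtering helper can be used without AWS credentials.

```python
# Sketch: request a restore for every GLACIER-class object in a bucket.
# The bucket name and restore window below are placeholders.

def glacier_keys(pages):
    """Pick the keys of GLACIER-class objects out of list_objects_v2 pages."""
    return [
        obj["Key"]
        for page in pages
        for obj in page.get("Contents", [])
        if obj.get("StorageClass") == "GLACIER"
    ]

def restore_bucket(bucket, days=7):
    import boto3  # imported here so glacier_keys() stays usable without AWS
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=bucket)
    for key in glacier_keys(pages):
        try:
            s3.restore_object(Bucket=bucket, Key=key,
                              RestoreRequest={"Days": days})
            print("restore requested:", key)
        except ClientError as exc:
            # e.g. RestoreAlreadyInProgress: report it and keep going
            print("skipped", key, "-", exc)

# Usage (requires AWS credentials):
#   restore_bucket("my-bucket-name", days=7)
```

The paginator handles buckets with more than 1,000 objects, which a single `list-objects` call will not.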

Be sure you understand how restoring from Glacier into S3 works before you proceed. It is always only a temporary restoration: you choose the number of days each object will persist in S3 before reverting to being stored only in Glacier.

Also, be certain that you understand the penalty charges for restoring too much Glacier data in a short period of time, or you could be in for some unexpected expense. Depending on the urgency, you may want to spread the restore operation out over days or weeks.
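One way to spread the operation out is to submit the restore requests in batches with a pause between them. A small Python sketch of the idea (the batch size and pause are illustrative numbers, not recommendations; `submit` stands in for whatever actually issues the restore request):

```python
# Sketch: spread restore requests over time by submitting them in batches.
# Batch size and pause are illustrative, not recommendations.
import time

def batches(keys, size):
    """Yield successive chunks of at most `size` keys."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]

def restore_in_batches(submit, keys, size=500, pause_seconds=3600):
    """Call submit(key) for every key, pausing between batches.

    `submit` is whatever issues the actual restore request, e.g. a
    wrapper around `aws s3api restore-object` or boto3's restore_object.
    """
    chunks = list(batches(keys, size))
    for i, chunk in enumerate(chunks):
        for key in chunk:
            submit(key)
        if i < len(chunks) - 1:
            time.sleep(pause_seconds)  # wait before the next batch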

OTHER TIPS

If you use s3cmd you can use it to restore recursively pretty easily:

s3cmd restore --recursive s3://mybucketname/ 

I've also used it to restore just folders as well:

s3cmd restore --recursive s3://mybucketname/folder/

If you're using the AWS CLI tool (it's nice, you should be), you can do it like this:

aws s3 ls s3://<bucket_name> | awk '{print $4}' | xargs -L 1 aws s3api restore-object --restore-request Days=<days> --bucket <bucket_name> --key

Replace <bucket_name> with the bucket name you want.

Replace <days> with the number of days you want to restore the object for.

The above answers didn't work well for me because my bucket was mixed with objects on Glacier and some that were not. The easiest thing for me was to create a list of all GLACIER objects in the bucket, then attempt to restore each one individually, ignoring any errors (like already in progress, not an object, etc).

  1. Get a listing of all GLACIER files (keys) in the bucket

    aws s3api list-objects-v2 --bucket <bucketName> --query "Contents[?StorageClass=='GLACIER']" --output text | awk '{print $2}' > glacier-restore.txt

  2. Create a shell script and run it, replacing "<bucketName>" with your bucket name.

    #!/bin/sh

    # Read keys one per line; quoting "$x" keeps each key intact
    while read -r x; do
        echo "Begin restoring $x"
        aws s3api restore-object --restore-request Days=7 --bucket <bucketName> --key "$x"
        echo "Done restoring $x"
    done < glacier-restore.txt
    

Credit goes to Josh at http://capnjosh.com/blog/a-client-error-invalidobjectstate-occurred-when-calling-the-copyobject-operation-operation-is-not-valid-for-the-source-objects-storage-class/, a resource I found after trying some of the above solutions.

I recently needed to restore a whole bucket and all its files and folders. You will need the s3cmd and AWS CLI tools configured with your credentials to run this.

I've found this pretty robust at handling errors for specific objects in the bucket that might already have had a restore request.

#!/bin/sh

# This gives a list of all objects in the bucket with the bucket name stripped out
s3cmd ls -r s3://<your-bucket-name> | awk '{print $4}' | sed 's#s3://<your-bucket-name>/##' > glacier-restore.txt

# Read keys one per line; quoting "$x" keeps each key intact
while read -r x; do
    echo "restoring $x"
    aws s3api restore-object --restore-request Days=7 --bucket <your-bucket-name> --profile <your-aws-credentials-profile> --key "$x"
done < glacier-restore.txt

Here is my version using the AWS CLI, and how to restore data from Glacier. I modified some of the above examples to work when the keys of the files to be restored contain spaces.

# Parameters
BUCKET="my-bucket" # the bucket you want to restore, no s3:// and no slashes
BPATH="path/in/bucket/" # the object prefix you wish to restore (mind the trailing `/`)
DAYS=1 # for how many days you wish to restore the data

# Restore the objects (the awk joins fields 4..NF so keys with spaces survive)
aws s3 ls s3://${BUCKET}/${BPATH} --recursive | \
awk '{out=$4; for(i=5;i<=NF;i++){out=out" "$i}; print out}' | \
xargs -I {} aws s3api restore-object --restore-request Days=${DAYS} \
--bucket ${BUCKET} --key "{}"

It looks like S3 Browser can "restore from Glacier" at the folder level, but not at the bucket level. The catch is that you have to buy the Pro version, so it's not the best solution.

A variation on Dustin's answer using the AWS CLI, but with recursion and a pipe to sh to skip errors (e.g. if restore has already been requested for some objects):

BUCKET=my-bucket
BPATH=/path/in/bucket
DAYS=1
aws s3 ls s3://$BUCKET$BPATH --recursive | awk '{print $4}' | xargs -L 1 \
 echo aws s3api restore-object --restore-request Days=$DAYS \
 --bucket $BUCKET --key | sh

The xargs echo bit generates a list of "aws s3api restore-object" commands, and by piping them to sh you can continue on error.

NOTE: The Ubuntu 14.04 aws-cli package is old. In order to use --recursive you'll need to install a newer version from GitHub.

POSTSCRIPT: Glacier restores can get unexpectedly pricey really quickly. Depending on your use case, you may find the Infrequent Access tier to be more appropriate. AWS have a nice explanation of the different tiers.
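If Infrequent Access does fit your use case, the switch is a lifecycle rule much like the one that sent the data to Glacier in the first place. A minimal boto3 sketch (the rule ID and prefix are illustrative; note S3 requires at least 30 days before a STANDARD_IA transition):

```python
# Sketch: a lifecycle rule that transitions objects to STANDARD_IA
# instead of GLACIER. Rule ID and prefix are illustrative placeholders.

def ia_lifecycle_rule(prefix="", days=30):
    """Build a lifecycle rule dict; S3 requires days >= 30 for STANDARD_IA."""
    return {
        "ID": "to-standard-ia",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "STANDARD_IA"}],
    }

def apply_rule(bucket, rule):
    import boto3  # imported here so the rule builder stays testable offline
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [rule]},
    )

# Usage (requires AWS credentials):
#   apply_rule("my-bucket-name", ia_lifecycle_rule(prefix="logs/"))
```

Unlike Glacier, objects in STANDARD_IA are retrieved immediately, with a per-GB retrieval charge instead of a restore delay.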

Another way is rclone. This tool can sync / copy / Push data ( like we could do with files ). https://rclone.org/faq/#can-rclone-sync-directly-from-drive-to-s3 (the link example is for google drive, but this is agnostique ). But as Michael - sqlbot said, a server or a container has to start the sync/backup operation somewhere.

This command worked for me:

aws s3api list-objects-v2 \
--bucket BUCKET_NAME \
--query "Contents[?StorageClass=='GLACIER']" \
--output text | \
awk -F $'\t' '{print $2}' | \
tr '\n' '\0' | \
xargs -L 1 -0 \
aws s3api restore-object \
--restore-request Days=7 \
--bucket BUCKET_NAME \
--key

ProTip

  • This command can take quite a while if you have lots of objects.
  • Don't CTRL-C / break the command; otherwise you'll have to wait for the processed objects to move out of the RestoreAlreadyInProgress state before you can re-run it. It can take a few hours for the state to transition. If you need to wait, you'll see this error message: An error occurred (RestoreAlreadyInProgress) when calling the RestoreObject operation
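To see where an individual object stands without re-running the whole command, head-object returns a Restore field you can inspect. A small sketch (the parsing follows S3's documented ongoing-request / expiry-date header format; bucket and key are placeholders):

```python
# Sketch: interpret the `Restore` value that `aws s3api head-object`
# (or boto3's head_object) returns for a Glacier-class object.

def restore_status(restore_header):
    """Map S3's Restore header to a simple status string.

    The header is absent until a restore is requested, contains
    ongoing-request="true" while in progress, and
    ongoing-request="false" plus an expiry-date once restored.
    """
    if not restore_header:
        return "not-requested"
    if 'ongoing-request="true"' in restore_header:
        return "in-progress"
    return "restored"

def check(bucket, key):
    import boto3  # imported here so restore_status() stays usable offline
    head = boto3.client("s3").head_object(Bucket=bucket, Key=key)
    return restore_status(head.get("Restore"))

# Usage (requires AWS credentials):
#   check("my-bucket-name", "path/to/file")
```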
Licensed under: CC-BY-SA with attribution