Question

I am periodically uploading a file to AWS Glacier using boto as follows:

# Import boto's layer2
import boto.glacier.layer2

# Create a Layer2 object to connect to Glacier
l = boto.glacier.layer2.Layer2(aws_access_key_id=awsAccess, aws_secret_access_key=awsSecret)

# Get a vault based on vault name (assuming you created it already)
v = l.get_vault(vaultName)

# Create an archive from a local file on the vault
archiveID = v.create_archive_from_file(fileName)

However, this fails for files larger than 4 GB.

I'm assuming this is because, as the Amazon Glacier FAQ specifies: "The largest archive that can be uploaded in a single Upload request is 4 gigabytes. For items larger than 100 megabytes, customers should consider using the Multipart upload capability."

How do I use the Multipart upload capability with boto and AWS Glacier?

Solution 2

Amazon Glacier uses the term archive to describe files. Note that the 4 GB cap applies only to a single Upload request; archives larger than that have to go through the multipart API. To drive the multipart upload yourself, look at vault.concurrent_create_archive_from_file or vault.create_archive_writer.

OTHER TIPS

I've just looked into the sources and it seems that boto.glacier.vault.Vault.upload_archive() does all the magic automatically:

Adds an archive to a vault. For archives greater than 100MB the multipart upload will be used.

def upload_archive(self, filename, description=None):
    # Archives above SingleOperationThreshold (100 MB) take the
    # multipart path; smaller ones are sent in a single request.
    if os.path.getsize(filename) > self.SingleOperationThreshold:
        return self.create_archive_from_file(filename, description=description)
    return self._upload_archive_single_operation(filename, description)

The Glacier docs clearly state:

Depending on the size of the data you are uploading, Amazon Glacier offers the following options:

  • Upload archives in a single operation—In a single operation, you can upload archives from 1 byte to up to 4 GB in size. However, we encourage Amazon Glacier customers to use Multipart Upload to upload archives greater than 100 MB.

  • Upload archives in parts—Using the Multipart upload API you can upload large archives, up to about 40,000 GB (10,000 * 4 GB).

    The Multipart Upload API call is designed to improve the upload experience for larger archives. You can upload archives in parts. These parts can be uploaded independently, in any order, and in parallel. If a part upload fails, you only need to upload that part again and not the entire archive. You can use Multipart Upload for archives from 1 byte to about 40,000 GB in size.
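
The 40,000 GB ceiling follows from those two limits: at most 10,000 parts, each at most 4 GB. A small sketch of the arithmetic for picking a part size (choose_part_size is an illustrative helper, not a boto API; boto itself ships similar logic):

```python
MEGABYTE = 1024 * 1024
MAX_PARTS = 10000               # Glacier allows at most 10,000 parts
MIN_PART = 1 * MEGABYTE         # parts are power-of-two multiples of 1 MB...
MAX_PART = 4 * 1024 * MEGABYTE  # ...up to 4 GB each

def choose_part_size(archive_bytes, default=4 * MEGABYTE):
    """Smallest allowed power-of-two part size (starting from `default`)
    that fits the archive into at most MAX_PARTS parts."""
    part = max(default, MIN_PART)
    while part < MAX_PART and archive_bytes > part * MAX_PARTS:
        part *= 2
    if archive_bytes > part * MAX_PARTS:
        raise ValueError("archive too large for Glacier multipart upload")
    return part
```

With the 4 MB default, anything over 40 GB forces a larger part size; the hard ceiling is 4 GB x 10,000 parts, i.e. the "about 40,000 GB" from the docs.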

In boto's layer 2 this means using one of the following methods on boto.glacier.vault.Vault:

  • concurrent_create_archive_from_file
  • create_archive_writer
  • upload_archive
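
Of the three, upload_archive is the simplest because it picks the strategy for you at the 100 MB threshold quoted from the source above. A minimal sketch (uses_multipart and upload are illustrative helpers mirroring that decision, not boto APIs; the vault argument is assumed to be a layer-2 Vault as in the question):

```python
SINGLE_OPERATION_THRESHOLD = 100 * 1024 * 1024  # 100 MB, as in the boto source

def uses_multipart(size_bytes, threshold=SINGLE_OPERATION_THRESHOLD):
    """Mirror upload_archive's decision: multipart only above the threshold."""
    return size_bytes > threshold

def upload(vault, filename, description=None):
    """Let boto choose: single request below 100 MB, multipart above."""
    return vault.upload_archive(filename, description=description)
```

So for the 4 GB-plus files in the question, simply calling v.upload_archive(fileName) instead of v.create_archive_from_file(fileName) should route through the multipart path.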
Licensed under: CC-BY-SA with attribution