Question

Let's say I have images of different sizes on S3:

137ff24f-02c9-4656-9d77-5e761d76a273.webp
137ff24f-02c9-4656-9d77-5e761d76a273_500_300.webp
137ff24f-02c9-4656-9d77-5e761d76a273_400_280.webp

I am using boto to delete a single file:

from boto.s3.key import Key

bucket = get_s3_bucket()
s3_key = Key(bucket)
s3_key.key = '137ff24f-02c9-4656-9d77-5e761d76a273.webp'
bucket.delete_key(s3_key)

But I would like to delete all keys starting with 137ff24f-02c9-4656-9d77-5e761d76a273.

Keep in mind there might be hundreds of files in the bucket, so I don't want to iterate over all of them. Is there a way to delete only the files starting with a certain string?

Maybe some regex delete function.

Solution

The S3 service does support a multi-delete operation allowing you to delete up to 1000 objects in a single API call. However, this API call doesn't provide support for server-side filtering of the keys. You have to provide the list of keys you want to delete.

You could roll your own. First, you would want to get a list of all the keys you want to delete.

import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')
# the prefix filtering happens server-side in the ListObjects call
to_delete = list(bucket.list(prefix='137ff24f-02c9-4656-9d77-5e761d76a273'))

The list call returns a lazy result set, but I'm converting it to a list with list(), so the to_delete variable now points to a list of all of the objects in the bucket that match the prefix I provided.

Now we need to break the big list into chunks of up to 1000 objects and pass each chunk to the delete_keys method of the bucket object.

# delete_keys accepts at most 1000 keys per request
for chunk in [to_delete[i:i + 1000] for i in range(0, len(to_delete), 1000)]:
    result = bucket.delete_keys(chunk)
    if result.errors:
        print('The following errors occurred:')
        for error in result.errors:
            print(error)

There are more efficient ways to do this (e.g. without converting the bucket listing into a list), and you probably want to do something different when handling the errors, but this should give you a start. A sketch of the streaming variant follows.
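For example, a streaming variant can batch keys as they arrive from the listing instead of materializing the whole result set in memory. This is just a minimal sketch (delete_by_prefix is a made-up helper name, and error handling is omitted):

import boto

def delete_by_prefix(bucket, prefix, batch_size=1000):
    # accumulate keys from the lazy listing and flush every batch_size keys
    batch = []
    for key in bucket.list(prefix=prefix):
        batch.append(key)
        if len(batch) == batch_size:
            bucket.delete_keys(batch)
            batch = []
    if batch:  # flush the final partial batch
        bucket.delete_keys(batch)

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')
delete_by_prefix(bucket, '137ff24f-02c9-4656-9d77-5e761d76a273')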

OTHER TIPS

You can do it using the AWS CLI (https://aws.amazon.com/cli/) and some Unix commands.

This AWS CLI command should work:

aws s3 rm s3://<your_bucket_name> --recursive --exclude "*" --include "*137ff24f-02c9-4656-9d77-5e761d76a273*"

Note that the --exclude/--include filters only take effect when the command runs with --recursive, which also makes it descend into sub-folders.
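To preview what would be deleted before actually deleting anything, the CLI's --dryrun flag prints the operations without performing them:

aws s3 rm s3://<your_bucket_name> --recursive --dryrun --exclude "*" --include "*137ff24f-02c9-4656-9d77-5e761d76a273*"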

Or with Unix commands:

aws s3 ls s3://<your_bucket_name>/ | awk '{print $4}' | grep '^137ff24f-02c9-4656-9d77-5e761d76a273' | xargs -I% aws s3 rm s3://<your_bucket_name>/%

Explanation: list all files in the bucket --pipe--> take the 4th column (the file name) --pipe--> keep only the names starting with your prefix --pipe--> delete each matching file with the AWS CLI.

Yes, try using s3cmd, a command-line tool for S3. First, get the list of all files in the bucket.

import shlex
import subprocess

cmd = 's3cmd ls s3://bucket_name'
args = shlex.split(cmd)
ls_lines = subprocess.check_output(args).splitlines()

Then find all lines that start with your desired string (using a regex, which should be simple), and delete all of them using the command:

s3cmd del s3://bucket_name/file_name(s)
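To make that concrete, here is a minimal sketch of the filter-and-delete step. It assumes s3cmd ls prints the object URI in the last column; bucket_name and the prefix are placeholders:

import re
import shlex
import subprocess

ls_lines = subprocess.check_output(shlex.split('s3cmd ls s3://bucket_name')).decode().splitlines()

# the object URI is the last whitespace-separated column of each line
uris = [line.split()[-1] for line in ls_lines if line.strip()]
pattern = re.compile(r'^s3://bucket_name/137ff24f-02c9-4656-9d77-5e761d76a273')
for uri in uris:
    if pattern.match(uri):
        subprocess.check_call(['s3cmd', 'del', uri])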

Or, if you just want to use a single command:

s3cmd del s3://bucket_name/string*

I mentioned the first method so that you can test the names of the files you are deleting and don't accidentally delete anything else.

For boto3, the following snippet removes all files starting with a particular prefix:

import boto3

botoSession = boto3.Session(
    aws_access_key_id     = <your access key>,
    aws_secret_access_key = <your secret key>,
    region_name           = <your region>,
)

s3 = botoSession.resource('s3')
bucket = s3.Bucket(<your bucket name>)
objects = bucket.objects.filter(Prefix=<your prefix>)
objects.delete()
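Under the hood, the collection's delete() batches the work into DeleteObjects calls of up to 1000 keys each and returns a list of responses. Instead of discarding the return value as the snippet above does, you can check it for failures:

# each response may carry an 'Errors' list for keys that failed to delete
responses = objects.delete()
for response in responses:
    for error in response.get('Errors', []):
        print(error['Key'], error['Code'], error['Message'])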

While there's no direct boto method to do what you want, you should be able to do it efficiently by using get_all_keys, filtering the result client-side with your regex, and then calling delete_keys.

Doing it this way will use only two requests (get_all_keys returns at most 1000 keys per call, so larger buckets need pagination), and doing the regex matching client-side should be pretty fast.
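A minimal sketch of that approach, assuming a bucket name of 'mybucket' and reusing the prefix from the question:

import re
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')
pattern = re.compile(r'^137ff24f-02c9-4656-9d77-5e761d76a273')

# one request to list (up to 1000 keys), one request to delete
keys = bucket.get_all_keys()
matching = [key.name for key in keys if pattern.match(key.name)]
if matching:
    result = bucket.delete_keys(matching)
    if result.errors:
        print(result.errors)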

Licensed under: CC-BY-SA with attribution