Question

Let's say I have images of different sizes on S3:

137ff24f-02c9-4656-9d77-5e761d76a273.webp
137ff24f-02c9-4656-9d77-5e761d76a273_500_300.webp
137ff24f-02c9-4656-9d77-5e761d76a273_400_280.webp

I am using boto to delete a single file:

from boto.s3.key import Key

bucket = get_s3_bucket()
s3_key = Key(bucket)
s3_key.key = '137ff24f-02c9-4656-9d77-5e761d76a273.webp'
bucket.delete_key(s3_key)

But I would like to delete all keys starting with 137ff24f-02c9-4656-9d77-5e761d76a273.

Keep in mind there might be hundreds of files in the bucket, so I don't want to iterate over all of them. Is there a way to delete only the files starting with a certain string?

Maybe some regex delete function.

Solution

The S3 service does support a multi-delete operation allowing you to delete up to 1000 objects in a single API call. However, this API call doesn't provide support for server-side filtering of the keys. You have to provide the list of keys you want to delete.

You could roll your own. First, you would want to get a list of all the keys you want to delete.

import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')
# the prefix filtering happens server-side in the ListObjects call
to_delete = list(bucket.list(prefix='137ff24f-02c9-4656-9d77-5e761d76a273'))

The list call returns a lazy result set, but I'm converting it to a list with list(), so the to_delete variable now points to a list of all of the objects in the bucket that match the prefix I provided.

Now we need to break the big list into chunks of up to 1000 objects and pass each chunk to the delete_keys method of the bucket object.

# delete_keys accepts at most 1000 keys per request
for chunk in [to_delete[i:i + 1000] for i in range(0, len(to_delete), 1000)]:
    result = bucket.delete_keys(chunk)
    if result.errors:
        print('The following errors occurred:')
        for error in result.errors:
            print(error)

There are more efficient ways to do this (e.g. without converting the bucket listing into a list), and you probably want to do something different when handling the errors, but this should give you a start. A sketch of the streaming variant follows.
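For example, a streaming variant can batch keys as they arrive from the listing instead of materializing the whole result set in memory. This is just a minimal sketch (delete_by_prefix is a made-up helper name, and error handling is omitted):

import boto

def delete_by_prefix(bucket, prefix, batch_size=1000):
    # accumulate keys from the lazy listing and flush every batch_size keys
    batch = []
    for key in bucket.list(prefix=prefix):
        batch.append(key)
        if len(batch) == batch_size:
            bucket.delete_keys(batch)
            batch = []
    if batch:  # flush the final partial batch
        bucket.delete_keys(batch)

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')
delete_by_prefix(bucket, '137ff24f-02c9-4656-9d77-5e761d76a273')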

OTHER TIPS

You can do it using the AWS CLI (https://aws.amazon.com/cli/) and some Unix commands.

This AWS CLI command should work:

aws s3 rm s3://<your_bucket_name> --recursive --exclude "*" --include "*137ff24f-02c9-4656-9d77-5e761d76a273*"

Note that the --exclude/--include filters only take effect when the command runs with --recursive, which also makes it descend into sub-folders.
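To preview what would be deleted before actually deleting anything, the CLI's --dryrun flag prints the operations without performing them:

aws s3 rm s3://<your_bucket_name> --recursive --dryrun --exclude "*" --include "*137ff24f-02c9-4656-9d77-5e761d76a273*"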

Or with Unix commands:

aws s3 ls s3://<your_bucket_name>/ | awk '{print $4}' | grep '^137ff24f-02c9-4656-9d77-5e761d76a273' | xargs -I% aws s3 rm s3://<your_bucket_name>/%

Explanation: list all files in the bucket --pipe--> take the 4th column (the file name) --pipe--> keep only the names starting with your prefix --pipe--> delete each matching file with the AWS CLI.

Yes, try using s3cmd, a command-line tool for S3. First, get the list of all files in the bucket.

import shlex
import subprocess

cmd = 's3cmd ls s3://bucket_name'
args = shlex.split(cmd)
ls_lines = subprocess.check_output(args).splitlines()

Then find all lines that start with your desired string (using a regex, which should be simple), and delete all of them using the command:

s3cmd del s3://bucket_name/file_name(s)
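To make that concrete, here is a minimal sketch of the filter-and-delete step. It assumes s3cmd ls prints the object URI in the last column; bucket_name and the prefix are placeholders:

import re
import shlex
import subprocess

ls_lines = subprocess.check_output(shlex.split('s3cmd ls s3://bucket_name')).decode().splitlines()

# the object URI is the last whitespace-separated column of each line
uris = [line.split()[-1] for line in ls_lines if line.strip()]
pattern = re.compile(r'^s3://bucket_name/137ff24f-02c9-4656-9d77-5e761d76a273')
for uri in uris:
    if pattern.match(uri):
        subprocess.check_call(['s3cmd', 'del', uri])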

Or, if you just want to use a single command:

s3cmd del s3://bucket_name/string*

I mentioned the first method so that you can test the names of the files you are deleting and don't accidentally delete anything else.

For boto3, the following snippet removes all files starting with a particular prefix:

import boto3

botoSession = boto3.Session(
    aws_access_key_id     = <your access key>,
    aws_secret_access_key = <your secret key>,
    region_name           = <your region>,
)

s3 = botoSession.resource('s3')
bucket = s3.Bucket(<your bucket name>)
objects = bucket.objects.filter(Prefix=<your prefix>)
objects.delete()
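Under the hood, the collection's delete() batches the work into DeleteObjects calls of up to 1000 keys each and returns a list of responses. Instead of discarding the return value as the snippet above does, you can check it for failures:

# each response may carry an 'Errors' list for keys that failed to delete
responses = objects.delete()
for response in responses:
    for error in response.get('Errors', []):
        print(error['Key'], error['Code'], error['Message'])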

While there's no direct boto method to do what you want, you should be able to do it efficiently by using get_all_keys, filtering the result client-side with your regex, and then calling delete_keys.

Doing it this way will use only two requests (get_all_keys returns at most 1000 keys per call, so larger buckets need pagination), and doing the regex matching client-side should be pretty fast.
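A minimal sketch of that approach, assuming a bucket name of 'mybucket' and reusing the prefix from the question:

import re
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket('mybucket')
pattern = re.compile(r'^137ff24f-02c9-4656-9d77-5e761d76a273')

# one request to list (up to 1000 keys), one request to delete
keys = bucket.get_all_keys()
matching = [key.name for key in keys if pattern.match(key.name)]
if matching:
    result = bucket.delete_keys(matching)
    if result.errors:
        print(result.errors)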

Licensed under: CC-BY-SA with attribution