Question

I am using boto and Python to store and retrieve files from Amazon S3. I need to get the list of files present in a "directory". I know there is no real concept of directories in S3, so let me phrase the question this way: how can I get a list of all file names that share the same prefix?

For example, let's say I have the following files:

Brad/files/pdf/abc.pdf
Brad/files/pdf/abc2.pdf
Brad/files/pdf/abc3.pdf
Brad/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/new/abc.pdf
mybucket/files/pdf/2011/

When I call foo("Brad"), it should return a list like this:

files/pdf/abc.pdf
files/pdf/abc2.pdf
files/pdf/abc3.pdf
files/pdf/abc4.pdf

What is the best way to do it?


Solution 2

You can use startswith and a list comprehension for this purpose, as shown below:

paths = ['Brad/files/pdf/abc.pdf', 'Brad/files/pdf/abc2.pdf',
         'Brad/files/pdf/abc3.pdf', 'Brad/files/pdf/abc4.pdf',
         'mybucket/files/pdf/new/', 'mybucket/files/pdf/new/abc.pdf',
         'mybucket/files/pdf/2011/']

def foo(m):
    # keep only the key names that start with the given prefix followed by '/'
    return [p for p in paths if p.startswith(m + '/')]

print(foo('Brad'))

output:

['Brad/files/pdf/abc.pdf', 'Brad/files/pdf/abc2.pdf', 'Brad/files/pdf/abc3.pdf', 'Brad/files/pdf/abc4.pdf']
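
Note that this returns the full key names, prefix included. If you want the paths relative to the prefix, as in the question's expected output, a small variation of the same comprehension slices the prefix off:

def foo(m):
    # drop the leading prefix (e.g. 'Brad/') so only the remainder is returned
    return [p[len(m) + 1:] for p in paths if p.startswith(m + '/')]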

Using split and filter:

def foo(m):
    # compare the first path component against the prefix; list() is needed
    # on Python 3, where filter returns an iterator
    return list(filter(lambda x: x.split('/')[0] == m, paths))

OTHER TIPS

user3's approach is a pure client-side solution. It works well at a small scale, but if you have millions of objects in one bucket, you may pay for a large number of requests and a lot of bandwidth.

Alternatively, you can use the delimiter and prefix parameters provided by the GET Bucket API to achieve what you need. There are many examples in the documentation; see http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html

Needless to say, you can use boto to achieve this.
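
For instance, a minimal sketch with boto 2 (the bucket name 'mybucket' and reliance on credentials from the environment are assumptions here) might look like this:

# a sketch using boto 2's prefix listing; bucket name is a placeholder
from boto.s3.connection import S3Connection

conn = S3Connection()                  # picks up AWS credentials from the environment
bucket = conn.get_bucket('mybucket')   # hypothetical bucket name

def foo(prefix):
    # S3 filters keys server-side when prefix is given, so only matching
    # keys are returned; slice off the prefix to match the question's output
    return [key.name[len(prefix) + 1:] for key in bucket.list(prefix=prefix + '/')]

print(foo('Brad'))   # e.g. ['files/pdf/abc.pdf', 'files/pdf/abc2.pdf', ...]

Because the filtering happens on the S3 side, only the matching keys are transferred, which keeps the request count and bandwidth down for large buckets.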

Licensed under: CC-BY-SA with attribution