Question

I have a simple efficiency question that has been running through my mind.

I have written a PHP script that uploads all the files in my folders to my bucket on Amazon S3. It can also upload files in subfolders without losing the directory structure.

Basically, a user has to log on to my website, and then, based on the user's account name, they can upload photos to my bucket on Amazon S3. A user can upload up to 10 photos; these are then processed into derived versions, e.g. modified copies and thumbnails.
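For reference, the upload itself is just one PutObject call per file. Here is a minimal sketch using the AWS SDK for PHP (v3); the bucket name, region, and local path are placeholders, not my real values:

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client([
    'version' => 'latest',
    'region'  => 'us-east-1',       // placeholder region
]);

// One PutObject call per photo; the Key is where the layout
// decision below comes in.
$s3->putObject([
    'Bucket'     => 'my-photo-bucket',                  // placeholder bucket
    'Key'        => 'username/original/picture01.jpg',  // OPTION 1 style key
    'SourceFile' => '/local/path/picture01.jpg',        // placeholder path
]);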

How should I structure my directory to be efficient on Amazon S3?

OPTION 1 (files in the same bucket but different folders - more organised)

username/original/picture01.jpg
username/original/picture02.jpg
username/original/picture03.jpg
....
username/original/picture10.jpg


username/modified/picture01.jpg
username/modified/picture02.jpg
username/modified/picture03.jpg
....
username/modified/picture10.jpg


username/thumbnails/picture01.jpg
username/thumbnails/picture02.jpg
username/thumbnails/picture03.jpg
....
username/thumbnails/picture10.jpg

Or

OPTION 2 (all files in the same bucket, flat names with no folders)

username-original-picture01.jpg
username-original-picture02.jpg
username-original-picture03.jpg
....
username-original-picture10.jpg


username-modified-picture01.jpg
username-modified-picture02.jpg
username-modified-picture03.jpg
....
username-modified-picture10.jpg


username-thumbnails-picture01.jpg
username-thumbnails-picture02.jpg
username-thumbnails-picture03.jpg
....
username-thumbnails-picture10.jpg

Or doesn't it make any difference in Amazon S3?


Solution

It doesn't make a difference for organizational purposes. S3 folders are really just an illusion for the benefit of humans like us, so that things seem familiar; there are no physically separate folders like there are on your own machine.
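As a quick illustration that "folders" are just key prefixes, here is a sketch of listing one user's originals with the AWS SDK for PHP (the bucket name is a placeholder):

<?php
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);

// There is no real folder here; Prefix is just a substring match
// against a flat namespace of object keys.
$result = $s3->listObjectsV2([
    'Bucket' => 'my-photo-bucket',   // placeholder bucket
    'Prefix' => 'username/original/',
]);

foreach ($result['Contents'] ?? [] as $object) {
    echo $object['Key'], PHP_EOL;
}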

The naming convention you use, however, will have a tremendous impact on performance once you get past a certain point (for a small number of files, it's probably not going to be noticeable).

In general, you want the beginning part of your file/folder names to be 'random-ish', the more random the better...so that S3 can disperse the workload more evenly. If the name prefixes are all the same, there is a potential bottleneck. A short random hash at the beginning of each filename would probably give you the best performance.
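For example, a hash prefix can be derived deterministically from the key itself. This is only a sketch; the helper name and the four-character prefix length are my own choices, not anything prescribed by AWS:

<?php
// Hypothetical helper: prepend a short hash so key names spread
// lexicographically across S3's index partitions.
function randomizedKey($username, $variant, $filename)
{
    $hash = substr(md5($username . '/' . $variant . '/' . $filename), 0, 4);
    return $hash . '/' . $username . '/' . $variant . '/' . $filename;
}

echo randomizedKey('username', 'original', 'picture01.jpg');
// e.g. 3f8a/username/original/picture01.jpg

Because the hash is computed from the key components, the same photo always maps to the same randomized key, so lookups stay straightforward.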

Straight from the horse's (AWS's) mouth:

The sequence pattern in the key names introduces a performance problem. To understand the issue, let’s look at how Amazon S3 stores key names.

Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored lexicographically across multiple partitions in the index. That is, Amazon S3 stores key names in alphabetical order. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. If you introduce some randomness in your key name prefixes, the key names, and therefore the I/O load, will be distributed across more than one partition.

If you anticipate that your workload will consistently exceed 100 requests per second, you should avoid sequential key names. If you must use sequential numbers or date and time patterns in key names, add a random prefix to the key name. The randomness of the prefix more evenly distributes key names across multiple index partitions. Examples of introducing randomness are provided later in this topic.

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html

OTHER TIPS

It does not make any difference in Amazon S3. There are just object keys.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow