Wow! What a Question :)
OK, let's discuss some aspects:
S3
S3 performance is most likely low because you're not using a prefix when listing keys.
If you shard by storing the objects under keys like type/owner/id, listing all the ids for a given owner (using the prefix type/owner/) will be fast. Or at least, faster than listing everything at once.
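To make the prefix idea concrete, here is a minimal sketch in plain Python. It does not talk to S3: the sorted list is just an in-memory stand-in for a bucket, and `object_key` and `list_with_prefix` are hypothetical helper names. With a real SDK you would pass the same prefix string to the list-objects call instead.

```python
from bisect import bisect_left

def object_key(type_, owner, id_):
    """Build a key of the form type/owner/id."""
    return f"{type_}/{owner}/{id_}"

def list_with_prefix(sorted_keys, prefix):
    """Return the keys that start with prefix from an already-sorted key list."""
    # Jump straight to the first candidate instead of walking every key.
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so nothing further can match
        result.append(key)
    return result

# In-memory stand-in for a bucket (assumption, for illustration only).
keys = sorted([
    object_key("image", "alice", "1"),
    object_key("image", "alice", "2"),
    object_key("image", "bob", "7"),
    object_key("video", "alice", "3"),
])

alice_images = list_with_prefix(keys, "image/alice/")
print(alice_images)
```

The point is only that a well-chosen key layout lets you ask for a narrow slice (type/owner/) rather than everything.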
Dynamo Versus SimpleDB
In general, this is my advice:
Use SimpleDB when:
- Your entity storage isn't going to exceed 10 GB
- You need to apply complex queries involving multiple fields
- Your queries aren't well defined
- You can take advantage of multi-valued attributes
Use DynamoDB when:
- Your entity storage will exceed 10 GB
- You want to scale demand / throughput as it goes
- Your queries and model are well-defined, and unlikely to change
- Your model is dynamic, involving a loose schema
- You can cache your queries on the client side (so you save on throughput by checking the cache before hitting Dynamo)
- You want to do aggregate/rollup summaries, by using Atomic Updates
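As an illustration of the caching point, here is a minimal read-through cache sketch. Everything in it is an assumption for the example: `ReadThroughCache` and `fake_dynamo_get` are hypothetical names, and the fetch function only pretends to be a Dynamo read. It has no expiry or invalidation, which a real cache would need.

```python
class ReadThroughCache:
    """Check a local dict first; call the backend only on a miss."""

    def __init__(self, fetch):
        self._fetch = fetch        # function that performs the real read
        self._store = {}           # key -> cached item
        self.backend_reads = 0     # how many reads actually reached the backend

    def get(self, key):
        if key not in self._store:
            self.backend_reads += 1
            self._store[key] = self._fetch(key)
        return self._store[key]

def fake_dynamo_get(key):
    # Hypothetical stand-in for a table read that would consume throughput.
    return {"id": key}

cache = ReadThroughCache(fake_dynamo_get)
first = cache.get("acct1")
second = cache.get("acct1")   # served from the cache, no backend read
print(cache.backend_reads)
```

Every hit served from the cache is a read you don't pay provisioned throughput for.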
Given your current description, it seems SimpleDB is actually the better fit, since:
- Your model isn't completely defined
- You can defer some decisions, since it takes a while to hit the 10 GB limit
Geographical SimpleDB
It isn't supported. As far as I know, it only works from us-east-1.
Key Naming
This applies mostly to Dynamo: whenever you can, use a Hash + Range Key. You could also create keys using only a Hash, and apply some queries, like:
- List all my records on table T whose key starts with accountid:
- List all my records on table T whose key starts with accountid:image
However, those are all Scans, which read the whole table. Bear that in mind.
(See this for an overview: http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/API_Scan.html)
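To show why the distinction matters, here is a toy in-memory model, not the DynamoDB API. `HashRangeTable` and its `query` and `scan` methods are hypothetical names, and the "examined" counts only illustrate how many items each approach has to touch.

```python
from collections import defaultdict

class HashRangeTable:
    """Toy stand-in for a table keyed by (hash key, range key)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # hash key -> {range key: item}

    def put(self, hash_key, range_key, item):
        self._partitions[hash_key][range_key] = item

    def query(self, hash_key, range_prefix=""):
        """Touch only one hash key's items. Returns (matches, items_examined)."""
        partition = self._partitions.get(hash_key, {})
        matches = [rk for rk in sorted(partition) if rk.startswith(range_prefix)]
        return matches, len(partition)

    def scan(self, key_prefix):
        """Touch every item in the table, matching on a 'hash:range' string."""
        examined = 0
        matches = []
        for hash_key, partition in self._partitions.items():
            for range_key in partition:
                examined += 1
                full_key = f"{hash_key}:{range_key}"
                if full_key.startswith(key_prefix):
                    matches.append(full_key)
        return sorted(matches), examined

table = HashRangeTable()
table.put("acct1", "image:001", {})
table.put("acct1", "image:002", {})
table.put("acct1", "doc:001", {})
table.put("acct2", "image:001", {})

q_matches, q_examined = table.query("acct1", "image")
s_matches, s_examined = table.scan("acct1:image")
print(q_examined, s_examined)
```

Both calls find the same two records, but the scan has to look at every item in the table, and that gap grows with the table.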
Bonus Track
If you're using Java, cloudy-data on Maven Central includes SimpleJPA with some extensions to map blob fields to S3. Give it a look:
http://bitbucket.org/ingenieux/cloudy
Thank you