Please take a look at this article describing the Instagram architecture. It's quite interesting to see how 3 engineers handled 15-25 million users with 150 million photos per day.
I would also recommend an interesting blog that describes different scalability solutions for popular web resources:
- Facebook: An Example Canonical Architecture for Scaling Billions of Messages
- Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
There is a lot of information there, but the most common themes are:
- keep everything as simple as possible
- use well-known, mature technologies with good community support
- scale each part of the application separately
Even though you can find explanations of each of these, I'd like to focus on the last one, given your requirements.
When you want to make your application horizontally scalable, you should treat each cluster as a separate logical module, regardless of the actual number of servers in it. For example, for your web application you can set up several instances and put a load balancer in front of them. Users then access a single entry point (e.g. http://mysite.com), while the actual instance that serves each request may be arbitrary.
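To illustrate the idea, here is a minimal sketch of what a load balancer does behind that single entry point; the instance addresses are made up, and a real deployment would use something like HAProxy or nginx rather than hand-rolled code:

```python
import itertools

# Hypothetical backend instances behind the single entry point.
INSTANCES = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

# Round-robin: each incoming request is handed to the next instance in the cycle.
_cycle = itertools.cycle(INSTANCES)

def pick_instance():
    """Return the backend instance that should serve the next request."""
    return next(_cycle)

# Three consecutive requests are spread across all three instances.
print([pick_instance() for _ in range(3)])
```

Round-robin is only one possible strategy; real load balancers also support least-connections, IP-hash, and health-check-aware routing.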
If the instances need to share state with each other, avoid instance-local in-memory storage and instead use a key-value store such as Redis, along with a message broker such as ActiveMQ, RabbitMQ, or the cloud-hosted Iron.io, etc.
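The point is that any instance can write a value and any other instance can read it back. A minimal sketch of this pattern, using a plain dict as a stand-in for the shared store (in a real deployment you would replace it with a Redis client connection):

```python
# Stand-in for a shared key-value store such as Redis.
# In production this dict would be a Redis connection, so that
# every application instance behind the load balancer sees the same data.
shared_store = {}

def save_session(session_id, user_data):
    # One instance writes the session under a namespaced key...
    shared_store["session:" + session_id] = user_data

def load_session(session_id):
    # ...and any other instance can read it back by the same key.
    return shared_store.get("session:" + session_id)

save_session("abc123", {"user": "alice"})
print(load_session("abc123"))
```

With instance-local memory this would break: the instance that receives the second request might not be the one that stored the session, which is exactly why the shared store is needed.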
You should also treat your data storage as a single entry point, e.g. a sharded cluster (MongoDB supports auto-sharding out of the box, and most NoSQL solutions, such as CouchDB and HBase, have it as well). Basically you call a shard controller which, based on a specific shard key, redirects each request to the corresponding instance. Note, however, that sharding is usually quite non-trivial, so in most cases when you deal with an RDBMS you will need to rely on vertical scaling instead.
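The core of such a shard controller is a deterministic mapping from shard key to shard. A simplified sketch (the shard names are made up; this is not how any particular database implements it):

```python
import hashlib

# Hypothetical shard instances, for illustration only.
SHARDS = ["shard-0", "shard-1", "shard-2"]

def shard_for(shard_key):
    """Route a record to a shard based on a stable hash of its shard key."""
    # md5 is used here only as a stable, well-distributed hash function.
    digest = hashlib.md5(shard_key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same key always maps to the same shard, so reads find the data
# exactly where writes placed it.
print(shard_for("user:42") == shard_for("user:42"))
```

Note that plain modulo hashing reshuffles most keys whenever the number of shards changes; production systems typically use consistent hashing or range-based chunks (as MongoDB does) to avoid mass data migration.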
Considering all of the above, I would suggest a structure like the following:
Ideally, of course, all the servers should be physically close to each other (e.g. in the same data center). But if your application is going to be used worldwide, you will need to distribute your instances geographically to reduce latency. Here is an interesting lecture about server configuration (even though it's about MongoDB, I believe some of the approaches may be helpful in your case as well): https://www.youtube.com/watch?v=TZOH92mZIN8
But if you do not need all your servers for distributed "map/reduce" computation, and a single server instance is enough to produce the result, then I believe scenario #1 is fairly suitable and better for your needs (provided you set up a load balancer in front of your instances).