Question

I have a web application - a simple web application archive (WAR) file - that has several storage adapters for different storage types, i.e. MongoDB and CouchDB. Through the web services I have written, this application can store and query data in those databases. Currently the application can only talk to a single database instance, which prevents me from doing any parallel processing.
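Roughly, the adapter layer is shaped like the minimal sketch below (the interface and method names are simplified placeholders, not my actual code):

```java
// Illustrative only: a minimal storage-adapter abstraction; the real
// interface and method names in my application differ.
public interface StorageAdapter {

    // Persist a JSON document under the given id.
    void store(String collection, String id, String json);

    // Return a previously stored document, or null if it does not exist.
    String findById(String collection, String id);
}

// A MongoDB-backed and a CouchDB-backed class each implement this interface
// with their own driver/HTTP calls.
```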


What I want is to run my application on several machines and, on top of those, write a UI that lets the client store and query data without knowing the database types or addresses.

I have two different scenarios and wanted to ask you which one of them is a better way to do it and why.


1) Let's say I have three servers, each running a single database - CouchDB. I can deploy my application to those servers and then, with the help of my UI or a layer above my application, define a map of servers so that I can store and query the data.

[Diagram: scenario 1 - each server hosts both an application instance and its CouchDB database]

As the diagram shows, the database and the application live on the same server, so both are remote from the client.
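To make the idea concrete, the "map of servers" could be as simple as the sketch below (addresses and the routing rule are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a routing table that maps logical node names to the
// servers hosting both the application and its CouchDB instance.
public class ServerMap {

    private final Map<String, String> servers = new LinkedHashMap<>();

    public ServerMap() {
        // Hypothetical addresses; in practice these would come from configuration.
        servers.put("node-1", "http://10.0.0.1:8080/api");
        servers.put("node-2", "http://10.0.0.2:8080/api");
        servers.put("node-3", "http://10.0.0.3:8080/api");
    }

    // Pick a server for a given key by hashing it over the node list.
    public String resolve(String key) {
        String[] nodes = servers.values().toArray(new String[0]);
        return nodes[Math.floorMod(key.hashCode(), nodes.length)];
    }
}
```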


2) Let's say the three databases still run on remote servers, but in this case my application is local, and I would extend it to accept several database instances.

[Diagram: scenario 2 - a single local application connecting to the three remote databases]


I actually prefer the first one, since in that case I won't need to extend my application, but I wanted to hear what you think about it. I would also be glad if you could point me to some sources on this kind of distributed setup - I have no experience at all with it.


Solution

Please take a look at the article describing Instagram's architecture. It is quite interesting to see how three engineers handled 15-25 million users and 150 million photos per day.

I would also recommend an interesting blog that describes the scalability solutions of various popular web resources; it contains a lot of information.

Certain common themes recur across all of these architectures, and even though you can find explanations of each of them, I'd like to focus on the one that matches your requirements: horizontal scalability.

When you want to make your application horizontally scalable, you need to treat each cluster as a separate logical module, regardless of the actual number of servers in it. For example, for your web application you can set up several instances and place a load balancer in front of them. Users then access a single entry point (e.g. http://mysite.com), while the actual instance serving each request may be arbitrary.
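As a rough illustration of that "single entry point, arbitrary instance" idea, here is what a balancer does conceptually (the instance URLs are hypothetical; in practice you would use an existing balancer such as nginx or HAProxy rather than writing your own):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: conceptually, a load balancer hands each incoming
// request to one of several identical application instances.
public class RoundRobinBalancer {

    private final List<String> instances;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinBalancer(List<String> instances) {
        this.instances = instances;
    }

    public String nextInstance() {
        return instances.get(Math.floorMod(counter.getAndIncrement(), instances.size()));
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of(
                "http://app-1:8080", "http://app-2:8080", "http://app-3:8080"));
        for (int i = 0; i < 6; i++) {
            System.out.println("request " + i + " -> " + lb.nextInstance());
        }
    }
}
```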

If the instances need to collaborate with each other, avoid in-memory state and instead use a "key-value" store such as Redis, together with a message broker such as ActiveMQ, RabbitMQ, or a cloud-hosted option like Iron.IO.
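A minimal sketch of that pattern, assuming the Jedis client for Redis and the official RabbitMQ Java client (host names, keys and the queue name are placeholders):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import redis.clients.jedis.Jedis;

import java.nio.charset.StandardCharsets;

// Illustrative only: shared state goes to Redis, work hand-off goes through
// a message broker, so no application instance keeps anything in memory.
public class SharedStateExample {

    public static void main(String[] args) throws Exception {
        // Store session/shared data in Redis instead of instance memory.
        try (Jedis redis = new Jedis("redis-host", 6379)) {
            redis.set("session:42", "{\"user\":\"alice\"}");
            System.out.println(redis.get("session:42"));
        }

        // Publish a task so that any instance can pick it up.
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbit-host");
        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {
            channel.queueDeclare("tasks", true, false, false, null);
            channel.basicPublish("", "tasks", null,
                    "process-item-42".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```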

The data storage should also be treated as a single entry point, e.g. a sharded cluster (MongoDB supports auto-sharding out of the box, and most NoSQL solutions - CouchDB, HBase - offer something similar). Basically you call a shard controller, which redirects each operation to the corresponding instance according to a specific shard key. Please note that sharding is usually non-trivial, which is why, when dealing with an RDBMS, you most often have to rely on vertical scalability instead.
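For example, with MongoDB the application simply connects to the mongos router(s) as its single entry point; routing by shard key happens inside the cluster. A hedged sketch, assuming the current MongoDB Java driver and made-up host names:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

// Illustrative only: from the application's point of view the sharded cluster
// is one endpoint; mongos routes each operation to the right shard.
public class ShardedWriteExample {

    public static void main(String[] args) {
        // Sharding itself (e.g. the shard key for "items") is configured on the
        // cluster side, not in application code.
        try (MongoClient client = MongoClients.create(
                "mongodb://mongos-1:27017,mongos-2:27017")) {
            MongoDatabase db = client.getDatabase("appdata");
            MongoCollection<Document> items = db.getCollection("items");
            // "userId" is assumed to be (part of) the shard key.
            items.insertOne(new Document("userId", 42).append("payload", "hello"));
        }
    }
}
```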

Considering everything above, I would suggest a structure like this:

[Diagram: clients -> load balancer -> application instances -> sharded data store]

Ideally, all the servers should be physically close to each other (e.g. in the same data centre). But if your application is going to be used worldwide, you will need to distribute your instances to minimize latency. Here is an interesting lecture about server configuration (even though it is about MongoDB, I believe some of the approaches can help in your case as well): https://www.youtube.com/watch?v=TZOH92mZIN8

But if you do not need all of your servers for distributed "map/reduce" computing, and a single particular server instance is enough to produce a result, then I believe scenario #1 is perfectly suitable and the better fit for your needs (provided you put a load balancer in front of your instances).

Licensed under: CC-BY-SA with attribution