Question

I have been thinking about social media applications like Facebook or LinkedIn. I have read many articles on websites like http://highscalability.com/ but haven't found the right answer.

That is because today's biggest applications run on custom systems: custom file systems, customized database engines, and customized web servers. They don't use stock IIS, Apache, MSSQL, MySQL, Windows, or Linux. They use many programming languages for different problems. That works for them because of their load; they have to account for every bit. They started in small environments, ran into problems and bottlenecks, and found new solutions.

We can find articles about their current systems, but nothing about what the best starting point is.

What I need to know is: what kind of architecture is the right starting point?

I have some ideas, but we need to be sure.

Here is what we are thinking:

Use MySQL as the relational database, with a caching layer such as memcached in front of it. Expose the business layer through a REST API written in Python. Run everything on a suitable Linux distribution. Once this backend is in place, we can use any language or system for the UIs: a PHP site for the web, or native applications for iOS and Android.
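The MySQL-plus-memcached idea is usually implemented as a cache-aside pattern: read from the cache first, fall back to the database on a miss, and invalidate the cache entry on writes. The sketch below is a minimal illustration under stated assumptions: `sqlite3` stands in for MySQL and a small dictionary-based `TTLCache` stands in for memcached so that it runs with the standard library alone. `UserRepository`, the `users` table, and the `user:<id>` key format are hypothetical names. A REST handler in the business layer would call `get_user` rather than querying the database directly.

```python
import sqlite3
import time


class TTLCache:
    """Tiny in-process stand-in for memcached (assumption): key -> (value, expiry)."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired entry behaves like a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def delete(self, key):
        self._store.pop(key, None)


class UserRepository:
    """Hypothetical data-access layer: cache-aside reads over a relational store."""

    def __init__(self, conn, cache):
        self.conn = conn
        self.cache = cache
        self.db_reads = 0  # counts how often the database is actually queried

    def get_user(self, user_id):
        key = f"user:{user_id}"
        cached = self.cache.get(key)
        if cached is not None:
            return cached  # cache hit: no database round trip
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        user = None if row is None else {"id": row[0], "name": row[1]}
        if user is not None:
            self.cache.set(key, user)  # populate the cache for later reads
        return user

    def rename_user(self, user_id, new_name):
        self.conn.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        self.conn.commit()
        self.cache.delete(f"user:{user_id}")  # invalidate so the next read refetches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.commit()

repo = UserRepository(conn, TTLCache(ttl_seconds=60))
first = repo.get_user(1)   # miss: reads the database
second = repo.get_user(1)  # hit: served from the cache
repo.rename_user(1, "alice2")
third = repo.get_user(1)   # miss again after invalidation
```

With a real memcached client the get/set/delete calls have the same shape, but values must be serialized. Invalidation on write is the simplest policy; on its own it does not rule out stale reads when updates race with cache fills.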

We need your advice. Thank you so much.

(I am an avid reader here, but this is my first question. I hope it is all right.)


Solution

Following a similar question last year, I compiled the techniques and technologies used by some of the larger social networking sites.

The following architecture concepts are prevalent among such sites:

Scalability

  • Caching (heavily, across multiple tiers and layers)
  • Data sharding (preferably by data-locality criteria)
  • In-memory DBs for frequently referenced data
  • Efficient wire-level protocols (as opposed to what an enterprise typically considers state of the art)
  • Asynchronous processing
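To illustrate sharding by data locality, the sketch below keeps all posts of one user on the same shard, so reading that user's posts touches a single server. It is a toy under stated assumptions: each shard is a Python dict standing in for a separate database, and `shard_for_user` and `ShardedStore` are hypothetical names. A stable hash (SHA-1 via `hashlib`) is used instead of the built-in `hash()`, which is randomized per process for strings.

```python
import hashlib


def shard_for_user(user_id, num_shards):
    """Map a user ID to a shard index with a hash that is stable across processes."""
    digest = hashlib.sha1(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Hypothetical store that keeps every row of one user on that user's shard."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        # Each dict stands in for a separate database server.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard(self, user_id):
        return self.shards[shard_for_user(user_id, self.num_shards)]

    def add_post(self, user_id, text):
        self._shard(user_id).setdefault(user_id, []).append(text)

    def posts_for(self, user_id):
        # Single-shard read: no cross-server query needed for one user's posts.
        return list(self._shard(user_id).get(user_id, []))


store = ShardedStore(num_shards=4)
store.add_post(42, "hello")
store.add_post(42, "second post")
store.add_post(7, "hi")
```

Simple modulo placement means that changing `num_shards` moves most users to a different shard; large sites typically use a lookup directory or consistent hashing to keep rebalancing manageable.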

Flexibility

  • Service-oriented architecture as a baseline principle
  • Decoupled and layered components
  • Asynchronous processing
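Asynchronous processing and decoupling often go together: the request path only records that work needs doing, and a separate worker does it later. The following minimal sketch assumes a hypothetical `FeedService` that fans a new post out to followers' feeds on a background thread, using the standard library's `queue.Queue`. In a real deployment the in-process queue would more likely be an external message broker, so that producers and workers can run on different machines.

```python
import queue
import threading


class FeedService:
    """Hypothetical service: followers' feeds are updated by a background worker."""

    def __init__(self, followers):
        self.followers = followers  # author id -> list of follower ids
        self.feeds = {}             # user id -> list of post texts
        self.tasks = queue.Queue()
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, author_id, text):
        """Request path: enqueue the fan-out work and return immediately."""
        self.tasks.put((author_id, text))

    def _run(self):
        # Worker loop: take one task at a time and append the post to each follower's feed.
        while True:
            author_id, text = self.tasks.get()
            try:
                for follower in self.followers.get(author_id, []):
                    with self._lock:
                        self.feeds.setdefault(follower, []).append(text)
            finally:
                self.tasks.task_done()

    def feed_for(self, user_id):
        with self._lock:
            return list(self.feeds.get(user_id, []))


service = FeedService(followers={1: [2, 3]})
service.publish(1, "hello followers")
service.tasks.join()  # demo only: block until the worker has drained the queue
```

`publish` returns before any follower's feed changes; the caller is decoupled from how and when the fan-out happens.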

Reliability

  • Asynchronous processing
  • Replication
  • Cell architecture (independently operated subsets, e.g. by geographical criteria)
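Cell architecture and replication can be combined: each cell owns its own replicated servers and every user is routed to a home cell, so an outage stays contained within one cell. The sketch below is a hypothetical router; the region names, cell names, and the `Replica`, `Cell`, and `CellRouter` classes are illustrative assumptions, and health is a simple flag rather than a real health check.

```python
class Replica:
    """One copy of a cell's data; `healthy` is a stand-in for a real health check."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy


class Cell:
    """An independently operated subset with its own set of replicas."""

    def __init__(self, name, replicas):
        self.name = name
        self.replicas = replicas

    def pick_replica(self):
        # Fail over to the next replica inside the same cell.
        for replica in self.replicas:
            if replica.healthy:
                return replica
        raise RuntimeError(f"cell {self.name} has no healthy replica")


class CellRouter:
    """Routes each user to a home cell by region; unknown regions use a default cell."""

    def __init__(self, cells_by_region, default_region):
        self.cells_by_region = cells_by_region
        self.default_region = default_region

    def cell_for(self, region):
        return self.cells_by_region.get(region, self.cells_by_region[self.default_region])

    def route(self, region):
        return self.cell_for(region).pick_replica()


eu = Cell("eu-1", [Replica("eu-1a", healthy=False), Replica("eu-1b")])
us = Cell("us-1", [Replica("us-1a"), Replica("us-1b")])
router = CellRouter({"EU": eu, "US": us}, default_region="US")
```

Here the EU cell keeps serving because its second replica is healthy, and a region without its own cell falls back to the default one.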

NB: If you start a new site, you are unlikely to have the kind of scaling or reliability requirements that these extremely large sites face. Hence the best advice is to start small but keep it flexible. One approach is to use an application framework that starts out simple but has flexibility to scale later, e.g. Django.
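Following that advice with the stack proposed in the question, a Django project can start on a single MySQL database plus memcached with a few lines of settings. The excerpt below is a hypothetical starting configuration: the database name, user, and environment-variable names are placeholders. The MySQL backend needs a driver such as `mysqlclient`, and `PyMemcacheCache` requires Django 3.2 or later with the `pymemcache` package installed.

```python
# settings.py (excerpt): a hypothetical starting configuration
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        # Placeholder names; read from the environment so each deployment can differ.
        "NAME": os.environ.get("DB_NAME", "socialapp"),
        "USER": os.environ.get("DB_USER", "socialapp"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "127.0.0.1"),
        "PORT": os.environ.get("DB_PORT", "3306"),
    }
}

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": os.environ.get("MEMCACHED_LOCATION", "127.0.0.1:11211"),
    }
}

# Empty for now; a router class can later send reads to replica databases.
DATABASE_ROUTERS = []
```

Because data access goes through Django's ORM and cache API, read replicas can later be added largely through extra `DATABASES` entries and a router in `DATABASE_ROUTERS`, rather than by rewriting application code.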

Licensed under: CC-BY-SA with attribution