How can I prepare my platform to drastically expand for the future?

https://softwareengineering.stackexchange.com/questions/396180

01-03-2021

Question

Preface

I'm currently building a very small-scale platform that I plan on bringing into a trial-production phase within the next month.

As of now, it's just a simple web application. It contains things like:

Users/Profiles
Commenting Systems
Posts created by users

Nothing too astounding or complicated. I have more features in the works, though, such as:

Subscriptions/Payment processing
Two-Factor Authentication
Automated emails and text messages
Mobile application

My Setup

I have three main servers.

Database: Isolated SQL server only accessible by the API server. Contains all persistent data relevant to the platform.
API: Go server that provides indirect access to the database server and manages authentication and tokens.
HTTPS: Nginx for serving static content and another Go server for managing dynamic content. Heavily communicates with the API server.

I also use CloudFlare for DNS management, SSL certificates, and DDoS protection.

The idea of having an API server is to ease the integration of a mobile app into the system.

Analysis

I'm not really focused on the features my project has, but more on the practices I should follow when expanding it. I expect higher volumes of traffic to be the main issue that could cripple my current setup.

Seeing that I only have one server serving static and dynamic content, I'm worried that once I start getting more and more requests (and maybe even requests from other places in the world) I'll run into problems.

I'm aware of things such as load balancing, but the question I'm getting at here is what I should be asking myself to make a well-informed and intelligent decision on how I'll actually implement that.

Summary

The primary goal here is to prepare this platform to face some of the problems larger platforms (such as SE) face, and to learn what I should consider along the way: for example, how I should build my application in the earlier phases to support the bigger stuff that comes later on.

Solution

Informed and Intelligent Decision

...

Do you have data?

If you had data, you would have the basis for rational decision making (as this would be very informative).

Otherwise you have guesswork, which is the basis for irrational decision making (some people are very effective at this, but if you're asking then you're probably not one of those people).

Are you replacing a site which has a high load of traffic?

If you were replacing a site that already has any traffic load, you would certainly be able to make an intelligent decision. If the intelligence of a decision is the difference between the benefit (a website serving users within some level of service) and the cost (the specific implementation), it would be trivial (if somewhat laborious) to identify candidate solutions and pick the optimal candidate.

Otherwise it's impossible to determine the intelligence of a decision upfront, only in retrospect. This obviously does not help you right now, as you must choose to do something.

Be Humble

You do not know what is around the corner, and you cannot know what problems your site will face. As such you are going to write code and make choices that seem okay right now, that will not be okay later - regardless of how informed or intelligent you believe you are being.

When that problem comes along, you will generally find that the new solution would not have been justifiable before. In fact, you probably would have discarded it as too complex, too difficult, or overkill for what you were experiencing, and you would have been right. The world just changed.

That being said, change is going to happen, and you can make preparations to make change simpler.

Concerns

Roughly speaking, there are three axes of concern:

Functional: This is the system behaviour like user profiles, reports, end of day processing, etc...
Operational: This is how you keep the lights on. Can it scale? How many users can it handle? Does it recover from errors well? etc...
Developmental: This is how easy it is to understand and change the software. Can a feature be added quickly and without breaking other functionality or operations? How easily can a bug be found? Can the bug be spotted before a user spots it? etc...

Every change you make is going to affect these axes. Ideally you want all of them to be outstanding: a highly functional, low-maintenance system that is a dream to improve. Unfortunately, most of your choices will make one of these dimensions better and the other two worse.

Keep Things Modular

You want modularity because your current tools will age and break; they will not scale with demand, nor will they change with your needs.

At least if they are modular you can replace them. You can swap in better maintained tools, higher capacity tools, faster tools, more capable tools, or selectively choose between tools on demand.

A module should contain, and hide, a choice that will change. Generally it is a good idea to implement at least three variants of that module. I would generally pick:

The real implementation that you will be using in production
A light/single user implementation that will mostly be used by devs, or by automation tests to allow for faster dev/behavioural checks.
A Stub/Proxy to support testing. It would log interactions, verify usage, and participate in module level tests.

If it's possible to implement several real implementations, doing so will certainly reveal deficiencies in the module interface.
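As a rough illustration of the three variants, here is a minimal Go sketch. Everything in it is hypothetical and not taken from the question: the UserStore interface, the User type, MemoryStore (the light variant), and RecordingStore (the stub/proxy variant). The production variant would wrap the real SQL server behind the same interface and is left out here.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// User is a hypothetical domain type used only for this sketch.
type User struct {
	ID   string
	Name string
}

// ErrNotFound is returned when no user exists for the given ID.
var ErrNotFound = errors.New("user not found")

// UserStore is the module boundary. It hides the storage choice
// (SQL server, file, memory, ...) from the rest of the program.
type UserStore interface {
	Get(id string) (User, error)
	Put(u User) error
}

// MemoryStore is the light, single-process variant for developers and fast tests.
type MemoryStore struct {
	mu    sync.RWMutex
	users map[string]User
}

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{users: make(map[string]User)}
}

func (s *MemoryStore) Get(id string) (User, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	u, ok := s.users[id]
	if !ok {
		return User{}, ErrNotFound
	}
	return u, nil
}

func (s *MemoryStore) Put(u User) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.users[u.ID] = u
	return nil
}

// RecordingStore is the stub/proxy variant: it forwards every call to
// another UserStore and records the interaction so tests can verify usage.
type RecordingStore struct {
	Inner UserStore
	mu    sync.Mutex
	Calls []string
}

func (r *RecordingStore) record(call string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.Calls = append(r.Calls, call)
}

func (r *RecordingStore) Get(id string) (User, error) {
	r.record("Get:" + id)
	return r.Inner.Get(id)
}

func (r *RecordingStore) Put(u User) error {
	r.record("Put:" + u.ID)
	return r.Inner.Put(u)
}

// Greeting depends only on the interface, so any variant can be swapped in.
func Greeting(store UserStore, id string) (string, error) {
	u, err := store.Get(id)
	if err != nil {
		return "", err
	}
	return "Hello, " + u.Name, nil
}

func main() {
	store := &RecordingStore{Inner: NewMemoryStore()}
	_ = store.Put(User{ID: "42", Name: "Ada"})
	msg, err := Greeting(store, "42")
	fmt.Println(msg, err, store.Calls)
}
```

Because Greeting only sees UserStore, moving from the in-memory variant to a database-backed one should not require touching the calling code, which is the point of hiding the choice.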

Communication and Coupling

Reduce the need for communication. The fewer modules a piece of software has to be aware of, the less often they have to communicate, and the simpler that communication is, the better.

A single page web app can cache much of the static and dynamic data it requests. If done sensibly this can reduce load on your own system.
A database can be paired with an in-memory cache, speeding up access for reads.
Keeping business actions local (two users can do the same action without interfering with each other) allows you to scale horizontally as there is no need for a synchronisation point.
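To make the database-plus-cache point concrete, here is a small read-through cache sketch in Go. The names (ReadThroughCache, Loader) and the fixed time-to-live policy are assumptions for illustration; a real deployment might use a dedicated cache server instead, and the loader here only stands in for a database query.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Loader fetches a value from the slow source of truth (for example the database).
type Loader func(key string) (string, error)

type entry struct {
	value   string
	expires time.Time
}

// ReadThroughCache keeps recently read values in memory for a fixed TTL.
type ReadThroughCache struct {
	mu    sync.Mutex
	ttl   time.Duration
	load  Loader
	items map[string]entry
}

func NewReadThroughCache(ttl time.Duration, load Loader) *ReadThroughCache {
	return &ReadThroughCache{ttl: ttl, load: load, items: make(map[string]entry)}
}

// Get returns the cached value while it is still fresh; otherwise it calls
// the loader and stores the result. Errors are not cached.
// The lock is held during the load, which keeps the sketch simple but
// serialises all misses; a production cache would likely avoid that.
func (c *ReadThroughCache) Get(key string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.items[key]; ok && time.Now().Before(e.expires) {
		return e.value, nil
	}
	v, err := c.load(key)
	if err != nil {
		return "", err
	}
	c.items[key] = entry{value: v, expires: time.Now().Add(c.ttl)}
	return v, nil
}

// Invalidate drops a key; call it after a write so readers do not see stale data.
func (c *ReadThroughCache) Invalidate(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.items, key)
}

func main() {
	dbReads := 0
	cache := NewReadThroughCache(time.Minute, func(key string) (string, error) {
		dbReads++ // stands in for a real query
		return "profile-of-" + key, nil
	})
	for i := 0; i < 3; i++ {
		v, _ := cache.Get("alice")
		fmt.Println(v)
	}
	fmt.Println("database reads:", dbReads)
}
```

The trade-off is staleness: reads get cheaper, but a value can be up to one TTL out of date unless every write path remembers to call Invalidate.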

Oddly enough sometimes improving communication will force you to communicate more information. For example an Idempotency ID for actions that should only be done once, or a timestamp to indicate a sunset on a given request.
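A minimal Go sketch of both ideas follows, assuming a hypothetical Request type that carries an idempotency ID and a deadline. The in-memory map is only for illustration; with more than one server, the record of seen IDs would need to live in shared storage such as the database.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrExpired is returned for requests that arrive after their deadline.
var ErrExpired = errors.New("request is past its deadline")

// Request is a hypothetical action carrying an idempotency ID and a deadline.
type Request struct {
	IdempotencyID string
	NotAfter      time.Time // the "sunset": ignore the request after this instant
	AmountCents   int
}

// NewRequest builds a request that stays valid for ttl from now.
func NewRequest(id string, amountCents int, ttl time.Duration) Request {
	return Request{IdempotencyID: id, NotAfter: time.Now().Add(ttl), AmountCents: amountCents}
}

// Processor performs each action at most once per idempotency ID.
type Processor struct {
	mu      sync.Mutex
	results map[string]string
	charged int
}

func NewProcessor() *Processor {
	return &Processor{results: make(map[string]string)}
}

// Handle runs the action once. A retry with the same ID gets the stored
// result back instead of triggering the side effect a second time.
func (p *Processor) Handle(r Request) (string, error) {
	if time.Now().After(r.NotAfter) {
		return "", ErrExpired
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	if res, seen := p.results[r.IdempotencyID]; seen {
		return res, nil
	}
	p.charged += r.AmountCents // the side effect that must not repeat
	res := fmt.Sprintf("charged %d cents", r.AmountCents)
	p.results[r.IdempotencyID] = res
	return res, nil
}

// Charged reports the total amount charged so far.
func (p *Processor) Charged() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.charged
}

func main() {
	p := NewProcessor()
	req := NewRequest("order-1001", 500, 30*time.Second)
	first, _ := p.Handle(req)
	retry, _ := p.Handle(req) // e.g. the client timed out and sent it again
	fmt.Println(first, "|", retry, "| total:", p.Charged())
}
```

The client pays a small price (generating and sending an extra ID and deadline), and in return the server can treat retries as safe, which matters once requests pass through load balancers and flaky mobile networks.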

Good Logging and Monitoring

When faced with millions of users, every bug will be found, each limitation revealed. When they are found you want to know that they happened, what they were, and an explanation. To do that effectively you need to log and log well.

You need to log:

what is happening on the machine
what resources are available
what the exception was and every detail about it
where that exception occurred
how you got to the point of that exception
why the program was even attempting to run this code
interesting pieces of domain knowledge
etc...

The things that are logged need to be classifiable by a rule (or some script/tool).

This then needs to alert you. This might mean an SMS to your mobile within 10 seconds. It might mean a bug report for you to look at on Monday.
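One way to keep log entries classifiable is to log structured fields rather than free text. The sketch below uses Go's standard log/slog package (Go 1.21 or newer). The Classify rule, the Action values, and the component names are made up for illustration; actually sending an SMS or filing a bug report is left out.

```go
package main

import (
	"errors"
	"log/slog"
	"os"
	"runtime"
)

// Action is what a classification rule decides should happen with an event.
type Action string

const (
	PageNow    Action = "page-now"    // e.g. an SMS within seconds
	FileBug    Action = "file-bug"    // e.g. a report to read on Monday
	RecordOnly Action = "record-only" // keep it in the log, nothing more
)

// Classify is a deliberately simple, hypothetical rule keyed on structured fields.
func Classify(component string, level slog.Level) Action {
	switch {
	case level >= slog.LevelError && component == "payments":
		return PageNow
	case level >= slog.LevelError:
		return FileBug
	default:
		return RecordOnly
	}
}

// LogFailure records what failed, where it was reported from, and a little
// about the state of the machine, then returns the action the rule chose.
func LogFailure(logger *slog.Logger, component, operation string, err error) Action {
	var mem runtime.MemStats
	runtime.ReadMemStats(&mem)
	_, file, line, _ := runtime.Caller(1) // the caller of LogFailure
	action := Classify(component, slog.LevelError)
	logger.Error("operation failed",
		"component", component,
		"operation", operation,
		"error", err.Error(),
		"file", file,
		"line", line,
		"goroutines", runtime.NumGoroutine(),
		"heap_alloc_bytes", mem.HeapAlloc,
		"action", string(action),
	)
	return action
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	LogFailure(logger, "payments", "charge-card", errors.New("gateway timeout"))
	LogFailure(logger, "comments", "render-thread", errors.New("nil author"))
}
```

Because every entry is JSON with stable field names, the same rule could just as well run outside the process, in whatever log pipeline or script ends up reading the output.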

Good logging is HARD, but it is an essential piece of keeping the lights on.

Craftsmanship

When something announces itself as out of place, take the time to put it back into place.

You and your team will be constantly changing the code. This means you will learn, which in turn means that some of the code already written is not as good as you could now make it. Take the time to improve it.

Improve legibility (you will read it a lot)
Improve consistency (remove surprises, and make the differences trivial to see)
Improve logic (you didn't check for null here, so add that check)
Improve algorithms (pick a better sort, test for empty instead of count = 0)
Notice when something has become too small, too large, too integrated and rearrange the behaviour.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange