Question

In my production environment, I have a single MySQL server instance with 16 GB of memory that handles up to 20,000 queries an hour. One of my tables is growing at a rate of 2 million rows per month. Both numbers are expected to keep rising, but I'm not sure when I will need to improve the architecture.

How can one be proactive about this situation and go about future-proofing the system?

Does upgrading the hardware buy much in terms of time and capital efficiency?

What would be common practice in this situation? If we double the traffic every 3 months, would sharding be a natural progression, or are there other alternatives?

How can I even tell whether my system is reaching its peak? What tools are available for profiling the database, and which metrics should I use to measure it?


Solution

It's very difficult to answer such a broad question about scalability.

First, upgrading the hardware of a single machine is not a long-term option, nor even a short-term one, since you seem to be planning for exponential growth (doubling every 3 months is a lot when you start from 2M rows per month). So you will have to find a distributed, scalable hardware architecture.
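To put that growth in numbers, here is a rough projection in Python. It assumes, hypothetically, that both the monthly row inflow (2 million) and the query rate (20,000 per hour) double in a single step every 3 months; the constant names and the `project` helper are made up for illustration.

```python
# Rough capacity projection, a sketch rather than a forecast.
# Assumption (hypothetical): both rates double in one step every 3 months.
START_ROWS_PER_MONTH = 2_000_000
START_QUERIES_PER_HOUR = 20_000
DOUBLING_PERIOD_MONTHS = 3


def growth_factor(month: int) -> int:
    """Stepwise multiplier: 1 for months 1-3, 2 for months 4-6, 4 for months 7-9, ..."""
    return 2 ** ((month - 1) // DOUBLING_PERIOD_MONTHS)


def project(months: int) -> list[tuple[int, int, float]]:
    """Return (month, cumulative new rows, average queries per second) for each month."""
    total_rows = 0
    projection = []
    for month in range(1, months + 1):
        factor = growth_factor(month)
        total_rows += START_ROWS_PER_MONTH * factor
        avg_qps = START_QUERIES_PER_HOUR * factor / 3600  # 3600 seconds per hour
        projection.append((month, total_rows, avg_qps))
    return projection


if __name__ == "__main__":
    for month, rows, qps in project(12):
        print(f"month {month:2d}: {rows:>12,} new rows so far, ~{qps:.1f} queries/s")
```

Under these assumptions the table gains 90 million new rows in the first 12 months, and the average load climbs from about 5.6 to about 44 queries per second. These are hourly averages, so peak load will be higher.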

Then two basic options come to mind:

Stick to SQL

If you stick to SQL storage for your ever-growing tables, you'll have to choose between clustering and replication. In my view, the latter is often more cost-effective and faster than the former, but a bit harder to set up.
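A common way to get value out of replication is read/write splitting: the primary takes every write and the replicas serve reads. Below is a minimal Python sketch of the routing side, assuming one primary and any number of replicas; `ReplicatedRouter` is a hypothetical helper that returns server names rather than real connections.

```python
import itertools


class ReplicatedRouter:
    """Route SQL to a primary or to read replicas (hypothetical helper).

    Servers are plain names here; a real application would hold driver
    connections or connection pools instead.
    """

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # Rotate through replicas round-robin; fall back to the primary if there are none.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        words = sql.split()
        is_plain_read = (
            bool(words)
            and words[0].upper() == "SELECT"
            # SELECT ... FOR UPDATE is a locking read, so it must go to the primary.
            and "FOR UPDATE" not in sql.upper()
        )
        return next(self._replicas) if is_plain_read else self.primary
```

With asynchronous replication a replica can lag behind the primary, so a read that must see a write made a moment ago should still be sent to the primary.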

There is a very interesting paper on this topic: Advanced MySQL Replication Techniques.

You could then start with partitioning or, better, sharding, as you mentioned.
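Sharding splits one logical table across several MySQL servers according to a sharding key. Here is a minimal Python sketch of hash-based routing, assuming a string key such as a customer id; the shard names and the `shard_for` and `database_for` helpers are hypothetical.

```python
import hashlib

# Hypothetical shard names; in practice these would be connection settings.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(key: str, shard_count: int) -> int:
    """Map a sharding key (e.g. a customer id) to a shard index in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Use a stable hash: Python's built-in hash() of a str changes between processes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def database_for(customer_id: str) -> str:
    """Pick the shard that holds all rows for this customer."""
    return SHARDS[shard_for(customer_id, len(SHARDS))]
```

Choosing the key matters as much as the hash: rows that are queried together (for example, all rows of one customer) should share a key, so that most queries touch a single shard.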

Note that some MySQL products seem to offer auto-sharding clusters.

Mix with NoSQL

The other option is obviously to consider NoSQL technologies for your monster tables. Distributed key-value stores scale almost for free: the cost of scaling them out grows at most linearly.
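One technique many distributed key-value stores use to scale out cheaply is consistent hashing: keys and nodes are placed on the same hash ring, so adding a node only takes over a slice of the keys. Below is a minimal Python sketch with virtual nodes; `HashRing` and the node names are hypothetical and not the API of any particular store.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (hypothetical, illustrative)."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node owns many points on the ring, which evens out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring has no nodes")
        # A key belongs to the first ring point at or after its hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(2000)]
    ring = HashRing(["node-0", "node-1", "node-2", "node-3"])
    before = {k: ring.node_for(k) for k in keys}
    ring.add("node-4")
    moved = sum(before[k] != ring.node_for(k) for k in keys)
    print(f"{moved / len(keys):.0%} of keys moved after adding node-4")
```

Going from 4 to 5 nodes should move only roughly a fifth of the keys, all of them onto the new node. By contrast, with `hash(key) % n` a key stays put only when `hash % 4 == hash % 5`, so about 80% of keys would move.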

Another point is that key-value stores work gracefully with distributed caches such as the well-known Memcached, which is very easy to set up, has client APIs in most languages, and delivers really good performance at very low cost.
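A common way to put Memcached in front of a database is the cache-aside pattern: read from the cache, and on a miss read from the database and store the result with an expiry. Here is a minimal Python sketch; `InMemoryCache` is a hypothetical stand-in with a Memcached-like `get`/`set(..., expire=...)` shape so the example runs without a server, and `cached_lookup` is likewise illustrative.

```python
import time
from typing import Callable, Optional


class InMemoryCache:
    """Hypothetical stand-in for a Memcached client: get/set with an optional TTL."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, str]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if expires_at and time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value

    def set(self, key: str, value: str, expire: int = 0) -> None:
        # expire=0 means "never expire", as in Memcached.
        expires_at = time.monotonic() + expire if expire else 0.0
        self._data[key] = (expires_at, value)


def cached_lookup(cache, key: str, load_from_db: Callable[[str], str], ttl: int = 300) -> str:
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value, expire=ttl)
    return value
```

With a real Memcached client you would pass its client object instead, keeping in mind that Memcached stores bytes, so values usually need serializing. Writes should also update or delete the cached key; otherwise readers may see stale data for up to `ttl` seconds.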

Licensed under: CC-BY-SA with attribution