Question

Please note: although I'm specifically calling out MySQL here, I would also appreciate/entertain generic answers that apply to all/most relational systems.

As a developer I don't see the design benefits of grouping tables into different databases, especially if all the tables are somehow connected/related to each other. From a performance perspective, to me (a dumb developer), why should it matter if I have 100 tables located inside a single database, or 10 databases with 10 tables in them?

So I ask: Are there performance/security/other benefits to decomposing a monolithic database into smaller ones? What are they?

Also, what if I have hundreds of tables, and they are literally all somehow linked? Meaning, there are no completely isolated tables in the entire system; all of them have foreign keys in them to some other table. Doesn't this mean I have to have only 1 big/monolithic DB?

Solution

For security and availability, you want your data duplicated on as many servers as possible, so that if one fails, the others can share the load.

"classic" oltp databases performance issues raise from concurrent transactions. Due mainly to locks, time/transaction grows exponentially. Hence, a simple way to mitigate this issue is to have multiple databases, as there will be less concurrent transactions by database. Thus, this is an attractive duct tape (cheap and easy, at least compared to alternatives).

One thing to consider is that historically the bottleneck has been storage (disks), so databases tend to minimize disk access by consuming a lot of CPU and memory. Partitioning tables also helps reduce the load on disks.
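For instance, a hypothetical log table partitioned by year means a query that filters on the year only has to read one partition's files from disk, a sketch in standard MySQL syntax:

    -- MySQL requires the partitioning column in every unique key,
    -- hence the composite primary key.
    CREATE TABLE access_log (
        id           BIGINT NOT NULL,
        created_year SMALLINT NOT NULL,
        payload      VARCHAR(255),
        PRIMARY KEY (id, created_year)
    )
    PARTITION BY RANGE (created_year) (
        PARTITION p2022 VALUES LESS THAN (2023),
        PARTITION p2023 VALUES LESS THAN (2024),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    );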

Likewise, splitting data across multiple databases makes it possible to cache most of each one in memory, which seriously improves performance too.
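In MySQL/InnoDB that in-memory cache is the buffer pool; once a database is small enough that its hot working set fits in it, most reads never hit disk. A sketch (the 8 GB value is an arbitrary example, tune it to your RAM):

    -- Inspect the size of InnoDB's in-memory cache:
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

    -- Grow it so the working set fits in RAM
    -- (resizable online since MySQL 5.7; value is in bytes, here 8 GB):
    SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;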

There are other options. Some databases can span multiple machines, or you can pool several machines into one virtual machine, but that only goes so far too.

The current trend is to split OLTP and OLAP databases. This makes sense: OLTP requires fast row access, while OLAP requires high-throughput column reads.
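To make the contrast concrete, here is a sketch of the two access patterns against a hypothetical `orders` table:

    -- OLTP: touch one row, found via an index; latency matters.
    SELECT * FROM orders WHERE id = 42;

    -- OLAP: scan millions of rows but only two columns; throughput matters,
    -- which is why column stores shine here.
    SELECT product_id, SUM(amount) AS revenue
    FROM orders
    GROUP BY product_id;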

Another trick is to have two databases, one in read/write mode and one in read-only mode, set up in a master/slave configuration. Most reads are done on the slave, and changes on the master are pushed to the slave. This alleviates the burden on the master database, which deals with the transactions.
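In MySQL this is built-in replication; a rough sketch of wiring up the read-only copy (the host name and user are placeholders, and SOURCE_AUTO_POSITION assumes GTIDs are enabled):

    -- On the replica (MySQL 8.0.23+ syntax; older versions use CHANGE MASTER TO):
    CHANGE REPLICATION SOURCE TO
        SOURCE_HOST = 'primary.example.com',
        SOURCE_USER = 'repl',
        SOURCE_PASSWORD = '...',
        SOURCE_AUTO_POSITION = 1;
    START REPLICA;

    -- Reject accidental writes so the replica stays read-only:
    SET GLOBAL read_only = ON;

The application then routes SELECTs to the replica and everything else to the master.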

If you observe your database carefully, you will notice that very few operations really need ACID transactions. That is when it may become interesting to leave the classical model behind and build two or three systems: for example, one for tickets (which needs transactions, as you don't want to sell out-of-stock products), and another system for the rest. If you update a product price and the old price is still served for 0.1 seconds, it does not matter much (except for an auction system, of course). The point is, with a traditional system you can't sell tickets while you update the price, because the row will be locked, and it will be unavailable for much longer (1-10 seconds), since your traditional DB (which is now massive) is much slower.
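The ticket sale is the part that genuinely needs a transaction; a minimal sketch, assuming a hypothetical `tickets` table with a `stock` column:

    -- Claim one ticket atomically; the "stock > 0" guard prevents overselling.
    START TRANSACTION;
    UPDATE tickets
       SET stock = stock - 1
     WHERE event_id = 123
       AND stock > 0;
    -- ROW_COUNT() = 0 means sold out: the application should roll back instead.
    SELECT ROW_COUNT() AS tickets_claimed;
    COMMIT;

Nothing about a price display or a product description needs this machinery, which is why they can live in a looser, faster system.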

Typically the transactional DB will be "in memory" (based on kernel locks, or multi-Paxos), with a "NoSQL" DB serving the other 99% of reads. The in-memory logs are then merged into the NoSQL database. Combined with other tricks, this scales linearly, and hence can grow in ways traditional databases can hardly achieve.

This is more complicated: it requires analysis, learning new paradigms, and massive time to implement. To my knowledge it is used mainly by "giants" like Google or Twitter. It is the basis of the "big data" movement.

If your company uses a traditional SQL database, you are better off using all the tricks available (sharding, partitioning, read/write replicas, VMs) before changing your whole ecosystem.
