Question

background

My employer developed a web application we provide on software-as-a-service terms to our customers. To allow for multiple customers with a huge mass of data to be stored in a database, we chose to let the application create a schema per tenant. So if we had 5 customers we had something along the lines of

mysql> show schemas;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| tenant_1a          |
| tenant_d2          |
| tenant_yf          |
| tenant_ok          |
| tenant_8n          |
+--------------------+

Edit: Numbers and letters represent a hash around 32 chars long.

Tenant schema names are in fact not numeric but hashes of certain facts known at creation time, so they are not predictable until the tenant is created. This prevents us from preparing query rules in advance.

what we have/do now

For now we are good with this and only run a MariaDB Galera cluster of three nodes behind a MaxScale readconnroute balancer. But we will eventually hit a barrier where adding nodes to this cluster won't help, because the overall data size won't fit on disk and/or the number of tables will kill performance.

To keep the complexity of the application's database layer low, our devs would like us to handle the routing transparently from the application's point of view: they want the application to just talk to one "server" and not care about where each tenant is physically located.

To expand our application cluster to multiple MariaDB Galera clusters we could use MaxScale's schemarouter, which exposes all schemas on all connected sub-clusters as if there were only one server. This fits our devs' expectations perfectly.

Now, a few months ago ProxySQL entered the scene of database proxies, and it claims better performance paired with greater flexibility, among other things.

actual question

We can route queries based on hard-coded schema names, but would rather not, as this would mean creating/updating the rules each time a tenant is created/deleted.

How could we replicate the dynamic behaviour of MaxScale's schemarouter with ProxySQL query rules, if at all?

Solution

You can find the answer here: Can proxysql have multiple listener? (Google Groups).

In short: ProxySQL's query rules support routing by schemaname.

For simplicity, let's assume you have 3 distinct clusters, whose writer hostgroups we call HG11, HG21 and HG31. The servers are 10.10.X.Y. To add some complexity, we will also enable read/write split, where the readers are HG12, HG22 and HG32.

INSERT INTO mysql_servers (hostgroup_id,hostname) VALUES
(11,"10.10.10.1"),
(12,"10.10.10.1"), (12,"10.10.10.2"), (12,"10.10.10.3"),
(21,"10.10.20.1"),
(22,"10.10.20.1"), (22,"10.10.20.2"), (22,"10.10.20.3"),
(31,"10.10.30.1"),
(32,"10.10.30.1"), (32,"10.10.30.2"), (32,"10.10.30.3");

Enable replication hostgroups

INSERT INTO mysql_replication_hostgroups (writer_hostgroup,reader_hostgroup) VALUES
(11,12),(21,22),(31,32);

Create rules for read/write split

INSERT INTO mysql_query_rules (rule_id, active, match_digest, flagOUT) VALUES
(1,1,'^SELECT.*FOR UPDATE',100),
(2,1,'^SELECT',200),
(3,1,'.*',100);

Sharding, sending traffic to masters

INSERT INTO mysql_query_rules (active, flagIN, schemaname, destination_hostgroup, apply) VALUES
(1,100, "shard001", 11, 1),
(1,100, "shard002", 11, 1),
(1,100, "shard003", 11, 1),
(1,100, "shard004", 11, 1),
...
(1,100, "shard050", 21, 1),
(1,100, "shard051", 21, 1),
(1,100, "shard052", 21, 1),
(1,100, "shard053", 21, 1),
(1,100, "shard054", 21, 1),
...
(1,100, "shard100", 21, 1),
(1,100, "shard101", 31, 1),
...
(1,100, "shard150", 31, 1);

Sharding, sending traffic to slaves

INSERT INTO mysql_query_rules (active, flagIN, schemaname, destination_hostgroup, apply)
SELECT 1, 200, schemaname, destination_hostgroup+1 , 1 FROM mysql_query_rules WHERE flagIN=100;

Load everything to runtime:

LOAD MYSQL SERVERS TO RUNTIME;
LOAD MYSQL QUERY RULES TO RUNTIME;

Eventually, save everything to disk:

SAVE MYSQL SERVERS TO DISK;
SAVE MYSQL QUERY RULES TO DISK;
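
To make the two-stage rule chain easier to follow, here is a small Python model of it. This is only a toy sketch, not how ProxySQL is implemented: the function name `route` and the `writer_hg_by_schema` mapping are made up for illustration, matching is assumed to be case-insensitive, and the patterns are applied to the raw query text rather than to a digest. Stage 1 mirrors rules 1 to 3 (classify the query and set a flag), stage 2 mirrors the schemaname rules (pick the hostgroup for that flag).

```python
import re

# Stage 1: classification rules, in rule_id order (pattern, flagOUT).
# 100 = goes to a writer hostgroup, 200 = goes to a reader hostgroup.
CLASSIFY_RULES = [
    (r"^SELECT.*FOR UPDATE", 100),
    (r"^SELECT", 200),
    (r".*", 100),
]


def route(query, schema, writer_hg_by_schema):
    """Return the hostgroup a query would reach in this simplified model.

    writer_hg_by_schema is a hypothetical mapping such as {"shard001": 11}.
    Returns None when no schemaname rule exists for the schema.
    """
    # The first matching classification rule decides the flag.
    flag = next(
        flag_out
        for pattern, flag_out in CLASSIFY_RULES
        if re.search(pattern, query, re.IGNORECASE)
    )
    # Stage 2: look up the schemaname rule for that flag.
    writer_hg = writer_hg_by_schema.get(schema)
    if writer_hg is None:
        return None
    # Reader hostgroups are writer + 1, as in the INSERT ... SELECT above.
    return writer_hg if flag == 100 else writer_hg + 1


shards = {"shard001": 11, "shard050": 21}
print(route("SELECT * FROM t", "shard001", shards))
print(route("UPDATE t SET a = 1", "shard050", shards))
```

A plain SELECT on shard001 lands on the reader hostgroup of the first cluster, while any write (or SELECT ... FOR UPDATE) lands on the writer hostgroup of whichever cluster holds the schema.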

About "There is no way to have proxysql deal with the question of which schema is on which hg by itself?": the answer is no, and this is intentional. Each HG may have identical schemas: besides the classic "mysql", "information_schema" and "performance_schema", you can have other schemas that are used for other purposes (anything really). We can't ask ProxySQL to understand what these schemas are and automatically create rules.

Also, it is possible that you have created a tenant_1 schema in two different servers, but one has production data while the other has testing data: you don't want ProxySQL to automatically add both of them.

Finally, because ProxySQL is easily reconfigurable using simple SQL queries executed on the admin interface, if you want to automate the loading of new schemas you can simply create a script that connects to each HG, lists the schemas, then connects to ProxySQL and creates the rules if they are missing. The script can be very simple, but should also have the logic to exclude schemas that shouldn't be included.
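
The core of such a script could look roughly like the Python sketch below. It only builds the SQL statements; connecting to the clusters and to the ProxySQL admin interface is left out. Everything named here is an assumption for illustration: the function `missing_rule_statements`, its inputs (schema lists you would collect yourself, for example with SHOW SCHEMAS on each cluster, and the schema names that already have a flagIN=100 rule), and the convention that the reader hostgroup is writer + 1 as in the example above.

```python
# Schemas that exist on every cluster and must never get a routing rule.
SYSTEM_SCHEMAS = frozenset({"information_schema", "mysql", "performance_schema"})


def missing_rule_statements(schemas_by_writer_hg, existing_rules,
                            exclude=SYSTEM_SCHEMAS):
    """Build INSERT statements for schemas that have no routing rule yet.

    schemas_by_writer_hg: {writer hostgroup id: schema names found there}
    existing_rules: schema names that already have a flagIN=100 rule
    Raises ValueError if a new schema shows up on more than one cluster,
    because picking one automatically could route to the wrong data.
    """
    statements = []
    seen = {}
    for writer_hg in sorted(schemas_by_writer_hg):
        for schema in sorted(schemas_by_writer_hg[writer_hg]):
            if schema in exclude or schema in existing_rules:
                continue
            if schema in seen:
                raise ValueError(
                    f"{schema} found on hostgroups {seen[schema]} and {writer_hg}"
                )
            seen[schema] = writer_hg
            quoted = schema.replace("'", "''")  # escape quotes for SQL
            # One rule for writes (flagIN=100), one for reads (flagIN=200).
            for flag_in, hg in ((100, writer_hg), (200, writer_hg + 1)):
                statements.append(
                    "INSERT INTO mysql_query_rules "
                    "(active, flagIN, schemaname, destination_hostgroup, apply) "
                    f"VALUES (1, {flag_in}, '{quoted}', {hg}, 1);"
                )
    return statements


found = {11: ["mysql", "tenant_1a"], 21: ["mysql", "tenant_d2"]}
for stmt in missing_rule_statements(found, existing_rules={"tenant_d2"}):
    print(stmt)
```

After executing the generated statements on the admin interface, the script would still need to run LOAD MYSQL QUERY RULES TO RUNTIME and SAVE MYSQL QUERY RULES TO DISK, as shown earlier. Refusing to continue when a schema appears on two clusters is a deliberate choice here, since that is exactly the production-versus-testing case described above.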

Or you can create configurations that are pushed to ProxySQL using configuration management tools like Chef, Ansible, Puppet, Consul, etc.

Licensed under: CC-BY-SA with attribution