Galera Arbitrator system requirements compared with the actual Galera MariaDB node requirements

https://dba.stackexchange.com/questions/230085

23-01-2021

Question

Right now our setup is CloudFlare > 1x HAProxy (Dallas, TX) > 3x App servers (Dallas, TX) > 3x MariaDB nodes in a Galera multi-master cluster (Dallas, TX). The App servers auto-scale horizontally to handle whatever traffic gets thrown at them. HAProxy and DB servers are scaled up when needed. The App servers also maintain individual SQL read caches to ease the load on the DB.

Our main selling point is keeping our product online no matter what happens, as our solution caters to government emergency services. Since we're a start-up, our budget is really tight.

Unexpectedly, our current cloud provider (which we cannot change for the next few months) is being hit by DDoS attacks for around 30 minutes a month. In our situation, not even one minute of downtime is acceptable.

So I am thinking of going with something like CloudFlare > 3x HAProxy (Dallas, TX; Fremont, CA; Newark, NJ) > 6x App servers (Dallas, TX; Fremont, CA; Newark, NJ) > 3x Galera multi-master nodes + 2x Galera Arbitrator (Dallas, TX; Fremont, CA; Newark, NJ; Atlanta, GA; San Francisco, CA).

This will increase our costs quite a bit, which is why I want to add two Galera Arbitrators: I am hoping to skimp on costs while still increasing stability.

My main question is, what are the resource requirements for Galera Arbitrator, and would it be worth having three MariaDB nodes and two Galera Arbitrator nodes, instead of only three MariaDB nodes in the cluster? Another way to put it is, is it worth using garbd on setups over three nodes in size?

Routing-wise, our HAProxy servers will choose the App servers in the same region, which in turn choose the DB server in the same region. The choices are based on a blend of connect time and server load.

Answer

Overall, I'd say it's best to widen the flow control and send/receive windows, and to lengthen the timeouts between nodes.
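As a rough illustration of that tuning, the sketch below renders a wsrep_provider_options string. The option names (evs.*, gcs.fc_*) are standard Galera provider options, but every value is an assumption chosen only for illustration, and the helper render_provider_options is hypothetical; tune the numbers against your own inter-site latency.

```python
# Illustrative WAN-oriented Galera provider options.
# The option names are Galera provider options; the VALUES are assumptions
# for illustration only and must be tuned per deployment.
WAN_OPTIONS = {
    # Longer timeouts so brief WAN hiccups do not evict a node.
    "evs.keepalive_period": "PT3S",
    "evs.suspect_timeout": "PT30S",
    "evs.inactive_timeout": "PT1M",
    "evs.install_timeout": "PT1M",
    # Wider send windows for higher-latency links.
    "evs.send_window": "512",
    "evs.user_send_window": "256",
    # Looser flow control so a slow remote node throttles the cluster less.
    "gcs.fc_limit": "256",
    "gcs.fc_factor": "0.9",
}


def render_provider_options(options):
    """Join option/value pairs into the 'key=value; key=value' form."""
    return "; ".join(f"{name}={value}" for name, value in options.items())


if __name__ == "__main__":
    # Prints a line that could be pasted into the [galera] config section.
    print(f'wsrep_provider_options="{render_provider_options(WAN_OPTIONS)}"')
```

The same option string would need to be applied on every node, since mismatched timeouts between nodes make failure detection harder to reason about.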

If you get down to one node, you can run SET GLOBAL wsrep_provider_options="pc.bootstrap=TRUE" on it to turn it back into an active (primary) node without garbd. This lacks automation, but it is more durable overall.
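A minimal sketch of how that manual step might be guarded is shown below. It assumes you have already read the node's status variables (for example from SHOW GLOBAL STATUS LIKE 'wsrep_%') into a dict; the function name and the operator_confirmed flag are hypothetical. The check matters because bootstrapping a node while another partition is still primary would split the cluster.

```python
# The statement from the answer above, kept as written there.
BOOTSTRAP_SQL = 'SET GLOBAL wsrep_provider_options="pc.bootstrap=TRUE";'


def manual_bootstrap_statement(status, operator_confirmed):
    """Return the bootstrap SQL if this node looks like a stranded survivor.

    status: dict of wsrep status variable name -> string value.
    operator_confirmed: True only if a human has verified that no other
        partition of the cluster is still serving as primary (assumption:
        this cannot be detected reliably from the node itself).
    """
    cluster_status = status.get("wsrep_cluster_status", "").lower()
    if cluster_status == "primary":
        return None  # Node already belongs to a primary component.
    if not operator_confirmed:
        return None  # Refuse to risk a split-brain.
    return BOOTSTRAP_SQL


if __name__ == "__main__":
    stranded = {"wsrep_cluster_status": "non-Primary", "wsrep_cluster_size": "1"}
    print(manual_bootstrap_statement(stranded, operator_confirmed=True))
```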

The nature of the DoS might have an impact on the design. If the app server for a site is down, does that sufficiently reduce the load on the Galera node? Or is the network throughput/latency compromised?

I recommend sticking with a 3-node cluster initially. It's a big step up from a single site.
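To compare the three-voter and five-voter layouts from the question, here is a simplified quorum calculation. It assumes every voter (data node or garbd) has equal weight and that the listed sites fail at the same moment; the site dictionaries and the has_quorum helper are illustrative, not part of Galera.

```python
def has_quorum(voters_by_site, failed_sites):
    """True if the surviving voters are a strict majority of all voters."""
    total = sum(voters_by_site.values())
    alive = sum(count for site, count in voters_by_site.items()
                if site not in failed_sites)
    return alive * 2 > total


# Three MariaDB nodes, one per site.
three_voters = {"Dallas": 1, "Fremont": 1, "Newark": 1}
# Same three nodes plus two arbitrators (garbd) at two more sites.
five_voters = dict(three_voters, **{"Atlanta": 1, "San Francisco": 1})

if __name__ == "__main__":
    print(has_quorum(three_voters, {"Dallas"}))            # True
    print(has_quorum(three_voters, {"Dallas", "Newark"}))  # False
    print(has_quorum(five_voters, {"Dallas", "Newark"}))   # True
```

Note that in the last case the cluster keeps quorum with only one data-bearing node left: the arbitrators add votes, not copies of the data.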

License: CC-BY-SA with attribution

Not affiliated with dba.stackexchange