Question

I'm setting up a SaaS system, where we're planning to give each customer their own database. The system is already set up so that we can easily scale out to additional servers if the load becomes too great; we're hoping to have thousands, or even tens of thousands of customers.

Questions

  • Is there any practical limitation on the number of micro-databases you can/should have on one SQL Server?
  • Can it affect performance of the server?
  • Is it better to have 10,000 databases of 100 MB each, or one database of 1 TB?

Additional information

When I say "micro-databases", I don't really mean "micro"; I just mean that we're aiming for thousands of customers, so each individual database would only be a thousandth or less of the total data storage. In reality, each database would be around the 100MB mark, depending on how much usage it gets.

The main reason to use 10,000 databases is for scalability. Fact is, V1 of the system has one database, and we have had some uncomfortable moments when the DB was straining under the load.

It was straining CPU, memory, I/O - all of the above. Even though we fixed those problems, they made us realize that at some point, even with the best indexing in the world, if we're as successful as we hope to be, we simply can't put all our data in one big honkin' database. So for V2 we're sharding, so we can split the load between multiple DB servers.

I've spent the last year developing this sharded solution. Licensing is one license per server, but that's taken care of since we're using VMs on Azure. The reason the question comes up now is that previously we were offering only to large institutions and setting each one up ourselves. Our next order of business is a self-service model where anyone with a browser can sign up and create their own database. Those databases will be much smaller and much more numerous than the large institutions'.

We tried Azure SQL Database Elastic Pools. Performance was very disappointing, so we switched back to regular VMs.


Solution

I've worked on SQL Servers with 8 to 10 thousand databases on a single instance. It's not pretty.

Restarting the server can take as long as an hour or more. Think about the recovery process for 10,000 databases.

You cannot use SQL Server Management Studio to reliably locate a database in the Object Explorer.

Backups are a nightmare, since for backups to be worthwhile you need to have a workable disaster recovery solution in place. Hopefully your team is great at scripting everything.

You start doing things like naming databases with numbers, like M01022, and T9945. Trying to make sure you're working in the correct database, e.g. M001022 instead of M01022, can be maddening.

Allocating memory for that many databases can be excruciating; SQL Server ends up doing a lot of I/O, which can be a real drag on performance. Consider a system that records carbon use details across 4 tables for 10,000 companies. If you do that in one database, you only need 4 tables; if you do that in 10,000 databases, all of a sudden you need 40,000 tables in memory. The overhead of dealing with that number of tables in memory is substantial. And any query you design that will be run against those tables will require at least 10,000 plans in the plan cache if there are 10,000 databases in use.
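You can see this plan-cache duplication for yourself. The following is a rough sketch (it assumes VIEW SERVER STATE permission) that counts cached plans per database, so on a tenant-per-database server you'd see the same application query repeated once per tenant:

```sql
-- Sketch: count cached plans and their memory footprint per database,
-- to observe how identical tenant queries multiply across databases.
SELECT
    pa.value                              AS database_id,
    DB_NAME(CONVERT(int, pa.value))       AS database_name,
    COUNT(*)                              AS cached_plans,
    SUM(cp.size_in_bytes) / 1024 / 1024   AS plan_cache_mb
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) AS pa
WHERE pa.attribute = 'dbid'
GROUP BY pa.value
ORDER BY cached_plans DESC;
```

With 10,000 tenant databases, expect 10,000 rows here for every distinct ad-hoc query shape your application issues.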

The list above is just a small sampling of problems you'll need to plan for when operating at that kind of scale.

You'll probably run into things like the SQL Server service taking a very long time to start up, which can cause Service Controller errors. You can increase the service startup timeout yourself by creating the following registry entry:

Subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
Name:   ServicesPipeTimeout
Type:   REG_DWORD
Data:   The number of milliseconds before timeout occurs during service startup

For example, to wait 600 seconds (10 minutes) before the service times out, type 600000.


Since writing my answer I've realized the question is talking about Azure. Perhaps doing this on SQL Database is not so problematic; perhaps it is more problematic. Personally, I'd probably design a system using a single database, perhaps sharded vertically across multiple servers, but certainly not one-database-per-customer.

Other tips

So there are pros and cons to both methods. Without knowing more about your application or the services you're looking to provide, I can't give a definitive answer, but I'll throw out some of my thoughts on the matter.

My case for why you should use 1 Database for all clients.

Pros

  • Easy maintenance. Having one DB means that you only have to do your maintenance tasks in one location instead of many. Imagine the nightmare of backing up 1,000 different databases. How about updating statistics on 1,000 DBs, or rebuilding indexes, or running DBCC CHECKDB?

  • Deploying code. Let's say you have a problem with a stored procedure in your application code or reporting, and you need to make a quick change... Now you have to deploy that change to 1,000+ DBs. No thanks, I'd rather not.

  • Easy visibility. Just picture SSMS trying to open 1,000+ DBs (shudder). It would practically render Object Explorer useless and take a surprising amount of time just to open and render SSMS. And that's assuming you're able to come up with a decent naming convention.
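To make the maintenance point concrete, the kind of scripting that thousands of databases force on you looks roughly like this minimal sketch (a real disaster-recovery solution would also need error handling, backup verification, log backups, and retention management; the backup path is made up):

```sql
-- Minimal sketch: back up every user database in a loop.
-- Path X:\Backups\ is illustrative; error handling omitted for brevity.
DECLARE @db sysname, @sql nvarchar(max);

DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases
    WHERE database_id > 4            -- skip the system databases
      AND state_desc = 'ONLINE';

OPEN dbs;
FETCH NEXT FROM dbs INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@db)
             + N' TO DISK = N''X:\Backups\' + @db + N'.bak'' WITH COMPRESSION;';
    EXEC sys.sp_executesql @sql;
    FETCH NEXT FROM dbs INTO @db;
END
CLOSE dbs;
DEALLOCATE dbs;
```

With one database this whole loop collapses to a single BACKUP DATABASE statement; with 10,000 tenants, every maintenance task becomes a script like this.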

Cons

  • Security. It would be easier to prevent folks from looking at other customers' data if you had them as separate DBs. However, there are some very simple things you can do to prevent this from happening.

  • Performance. It could be argued that limiting it to one DB per customer means that SQL Server has to scan through less data to get the information you're querying. However, with proper data structure and good indexing (and possibly partitioning), you can likely eliminate this as a problem altogether if done carefully. I would recommend giving each table that contains customer-specific data a leading CompanyID column to reduce that overhead.
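On the security point: one of the "very simple things" available in a single shared database is Row-Level Security (SQL Server 2016+ / Azure SQL Database). The sketch below uses a hypothetical dbo.Orders table with the leading CompanyID column suggested above; table and function names are made up for illustration:

```sql
-- Hypothetical tenant table: every tenant-specific table leads with CompanyID.
CREATE TABLE dbo.Orders (
    CompanyID int       NOT NULL,
    OrderID   int       NOT NULL,
    Placed    datetime2 NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (CompanyID, OrderID)
);
GO

-- Predicate function: a row is visible only if its CompanyID matches the
-- tenant the application stored in SESSION_CONTEXT after connecting.
CREATE FUNCTION dbo.fn_TenantPredicate (@CompanyID int)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
    SELECT 1 AS allowed
    WHERE @CompanyID = CONVERT(int, SESSION_CONTEXT(N'CompanyID'));
GO

-- Apply the predicate to every read against the table.
CREATE SECURITY POLICY dbo.TenantIsolation
    ADD FILTER PREDICATE dbo.fn_TenantPredicate(CompanyID) ON dbo.Orders
    WITH (STATE = ON);
```

The application sets the tenant once per connection, e.g. `EXEC sys.sp_set_session_context N'CompanyID', 42;`, and from then on every query against dbo.Orders is transparently filtered to that tenant's rows.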

Ultimately I think your best bet is having one DB for your application and just splitting out customer data inside the DB itself. The trouble it gives you will be nothing in comparison to the nightmare of managing 1,000+ databases.

Maximum Capacity Specifications for SQL Server states that there is a limit of 32,767 databases per instance.

As for whether it will affect performance, the answer is yes, but the ways it will affect performance, and whether it would be substantial, would depend on a myriad of factors.

I would go with the one database unless there is a good reason to split it out to 10,000 databases. One backup or 10,000 backups? One integrity check, or 10,000? There may be a good reason to use 10,000 small DBs, but you haven't given enough detail to determine that. The question you've asked is quite broad, and there's simply not enough information for anyone to know what the best answer is.

What you are talking about here is multi-tenant vs. multi-instance architecture. I'm only bringing up these terms because you don't use them in your question, but this is what the approach you're discussing is called. If you just plug "multi-tenant architecture" into Google, you will find a wealth of resources and discussion about it; entire books have been written on it.

Some good resources regarding SQL Server specifically here:

https://msdn.microsoft.com/en-us/library/ff966499.aspx

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-design-patterns-multi-tenancy-saas-applications

I would be with other answers, in that I would lean strongly towards multi-tenant as a default, unless you have compelling reasons to favour multi-instance.

You don't need to split into thousands of individual client databases to scale; there are many other ways of doing that which are likely to be preferable, such as clustering, replication, sharding, and partitioning. Don't reinvent the wheel. There's nothing inherent that says you need to split the data yourself manually at an individual-customer level, and doing so is likely to significantly increase the cost of adding every new customer.
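As one example of the built-in alternatives, tenant data can be range-partitioned by customer within a single database. The following is a sketch only; the boundary values and the dbo.CarbonUse table are made up for illustration:

```sql
-- Sketch: partition tenant data by CompanyID ranges inside one database.
-- Boundary values 1000/2000/3000 are illustrative; real boundaries would
-- be chosen from the actual tenant-ID distribution.
CREATE PARTITION FUNCTION pf_CompanyRange (int)
    AS RANGE RIGHT FOR VALUES (1000, 2000, 3000);

CREATE PARTITION SCHEME ps_CompanyRange
    AS PARTITION pf_CompanyRange ALL TO ([PRIMARY]);

-- A table created on the scheme is partitioned by its CompanyID column.
CREATE TABLE dbo.CarbonUse (
    CompanyID int           NOT NULL,
    ReadingID int           NOT NULL,
    Reading   decimal(18,4) NOT NULL,
    CONSTRAINT PK_CarbonUse PRIMARY KEY CLUSTERED (CompanyID, ReadingID)
) ON ps_CompanyRange (CompanyID);
```

Queries filtered on CompanyID touch only the relevant partition, which recovers much of the "each tenant's data is small" benefit without multiplying databases.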

You are talking about "millions" of customers. Think of any large-scale cloud-based software-as-a-service offering, such as Gmail: you hardly think they create an entirely new database for each new signup, do you?

There can be cases where you do want to facilitate this, for example if you are selling your product to a customer that MUST have it hosted in-house on their own infrastructure. But as a general SaaS rule, lean toward a multi-tenant architecture by default.

One of the downsides I can see to the single-database suggestion is to do with rolling back data: if you have a database-per-tenant setup, you can restore each client's data independently (and to a particular point in time). If they are all in one database, this becomes much harder (and much more error-prone, as it would likely need to be done via INSERT/UPDATE/DELETE statements).
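With database-per-tenant, that per-client point-in-time restore is just standard RESTORE syntax. A sketch (the database name, backup paths, and timestamp are made up):

```sql
-- Restore one tenant's database to just before a bad change,
-- without touching any other tenant. Names and paths are illustrative.
RESTORE DATABASE Tenant_M01022
    FROM DISK = N'X:\Backups\Tenant_M01022.bak'
    WITH NORECOVERY, REPLACE;

RESTORE LOG Tenant_M01022
    FROM DISK = N'X:\Backups\Tenant_M01022.trn'
    WITH STOPAT = '2024-01-15T13:59:00', RECOVERY;
```

In a shared database there is no equivalent per-tenant STOPAT; you'd restore a copy elsewhere and replay one tenant's rows back by hand.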

Thanks to all who answered - really appreciate the points you've given me to think about. The general feeling I got was that a single database is preferable, but I would like to add some countervailing points in favor of the sharded architecture, and addressing some of the concerns that other people have mentioned.

Motivation for sharding

As mentioned in the (updated) question, we're aiming for massive sales worldwide, with literally millions of users. With the best hardware and indexing in the world, a single DB server won't take the load, so we have to be able to distribute across multiple servers. And once you have to look up which server any given customer's data is on, it's not much more work to give them a dedicated database, which makes things simpler in terms of keeping people's data neatly segregated.
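That "look up which server any given customer's data is on" step amounts to a shard map. A minimal sketch, with an entirely hypothetical schema, held in a small central database:

```sql
-- Hypothetical central shard map: one row per customer, pointing at the
-- server and database that hold that tenant's data.
CREATE TABLE dbo.ShardMap (
    CustomerID   int           NOT NULL PRIMARY KEY,
    ServerName   nvarchar(128) NOT NULL,
    DatabaseName sysname       NOT NULL
);

-- At sign-in, the application resolves the tenant's location once,
-- then connects directly to that shard for the rest of the session.
DECLARE @CustomerID int = 42;   -- illustrative tenant ID

SELECT ServerName, DatabaseName
FROM dbo.ShardMap
WHERE CustomerID = @CustomerID;
```

Note the same map works whether each shard holds one tenant database or many tenants in one database, so the lookup itself doesn't force the one-database-per-customer choice.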

Response to Concerns

  • Restarting the server takes a long time: OK, but in normal operation we're not intending to restart any servers. The system ultimately has to be online 24/7, so if we're going to have downtime it will have to be scheduled, anyway.
  • Backups/disaster recovery: We're using CloudBerry, which automates everything. Not a problem.
  • Naming databases/locating them in SSMS: Naming convention is easy, just based on the customer name. Add serial digits if names are shared.
  • Maintenance: If each database is as small as I envision, there shouldn't be any need to rebuild indexes manually.
  • Deploying code: We use Entity Framework, so every schema change will automatically be rolled out to each database with new releases. It is true, though, that if we discover a performance issue in production that can be fixed with a simple index tweak, it's not so easy just to push it out there. On the other hand, with each database being so small, it's unlikely that there will be showstopper performance issues on the production shards. And the common database remains a single DB, to which these concerns do not apply.

I'll be happy to hear back from you in the comments if you think I'm missing anything!

License: CC-BY-SA with attribution
Not affiliated with dba.stackexchange