Converting single client SQL Server database into single database multi tenant

https://stackoverflow.com/questions/8964529

18-04-2021

Question

We currently have a system where each of our users gets a database. We are now moving to a one database multi-tenant schema so one database can house many customers.

A few questions:

Is there a multi-tenant conversion tool in existence? Or is it just a matter of creating a Tenant table and adding a TenantID to every other table?
Is there an easy way to implement multi-tenant without having to refactor our code that communicates with the database?

We have an Odata.svc that does all the talking to the database (our front-end clients range from .NET front ends to iOS devices). I read a little about using Federation to filter on the TenantID predicate so that the code does not have to be changed at all. Is this possible?
Is there a recommended limit on how many tenants should be in a database?

I'm gathering this is a stupid question (how long is a piece of string). We will most likely be hosting the end solution up on Azure.

Look forward to any advice anyone can give me. We are making a fundamental change to our processes so I want to be on top of it before we are under it.

Solution

Automation?

In theory, it should be possible to craft a tool that makes it much easier to perform this daunting operation (going from single-tenant to multiple-tenant). However, I don't think such a tool is in existence, given the limited audience for such a product. It would be very nice if one surfaced.

Ideas about manual conversion

Start by designing a new multi-tenant database schema. (This means merging the schemas of all the single-tenant databases with any shared schemas you may have.) I would design it the way it would look if it were built from scratch, with no legacy considerations.

You obviously need a Tenant table, which will need to be referenced by many of your existing single-tenant tables with a Tenant_id column. For instance, a table with users will require this to uniquely associate users with a tenant.
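As a rough illustration of that layout, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table names (`tenant`, `app_user`), the `username` column and the sample tenants are hypothetical; only the idea of a Tenant table referenced through a Tenant_id column comes from the text above.

```python
import sqlite3

# Hypothetical schema: every name here is illustrative, not from a real system.
SCHEMA = """
CREATE TABLE tenant (
    tenant_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE app_user (
    user_id   INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL REFERENCES tenant (tenant_id),
    username  TEXT NOT NULL,
    UNIQUE (tenant_id, username)  -- a username only has to be unique within its tenant
);
"""


def build_db():
    """Create an in-memory database with the tenant-aware schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


conn = build_db()
conn.execute("INSERT INTO tenant (tenant_id, name) VALUES (1, 'Acme'), (2, 'Globex')")
# The same username can exist once per tenant.
conn.execute("INSERT INTO app_user (tenant_id, username) VALUES (1, 'alice'), (2, 'alice')")
users_per_tenant = conn.execute(
    "SELECT tenant_id, COUNT(*) FROM app_user GROUP BY tenant_id ORDER BY tenant_id"
).fetchall()
```

With the foreign key in place, a user row that points at a tenant that does not exist is rejected by the database rather than by application code.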

In the case of a simple Products table (with Product_id as primary key), it should be possible to add a Tenant_id column, yielding a table with a composite key (Tenant_id and Product_id). But if you had written the application from scratch, I believe a Products table with no tenant reference would be the proper design. This also lets tenants share products instead of duplicating them.

Since one tenant may have products with Product_id 1, 2, 3 and another 1, 2, you cannot simply merge the tables: the same ID cannot be used twice, so you need unique primary key values. One way to solve this problem is to write a program (in Java or another high-level language) that reads all data from a tenant database into in-memory objects, then writes the data to the multi-tenant schema, assigning new IDs as it goes. The process repeats for the next tenant database, and so forth. That way you would end up with Product_id values 1, 2, 3, 4, 5. Any foreign keys that point at the old IDs have to be rewritten to the new values in the same pass.

A quick-and-dirty alternative is to add a per-database offset (say 1,000 for the first database, 2,000 for the second, and so on) to all ID values and simply cross your fingers that no conflicts arise.
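The read-into-memory-and-renumber approach could look roughly like the sketch below. It is written in Python rather than Java for brevity, and it assumes the per-tenant rows have already been loaded. The `Product` class, the `merge_products` function and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Product:
    product_id: int  # ID as it was in the tenant's old single-tenant database
    name: str


def merge_products(tenant_products):
    """Merge per-tenant product lists, assigning fresh sequential IDs.

    tenant_products maps a tenant ID to the Product rows read from that
    tenant's old database. Returns the merged rows as
    (new_id, tenant_id, name) tuples plus a map from
    (tenant_id, old_id) to new_id, which is what you would use to
    rewrite foreign keys in other tables.
    """
    merged = []
    id_map = {}
    next_id = 1
    for tenant_id, products in tenant_products.items():
        for product in products:
            id_map[(tenant_id, product.product_id)] = next_id
            merged.append((next_id, tenant_id, product.name))
            next_id += 1
    return merged, id_map


# Two tenants whose old IDs overlap, as in the example above.
old_databases = {
    1: [Product(1, "Widget"), Product(2, "Gadget"), Product(3, "Gizmo")],
    2: [Product(1, "Sprocket"), Product(2, "Cog")],
}
merged, id_map = merge_products(old_databases)
```

Keeping the old-to-new map around is the important part: an order line that referenced product 1 in the second tenant's database has to be pointed at whatever `id_map[(2, 1)]` turned out to be.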

Code that communicates with database

You will need to rewrite most database queries to account for the fact that the database is now multi-tenant. This is a daunting task, especially considering the implications of introducing a bug which lets one tenant fiddle with another tenant's data. However, some techniques could make this task easier. For instance, a Tenant View Filter could reduce the amount of work required substantially.
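One way to read the view-filter idea is sketched below. It uses SQLite through Python only because that makes it self-contained. The application queries a view that already restricts rows to the current tenant, so most query text never has to mention the tenant. The `current_tenant` table, the `tenant_product` view and the helper functions are hypothetical. In SQL Server the tenant would have to come from some session- or login-level value instead, which is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    tenant_id  INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    name       TEXT NOT NULL,
    PRIMARY KEY (tenant_id, product_id)
);
INSERT INTO product VALUES (1, 1, 'Widget'), (1, 2, 'Gadget'), (2, 1, 'Sprocket');

-- TEMP objects are private to this connection, so they act as session state.
CREATE TEMP TABLE current_tenant (tenant_id INTEGER NOT NULL);
CREATE TEMP VIEW tenant_product AS
    SELECT product_id, name
    FROM product
    WHERE tenant_id = (SELECT tenant_id FROM current_tenant);
""")


def set_tenant(conn, tenant_id):
    """Record which tenant this connection is acting for."""
    conn.execute("DELETE FROM current_tenant")
    conn.execute("INSERT INTO current_tenant (tenant_id) VALUES (?)", (tenant_id,))


def list_products(conn):
    """Application query: note that it never mentions tenant_id."""
    return conn.execute(
        "SELECT product_id, name FROM tenant_product ORDER BY product_id"
    ).fetchall()


set_tenant(conn, 1)
tenant1_products = list_products(conn)
set_tenant(conn, 2)
tenant2_products = list_products(conn)
```

If no tenant has been set, the subquery yields NULL and the view returns no rows, so a forgotten `set_tenant` call fails closed rather than leaking another tenant's data.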

Limit on number of tenants

I have never seen a recommendation to limit the number of tenants in a multi-tenant structure. On the contrary, a strength of the multi-tenant approach is its scalability. Today you can easily create clusters of database servers or use cloud-based solutions to add more hardware power seamlessly, as needed.

OTHER TIPS

To be honest, in my experience you can't automate this. You are moving very important data from your infrastructure into your data model. Every query has been written on the assumption that the tenant has already been established. Every query and stored procedure will therefore have to be changed to reference your tenant table and be parameterised.

You ask in Q1 if you just add the tenantID to each table. That would be one approach, but not one I'd advocate. It leaves you wide open to incorrect data (nothing enforces that the children have the same tenantID as the parent, or even that they all share the same one). You need to add a Tenant table for sure, and then carefully choose which tables need to reference it. It will not be every one. Some will require it; for others you might choose to put it there for performance reasons. If you decide on the latter, you will no doubt require extra checking mechanisms to keep your data meaningful.
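One possible checking mechanism, offered as an assumption rather than as what this answer had in mind, is to carry the tenant ID in the parent's key and have the child reference the pair, so the database itself refuses a child row whose tenant differs from its parent's. The sketch uses SQLite through Python; the `customer_order` and `order_line` tables and the `add_line` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer_order (
    tenant_id INTEGER NOT NULL,
    order_id  INTEGER NOT NULL,
    PRIMARY KEY (tenant_id, order_id)
);
CREATE TABLE order_line (
    tenant_id INTEGER NOT NULL,
    order_id  INTEGER NOT NULL,
    line_no   INTEGER NOT NULL,
    PRIMARY KEY (tenant_id, order_id, line_no),
    -- The child must match its parent on BOTH columns.
    FOREIGN KEY (tenant_id, order_id)
        REFERENCES customer_order (tenant_id, order_id)
);
INSERT INTO customer_order VALUES (1, 100);
""")


def add_line(conn, tenant_id, order_id, line_no):
    """Return True if the line was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO order_line VALUES (?, ?, ?)",
                (tenant_id, order_id, line_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False


same_tenant = add_line(conn, 1, 100, 1)   # line belongs to tenant 1's order
cross_tenant = add_line(conn, 2, 100, 1)  # tenant 2 trying to attach to tenant 1's order
```

The cost is wider keys on every child table, which is part of why choosing where the tenant ID actually belongs deserves care.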

If you were in Oracle, what you might be able to do is rework each table into a view (still doing all of the above), then stuff the tenantID into the session and use Fine-Grained Access Control on it to hide most of the detail from the client. It is hard to do well, though, and I'm not sure what the SQL Server equivalent is. Could be worth some research.

What's the reason behind merging the DBs? Do you need some cross-DB report or something? Otherwise single-tenant has many advantages (multiple upgrade & downtime schedules, performance can be better depending on how [de]normalised you go, ease of single tenant data extract/reporting, ease of removal when you lose a tenant). The cloud solution & single tenant could potentially work in your favour here.

Licensed under: CC-BY-SA with attribution
