Logical separation of database content

https://softwareengineering.stackexchange.com/questions/376454

07-02-2021

Question

Background

I have an application that stores a lot of entities in a classical relational database (Microsoft SQL Server), and I use an ORM (Entity Framework) to query data from it. This database has one schema, and only a "global view" of the data exists. The data is made available to users via an OData API.

For simplicity let's say it is a blog system with the tables Posts, Comments, Tags, Users. The exact relation between them is not so important for now.

Question

With the next version of the application we would like to operate multiple blogs within one software installation. All blogs with their contents are independent from each other and operated by different users.

How can such an extension to the software be done?

The most obvious solution seems to me to be making the separation part of the data model. This would require introducing a container entity and having all other entities point, via a foreign key, to the container they belong to. All queries in the application would then need an extra filter so that they only load data for the current container.

As you can imagine, this requires a lot of development effort. Every query must specify this filter correctly; if one does not, data from the wrong blog might be shown. Unique constraints also need to be reworked to be unique per container.

Another idea was to create one database per container and connect to the corresponding database depending on which container is accessed. The maintenance effort behind this also seems a bit complex to me.

Are there maybe other technical solutions to this question?

Unfortunately, I lack the correct terms to research this topic properly. The best thing I could hope for is some SQL Server feature that allows storing multiple "sets" of data in the same database schema with strict separation.

Solution

You nailed the two obvious choices. But neither is quite as hard as you say, and which makes more sense depends on how much you expect to integrate content from one blog into another.

Separate Database approach:

This is the simplest approach to implement. You already have a connection string for connecting to your database. Just wrap it behind a function that takes some sort of 'Request' object as an argument and returns the right database connection string. Easy, and very modular. Essentially nothing else changes in your application.
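A minimal sketch of such a wrapper, written in Python rather than the asker's C#/Entity Framework stack. The `Request` class, the host names, and the connection strings are all hypothetical placeholders; a real application would take them from its own framework and configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for the web framework's request object."""
    host: str


# Assumed mapping from blog host name to connection string (illustrative values only).
CONNECTION_STRINGS = {
    "cooking.example.com": "Server=db1;Database=Blog_Cooking;Trusted_Connection=True;",
    "travel.example.com": "Server=db1;Database=Blog_Travel;Trusted_Connection=True;",
}


def connection_string_for(request: Request) -> str:
    """Return the connection string of the blog this request belongs to."""
    try:
        return CONNECTION_STRINGS[request.host.lower()]
    except KeyError:
        # Fail loudly rather than fall back to some other blog's data.
        raise LookupError(f"no blog configured for host {request.host!r}") from None
```

The rest of the application keeps opening its database connection as before; it just asks this function for the string instead of reading one fixed value.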

The only downside is that if you ever want to analyze across blogs (e.g. which blog is the most active, which blog uses the word 'fart' most often, etc.), this approach won't work well.

Within one database approach:

This requires an extra 'blogid' field in all your top-level tables (maybe all tables), that is, anywhere you grab data without first joining on another table. If you are already joining on a table that has the blogid, you probably don't need to add the blogid to the new table (perhaps the users table is such a case?). This depends on the semantics you want, too.

And yes, this does require a tiny change to your top-level queries: adding where blogID=blogIDImWorkingOn. But it makes things easier later, when you want to integrate or analyze content across blogs.
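Here is a rough sketch of that filter, using Python with an in-memory SQLite database as a stand-in for SQL Server. The `posts` table layout, the `BlogScope` class, and the sample rows are assumptions made for illustration. Putting the filter in one place means individual queries cannot forget it, and the cross-blog query at the end shows the kind of analysis a shared database keeps simple.

```python
import sqlite3


class BlogScope:
    """Runs queries that are always restricted to one blog."""

    def __init__(self, conn: sqlite3.Connection, blog_id: int) -> None:
        self._conn = conn
        self._blog_id = blog_id

    def post_titles(self) -> list[str]:
        # The blog_id filter lives here, so callers cannot leave it out.
        rows = self._conn.execute(
            "SELECT title FROM posts WHERE blog_id = ? ORDER BY id",
            (self._blog_id,),
        )
        return [title for (title,) in rows]


def posts_per_blog(conn: sqlite3.Connection) -> dict[int, int]:
    """Cross-blog analysis: number of posts in each blog."""
    return dict(
        conn.execute(
            "SELECT blog_id, COUNT(*) FROM posts GROUP BY blog_id ORDER BY blog_id"
        )
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, blog_id INTEGER NOT NULL, title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO posts (blog_id, title) VALUES (?, ?)",
    [(1, "Sourdough basics"), (2, "Lisbon in winter"), (1, "Knife care")],
)

print(BlogScope(conn, 1).post_titles())  # ['Sourdough basics', 'Knife care']
print(posts_per_blog(conn))              # {1: 2, 2: 1}
```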

Hope this helps! Good luck!

Other tips

I'm not really an experienced architect, but as the question stands, you might not want to change much at all.

You just add a Blogs table and a "blog_id" column to posts (just as your comments probably already have a "post_id" column). Depending on which entities you want to separate, you might make other changes. For example, you say you want to separate the users, but since you don't provide many details, it's fair to assume you don't need much complexity. Maybe users shared between blogs would even be a feature.
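As a sketch of what that schema change could look like, again with SQLite standing in for SQL Server: the `slug` column is a hypothetical example of something that used to be globally unique and now only needs to be unique per blog, which is the constraint rework the question mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE blogs (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        blog_id INTEGER NOT NULL REFERENCES blogs(id),
        slug    TEXT NOT NULL,          -- hypothetical column, for illustration
        UNIQUE (blog_id, slug)          -- unique per blog, not globally
    );
""")
conn.execute("INSERT INTO blogs (id, name) VALUES (1, 'cooking'), (2, 'travel')")
# The same slug is allowed in two different blogs.
conn.execute("INSERT INTO posts (blog_id, slug) VALUES (1, 'hello'), (2, 'hello')")


def add_post(blog_id: int, slug: str) -> bool:
    """Insert a post; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO posts (blog_id, slug) VALUES (?, ?)", (blog_id, slug))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_post(1, "hello"))  # False: slug already used within blog 1
print(add_post(2, "about"))  # True
```

Comments need no `blog_id` of their own in this layout, since they reach their blog through `post_id`.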

Depending on the size of the project, it might not be too efficient, but again, if you don't foresee any problems, there is no reason to overcomplicate the system.

We solved this issue by using one database per customer plus a single configuration database that maps each subdomain/customer to its database, simply by storing the connection string, customer subdomain, and other meta-information in this configuration database. This is similar to what you propose as your second solution.

It does require some setup per customer, but I don't think this is avoidable no matter how you implement it. We use dependency injection and connect to the correct database during container initialization, which happens on a per-request basis.
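To make the idea concrete, here is a small sketch in Python. It is not the setup described above, only an illustration under assumptions: the `customers` table, the `RequestServices` container, and all sample values are invented, and an in-memory SQLite database plays the role of the configuration database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestServices:
    """Hypothetical per-request container holding customer-specific settings."""
    customer: str
    connection_string: str


def open_config_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the shared configuration database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (subdomain TEXT PRIMARY KEY, connection_string TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [
            ("acme", "Server=db1;Database=Acme;"),
            ("globex", "Server=db2;Database=Globex;"),
        ],
    )
    return conn


def build_services(config: sqlite3.Connection, host: str) -> RequestServices:
    """Resolve the customer from the request host and build its container."""
    subdomain = host.split(".", 1)[0].lower()
    row = config.execute(
        "SELECT connection_string FROM customers WHERE subdomain = ?",
        (subdomain,),
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown customer subdomain {subdomain!r}")
    return RequestServices(customer=subdomain, connection_string=row[0])


config = open_config_db()
services = build_services(config, "acme.example.com")
print(services.connection_string)  # Server=db1;Database=Acme;
```

Onboarding a new customer then comes down to creating its database and adding one row to the configuration table.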

This is not really a good answer, but it works for us, so it probably will for you. I have no clue if there are better ways to do it, though. 😅

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange