Database constraints for a recursive folders structure

https://softwareengineering.stackexchange.com/questions/298639

10-10-2020
|

题

In an application, I have a recursive folder structure (like folders in OS X or Windows file system).

Each folder can contain three kind of things:

Other folders (hence the recursive structure)
Employees
Tasks

Here are the simplified models for those elements:

class Folder:
    parent_folder: Folder

class Employee:
    folder: Folder
    task: Task

class Task:
    folder: Folder
    name: StringField()

As you may understand from the models above, each Employee can be attributed a single Task.

The pain point is that an employee can only have a task which is located in this employee’s tree of folders (aka be part of its path). Let me illustrate this with some examples:

Valid:
    Employee in /MyFolder/OtherFolder
    Task in /MyFolder

Valid:
    Employee in /MyFolder/OtherFolder
    Task in /MyFolder/OtherFolder

Invalid:
    Employee in /MyFolder
    Task in /MyFolder/OtherFolder # not in Employee’s path

Invalid:
    Employee in /MyFolder
    Task in /RandomFolder # not in Employee’s path

Desired behaviour

If an Employee or a Task is moved so that an Employee's Task becomes unaccessible to said Employee, I’d like to have the relation between the two nullified, so that the Employee becomes task-free.

If, at any time, in the database, an Employee has a Task for which he doesn’t have access (is not in its path), the database should be considered corrupted.

Questions

Is it possible to implement this behaviour at the database level, so that it would work like Foreign Key constraints with nullifying cascading rules, for example?

If it's not possible at the database level, can you think of a way to implement this at the application level and keeping the database integrity at any time?

Subsidiary question: should we reconsider our data structure and implement one which would be more suitable for this particular problem?

What we tried so far (and the reasons why it didn’t work):

Implement this behaviour at the application level - doesn’t work because we can’t guarantee the integrity of the database in this case
Use Views or Common Table Expressions (CTE) - doesn’t work because Foreign Keys can’t be used within Views or CTE
Use a separate table for the relations - doesn’t work because of the recursive aspect of the structure which prevents us to use FK constraints for this goal

A note on the stack: PostgreSQL - SQLAlchemy

To keep this question (relatively) short, I simplified the problem and omitted a certain number of elements - please ask if you need more details

Edit: I should have mentioned that retrieving an object's folder from pure SQL is actually much less trivial than in my example

解决方案

If the database is under control of the application (not modified by anything else), do it at the application level, not the database level.

In theory you could do it with triggers, but I don't think you should. Checking the validity of such a structure requires computationally complex queries, and there is significant danger that this will have a negative impact on performance and scalability. It is a possibility, but this isn't a good design to choose without a clear need for it (and I don't see one given the information in the question).

I don't understand the concern "we can't guarantee the integrity of the database". Why not? If the database can only be modified via the application, and you ensure (and thoroughly test) that every action the application performs is done correctly and doesn't result in an invalid state, then you should not have a problem with database integrity. And this sort of testing is what you need anyway to ensure that your program is robust.

If you need additional assurance for some reason (most likely because the database can be modified outside the application), you might consider triggers. However, if that is not performant, you could create some custom code that performs an integrity check on demand. Or you might consider providing an API for others who need to modify the database, rather than allowing the database to be modified directly

其他提示

This is perfectly possible with triggers, and if you must ensure integrity at an application-independent level, this is possibly your best shot. From this example you can see that recursion is possible in POSTGRESQL stored procedures, so traversing the tree should not be too hard to be implemented.

Of course, you may have to implement "AFTER INSERT", "AFTER UPDATE" and "AFTER DELETE" triggers for tables Employee and Task, and to make this bullet proof without significant performance hit you have to be careful, but to my experience, these are solvable problems.

许可以下： CC-BY-SA 和归因

不隶属于 softwareengineering.stackexchange