How should I work around a particular “chicken and egg” problem with entity construction?

https://softwareengineering.stackexchange.com/questions/419304

18-03-2021
|

Question

I'm struggling to think of a concise way to phrase the question, but assume you were developing an application that managed employees at various company branches. You could potentially model this part of the application like this:

It's stipulated that every employee must belong to exactly one branch. To enforce this, you'd pass the branch the employee operates at to the Employee constructor and make the corresponding property immutable:

class Employee {

    public Employee(int id, string name, Branch operatesAt)
    {
        Id = id;
        Name = name;
        OperatesAt = operatesAt;
    }

    public int Id { get; }
    public string Name { get; }
    public Branch OperatesAt { get; }
}

You might then choose to model the Branch class like this:

class Branch {

    private readonly List<Employee> _employees;

    public Branch(int id, string name, ICollection<Employee> employees)
    {
        Id = id;
        Name = name;
        _employees = employees.ToList();
    }

    public int Id { get; }
    public string Name { get; }
    public IReadOnlyCollection<Employee> Employees => _employees.AsReadOnly();
}

In my application I'm using EF Core, and something similar to these models would work fine. I'm following Microsoft's architectural recommendations for ASP.NET Core development, and as such I'm trying to use a domain-driven approach to modelling the entities that EF Core operates with. This is why I'm trying to use constructors to ensure my entities can only ever exist in a valid state and prevent mutation of these entities except through methods that represent domain-relevant operations.

However, trying to manually setup a new branch with employees (for example, in tests) exposes a chicken and the egg problem:

var branch = new Branch(id: 1, name: "Example branch", employees: new[]
{
    new ProductConsultant(id: 12345, name: "John Smith", operatesAt: um...)
});

Is this a common issue to run into, and if so is there a well-defined solution to it? Someone I discussed this with mentioned the builder pattern could work. An alternative that I'm thinking of going with is to define a factory method on the principal entity that constructs, registers and returns an instance of the dependent entity:

class Branch {

    private readonly List<Employee> _employees;

    public Branch(int id, string name)
    {
        Id = id;
        Name = name;
        _employees = new List<Employee>();    
    }

    public int Id { get; }
    public string Name { get; }
    public IReadOnlyCollection<Employee> Employees => _employees.AsReadOnly();

    public Employee RegisterNewEmployee(int id, string name)
    {
        var employee = new Employee(id, name, this);
        _employees.Add(employee);
        return employee;
    }
}

Is this a suitable approach, or is there something more appropriate that can be done here?

No correct solution

OTHER TIPS

var branch = new Branch(id: 1, name: "Example branch", employees: new[]
{
    new ProductConsultant(id: 12345, name: "John Smith", operatesAt: um...)
});

This particular chicken and egg problem suggests that your design is inappropriate for the problem.

The key idea is that "an employee belongs to a branch" is a description of a relationship between some representation of employee and some representation of branch, but it doesn't say that employee has-a branch, in the data composition sense.

Consider

class Employee {

    public Employee(int id, string name, int operatesAt)
    {
        Id = id;
        Name = name;
        OperatesAt = operatesAt;
    }

which is to say, we bind the employee to branch identifier, rather than the branch itself.

Now your test can be written fairly easily:

var branch = new Branch(id: 1, name: "Example branch", employees: new[]
{
    new ProductConsultant(id: 12345, name: "John Smith", operatesAt: 1)
});

You can also play the same game in reverse, if you like:

var consultant = new ProductConsultant(id: 12345, name: "John Smith", operatesAt: 1)
var branch = new Branch(id: 1, name: "Example branch", employees: new[] { 12345 })

Other possibilities include treating the relationship between employee and branch as an entity in its own right, with its own lifecycle:

var assignment = new OfficeAssignment(employee:12345, branch: 1)

"It depends" is an Important Truth of domain driven design; we tend to employ these techniques in business critical parts of the domain, which implies that there is payoff in getting the small details right.

I'd seriously question the "Employee must have a branch" philosophy.

Does someone who works from home have a branch?
What if they literally drive around all day?
What about the Area Manager, presumably they work across a few branches.
And what about call center staff? Calling their office a branch starts to stretch the definition of branch, are warehouses branches too?

The problem is that employee's are not weak entities. They exist regardless of their being a branch. It is simply the flavour of the day that management has decided that every employee be associated with a branch.

In fact this is very similar to the problem from Wizards and Warriors. Whatever you hard code now will need rewriting later when requirements change: such as needing to write unit tests, or talking about employees without caring about branches. I wouldn't go so far as to implement a full blown ECS, but the point is to separate the different shearing levels so that they are easier to change.

First step would be to separate the Employee out into roles, aka interfaces. This way code that tangentially needs an employee for some reason can be passed a mock implementing just the expected interface. That code might not even care about branches.

Second step would be to treat Employee (assuming you still are implementing all those interfaces on one concrete implementation) as a strong entity on its own merits. That would mean that Employee does not require a Branch object to be constructed, a BranchId would be enough. Similarly for Branch except this time a list of EmployeeIDs are enough.

Third step is to see if you cannot tease apart Employee into seperate concrete classes implementing fewer interfaces. Chances are the software contains several perspectives on what an Employee is and only cares about one perspective or another for large tranches of work.

You're using the wrong solution for the problem at hand.

It's stipulated that every employee must belong to exactly one branch.

To enforce this, you'd pass the branch the employee operates at to the Employee constructor and make the corresponding property immutable:

This is not correct. The proposed solution solves a completely different problem, unrelated to your actual problem.

Your actual problem

It's stipulated that every employee must belong to exactly one branch.

You don't need to restrict the property or make it readonly. The simplest way to comply with this rule is dead simple:

public class Employee
{
    public Branch Branch { get; set; }
}

When the Employee class has a Branch property (i.e. not a collection of branches such as Branch[] or IEnumerable<Branch>), then each employee can only belong to one branch. You can change which branch they belong to, but they can never belong to more than one branch.

Basically, if there's only one chair in the room, only one person can sit. Since there's only one property that can contain one Branch object, therefore the Employee can only belong to one Branch (i.e. the one you set it to).

Your proposed solution

To enforce this, you'd pass the branch the employee operates at to the Employee constructor and make the corresponding property immutable:

Think about what your implementation does. The implementation consists of two parts: the constructor parameter, and the readonly nature of the property (post-constructor, if you will).

In terms of readonly, the only thing you're achieving is that an employee's branch cannot be changed after it has been set. That's it. But since you never mention this as a requirement, it's simply not necessary.

In terms of constructor parameter, what does it achieve for you? Likely, you're going to say that "this forces the consumer to pass the branch", but if your constructor merely assigns the parameter to your property, which yours does, it doesn't actually protect against anything.
The same default value for Branch that you'd get with an uninitialized property (that's not in the constructor) could effectively be passed via the constructor anyway. It's not blocking that behavior, so you're not really adding value.

The only thing you're doing here is creating more hoops for your consumer to jump through, without actually solving the issue you want solved.

Chicken and egg

However, trying to manually setup a new branch with employees (for example, in tests) exposes a chicken and the egg problem:

var branch = new Branch(id: 1, name: "Example branch", employees: new[]
{
    new ProductConsultant(id: 12345, name: "John Smith", operatesAt: um...)
});

Notice how the problem goes away when you remove your readonly constraint:

var employee = new Employee() { Name = "John" };
var branch = new Branch() { Name = "Houston Division" };

employee.Branch = branch;
branch.Employees.Add(employee);

This is just a simple example to show how your readonly constraint is the cause of the problem. This is not the final solution for your problem.

Entity Framework

This is why I'm trying to use constructors to ensure my entities can only ever exist in a valid state and prevent mutation of these entities except through methods that represent domain-relevant operations.

The things you're focusing on in your question and the implementation you've sought, is a code-specific implementation. But your focus is a predominantly moot point, because EF fundamentally changes this concept and throws your implementation out the window.

Entity Framework uses database modeling rules. That is to say that your entities reflect your database structure. EF does try to give you some code-friendly frills (such as navigational properties), but first and foremost the focus should be on how you want your data to be stored.

What you've described here is a one to many relationship. It's not zero or one to many, because each employee belongs to exactly one branch.

In databases, you do not link two rows directly to one another, you use their key. Most commonly, keys are (by default) expressed with non-nullable types such as int or Guid. I'll use Guid for this example. This already answers your question:

public class Employee
{
    public Guid Id { get; set; }       // Primary key

    public Guid BranchId { get; set; } // Foreign key
    public Branch Branch { get; set; } // Navigational property
}

public class Branch
{
    public Guid Id { get; set; }                          // Primary key

    public ICollection<Employee> Employees { get; set; }  // Navigational property
}

This describes the one-to-many relationship, sufficiently for EF to know what to do with it. An employee has precisely one branch, and a branch can have any amount of employees.

"And how do I configure this?"

Entity Framework likes conventions. When you stick to its conventions, you don't actually have to manually configure much. The above entities are complete. All you need to do is add them to a db context, and EF will do what you need it to do.

If you start changing the names in my example, e.g. Branch to OperatesAt, you start edging away from the conventions. When you do that, you're going to have to configure things manually since EF cannot rely on conventional inferences anymore.

"But what if I set BranchId to a value that does not exist in the branch table?"

You'll be able to do this in code, but the moment you try to save your changes to the database, the database will throw an error. This is because EF will specifically configure the relationship using a foreign key constraint, which demands that the BranchId must match an existing Id in the branch table.

Oh, and just to complete my thought, if you never save your changes after setting the incorrect branch ID, then it won't really matter because your (incorrect) new branch ID will never reach the database and thus will never become the "truth". Since your application is database driven, the database is the source of truth, more so than your actual runtime is.

Serialization and constructors

A second bit of advice that I want to give you is that serialization doesn't play nice with constructor parameters. Serialization logic tends to expect parameterless constructors, so that they can create an empty object and then set its properties one by one.

EF specifically serializes and deserializes your database data into those classes, so it's going to want parameterless constructors.

Note that some libraries can actually bypass this and can even instantiate types without parameterless constructors (using some clever reflection), but they do so by bypassing the parametered constructor, so it would still be pointless to have a parametered constructor.

DDD

I'm trying to use a domain-driven approach to modelling the entities that EF Core operates with.

Lesson one for domain-driven development: your domain has no external dependencies. Designing your domain for EF Core directly violates that lesson, since EF Core is an external dependency.

In short, because I can't explain this fully in this answer, your entities and your domain objects are two completely different layers, and they should be developed individually. It might be that one ends up looking a lot like the other, but that does not mean that you can merge them into the same class or class design.

Them looking like each other has to be a form of coincidence, i.e. there was no forethought or explicit intention to make them the same, they each have their own design for their own reasons and it just happens to be the same.

Secondly, in terms of how the (domain) branch and (domain) employee relate to one another, you have to consider the aggregates here. Does a branch own an employee, or is an employee an individual entity with its own lifetime?

In the comments, you agree that the second is more correct. If so, that means that both the employee and branch are separate aggregates (and presumably the respective aggregate roots), so you would never have such an A-in-B-in-A problem to begin with.

Working with aggregates requires more explanation than I can give here. I suggest finding online documentation on the matter to help guide your design of your aggregates and how they interact.

Domain validation

This answer is too long already to elaborate on this, but you are right that even though EF enforces a FK constraint, you'd like to also have some preliminary validation (e.g. that a branch id has been set) in the domain.

This is not the place to fully elaborate on that, but definitely look up examples of DDD projects and how they implement validation.

Most commonly, this is an abstraction in and of itself that consists of more than just some restrictive domain object constructors.

As a general validation tip, be very careful about the distinction between "should not" and "cannot ever". People who don't live in my house shouldn't enter my house at night - but that doesn't mean that they can't. Sometimes, I might even need them to, e.g. if I'm in a medical emergency.
Just because I don't expect strangers to enter my house at night, doesn't mean that it's a good idea that I should make it absolutely impossible for anyone to enter.

Based on your proposed solution, you are trying to enforce the rule at all times - no exceptions. But really reconsider those exceptions. Are you sure that there are never any exceptions? Never?

What about an employee who's temporarily absent?
What about an employee who's temporarily absent, their original department gets removed, and they have not yet been assigned a new department?
What about partial employee data, which isn't considered complete and should eventually be completed, but should already be stored (in its incomplete state) in your system while awaiting completion?

These are just some examples that mean that your "no invalid data EVER" approach can be too restrictive for practical reality.

Sometimes, instead of enforcing it, you're much better off having a validation step that gets enforced selectively. For example, an employee without branch can be stored in the system (no validation check) but cannot be assigned work or be paid (doing so requires the employee to pass the validation, which includes checking that a branch was assigned).

The best solution is the one you outline. Decouple the IDs from the database.

I would question your use of Ints here though as its hard to generate unique ones. Use GUIDs, or perhaps natural ids eg. "LondonBranch" instead.

You have various good answers, but none of them address the problem in the Domain Driven Design context. The thing is, you have to determine your Aggregate Root(s).

If Branch is the only Aggregate Root, it could indeed contain a RegisterNewEmployee method. It should probably return void, because an Aggregate Root shouldn't expose it's internals, but it would be a good way to have a single way to create new employees. Note that in this case, you would not have an EmployeeRepository, because only Aggregate Roots should have repositories. With EF that means your DbContext should not have a DbSet for Employees.

If you decide Employee should also be an Aggregate Root, it's best practice that the two Aggregate Roots only reference eachother by Id (which is preferably a Value Object and not a primitive). That means Branch could have a collection of EmployeeIds and/or Employee could have a BranchId.

In the comments you said a Branch needs access to it's Employees. Depending on the exact requirement, a collection of Ids might suffice, for example to check for duplicates or a maximum number of Employees per branch. In case you do need access to details encapsulated in an Employee from the Branch aggregate, things get a bit ugly. You could have an IEmployeeRepository as an argument in the method that needs employee details, or simply pass a collection of Employees to that particular method. Anyway, that's a bit out of scope for this question.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange