DataAccess Layer coupling with Domain Layer

https://softwareengineering.stackexchange.com/questions/399542

03-03-2021
|

Question

We have an issue with how the implementation of the Data Access layer (EF6 Includes more specifically), influences the behavior of our Domain layer.

A theoretical example to illustrate:

Application in 3 layers, DDD:

Domain layer: simplistic domain model
- An Order has Orderlines
- An Order has a public property "CanBeExtended". Business logic is
  "when there are less than 10 Orderlines, an order can be extended"
- Note: This implementation is unit testable, no dependencies
Data Access layer: this layer contains the OrderRepository, which uses EF6 to retrieve an Order object from the database.
- GetOrderById is the only relevant method for this example
- References the Domain layer.
Service layer: contains "Controllers" in MVC-speak, is accessed via the API
- References both Data Access and Domain layer
- Exposes a method "CanOrderBeExtended": input is the id of the order, returns a bool whether the order can be extended

Implementation of this Service layer: Retrieve the Order object from the repository using GetOrderById, then call its property CanBeExtended, and return that value.

Problem: Depending on the implementation of the Data Access layer, the Domain layer behaves wrong: depending on whether you Include the Orderlines of the Order in GetOrderById, the CanBeExtended behaves differently.

The Bug: When you forget the Include, all Orders are extendable.

Possible solutions:

Lazy loading is considered, but it's not our favorite. It's a technical detail of EF6, and it might turn out to be problematic in a late stage, and highly dependent on the actual setup: db server access speed,...

We're looking for a more fundamental solution:
How can we mitigate this dependency between the Domain and the Data Access layer?
How can we make sure we don't forget Includes?

Making multiple and specific methods on the repository is considered as well, but it doesn't appear to scale well:

This example might then use something like "GetOrderWithOrderLinesById" which contains the Include(o => o.OrderLines).
But to decide which method is needed, we'd have to review the implementation of CanBeExtended in the Domain layer.
Additionally, when that implementation changes, the repository method needs to change as well (e.g. to add extra Include statements), or another, new method would turn out to be needed, thus highly coupling the Repository (in the Data Access layer) with the Domain layer.

Fowler nor Evans seem to touch this specific topic AFAIK.

Any and all thoughts welcome!

Solution

First of all

A relevant answer of mine I think you should read.

Your question is based on your interpretation on the fundamentals of repositories with EF, and that interpretation is not complete (that's not to say mine is provably complete either). This leads to the issues you're uncovering.

However, it requires you to change your interpretation, rather than try to solve the issue the way you're trying to now. I strongly urge you to reconsider your architectural approach, because a more abstracted variant will make this problem a lot easier to tackle.

Answering the actual question

Never forget the responsibilities of each layer.

The DAL stores and retrieves data for you.
The BLL performs operations and adheres (and forces other to adhere) to the required business rules, which may or may not involve speaking to the data store.
The Service layer (as per your naming) is the public interface which gives users access to the BLL operations.

Problem: Depending on the implementation of the Data Access layer, the Domain layer behaves wrong: depending on whether you Include the Orderlines of the Order in GetOrderById, the CanBeExtended behaves differently.

TL;DR The DAL is a dependency of the BLL. The BLL cannot question the DAL's responses as the BLL's actions are based on the responses received from the DAL.

The core issue here is that you are making the BLL responsible for making a judgment call (can this order be extended?), but you are not giving the BLL the autonomy to verify the information it needs to make that judgment call.

The way you currently have it, the BLL's validator can be fooled by passed a bogus list of orderlines into it. The validation is supposed to work as a line of defence which prevents storage of data that violates business rules, and as you've pointed out yourself this line of defence can be easily circumvented. The simple truth is that your line of defence is ineffective.

If the BLL is expected to be the judge on whether the order can be extended or not, the BLL must be able to check the database to count how many lines currently exist. Without it, the BLL cannot make this judgment call.

In other words, you expect something along the lines of:

public bool CanOrderBeExtended(Order order)
{
    var existingOrderLineCount = unitOfWork
                                     .OrderLineRepository
                                     .GetByOrderId(order.Id)
                                     .Count();

    return existingOrderLineCount < 10;
}

And there you have your answer. The BLL will not behave differently based on the data an external consumer passes to it. The BLL will protect the datastore by verifying the datastore itself.

Note that if the Order object is fetched by the BLL itself and the DAL method that was used to fetch the order already guarantees that all existing order lines are fetched as well (e.g. GetOrderWithAllOfItsOrderLines); then the validation does not need to access the database again, because the DAL itself has already guaranteed that the current list of orderlines is correct.

Something like this would also be correct:

public OrderDto GetOrderById(int orderId)
{
    var order = unitOfWork
                    .OrderRepository
                    .GetOrderWithAllOfItsOrderLines(orderId);

    var result = new OrderDto()
                 {
                     Order = order
                     CanBeExtended = order.OrderLines.Count() < 10;
                 };

    return result;
}

The BLL relies on the DAL to provide the actual contents of the datastore.

But what if the datastore doesn't return all orderlines?

If the datastore is not performing its job (e.g. it only returns one orderline even though there are 15 order lines in the database for this order), then the BLL cannot be expected to protect against that. That is an issue you have to detect (via testing) and solve in the DAL.

The DAL is a dependency of the BLL. The BLL depends on the DAL to be correct. If the DAL is not correct, the BLL cannot be expected to know better than what its own dependencies tell it.

Your proposed solutions

Lazy loading is considered, but it's not our favorite. It's a technical detail of EF6, and it might turn out to be problematic in a late stage, and highly dependent on the actual setup: db server access speed,...

If you lazy load from the BLL, that means you are leaking IQueryables. DO NOT DO THIS. Leaking IQueryables outside of the DAL means that you've compromised the separation of your layers.

The issue you're highlighting (a second DB call) further compounds the problem but it is not the main decision point as to why you shouldn't be doing this. There are valid solutions which do use a second db call, which might not be what you need in this particular instance but are a better solution than lazy loading nonetheless.

Making multiple and specific methods on the repository is considered as well, but it doesn't appear to scale well

It scales as well as you can reasonably expect it to.

In this context, you stating it does "not scaling well" seems to imply that you are expecting a generic solution to load any related entity for any entity you're currently dealing with. That's not a reasonable expectation as it compromises the DAL's internal responsibility.

The DAL decides what data gets fetched from the database. The BLL does not decide that. At best, the BLL is allowed to communicate to the DAL using methods which the DAL expliticly exposes.
Simply put, the DAL decides to allow consumer to ask it to retrieve the order lines (= the DAL exposes a public method), and then consumers (the BLL in this case) are able to use this public method to do the thing the DAL has allowed them to do (fetching the order lines).

Allowing the BLL to dynamically alter the fetched data without having an explicitly defined DAL method implies that you are either leaking your IQueryables (as mentioned before, do not do this), or your DAL has some specific publci method which you've created for this purpose (which avoids IQueryable leakage).

The latter inherently means that you've custom built a DAL method that the BLL calls, which according to you "does not scale well". Therefore, by elimination, the only way in which you could have a method which "scales well" is by leaking the IQueryable, which you should not do.

In short, your expectation of how scalable something needs to be is leading you to compromise the separation of your layers. If you want to uphold the separation of your layers, the DAL needs to explicitly expose custom methods (i.e. written by the DAL developer, not just passed on from the EF library) which allow for the retrieval of related entities.

But to decide which method is needed, we'd have to review the implementation of CanBeExtended in the Domain layer.

This is a vary confusing point your making.

I think you're putting the cart before the horse. The DAL doesn't include the order lines just because the BLL wants to validate the data. The DAL includes the order lines because it has either privately decided or is explicitly asked to include them.

Why the DAL was asked to include them (by the BLL) is irrelevant as far as the DAL cares. Yes, the BLL may have asked the DAL for the order lines specifically to validate some business logic, but the DAL doesn't need to know that. All it needs to know is that it was asked to fetch the order lines from the datastore, and therefore it returns the order lines from the datastore.

That is the only responsibility of the DAL. It does not care about your business logic at all, because the DAL lives one layer deeper than the business logic.

OTHER TIPS

Your architecture is sound.

The root of the problem here is that... well... you haven't actually defined a domain model. In many ways your question can be rephrased into something like, "How can I make sure developers write the correct code?". Well... you can't.

What I am seeing here is the tension created by using a data model (EF) as a domain model. You have encountered a mismatch between these two models. Clearly you cannot have a domain entity that contains a property (CanBeExtended) that only sometimes returns the correct business answer when queried. Think about this. You have a business rule that says, "An order can only be extended (returns true) if it has less than 10 order lines" and model that does not always enforce that invariant.

Either you must include the entire OrderLine collection every time an Order is hydrated, or you must include some additional aggregation data (e.g. int TotalNumberOfOrderLines) that can be used to enforce your invariant independent of the number currently in the collection (this option is also more performant). It really is as simple as that.

Depending on whether or not you can leverage EF to carry this our for you, you may need to put a stop to ad-hoc queries and encapsulate Order retrieval into an explicit domain Repository in order to make sure hydration is done sufficiently to enforce your rules. At this point you will have realized a first step in the separation between the data model and domain model.

Importantly, and this is in reference to your conversation with @Flater, the domain model does not need to directly reference any Repository instance. Not only does this make testing far more difficult than necessary (and create a dependency), it is always possible to know which data is necessary to carry out a use-case at the Service level:

// overview of service flow

Get data (model) necessary to carry out use-case
Coordination data (model) to fulfill objective
Persist data (model)

This allows for better performance because you will know what data is necessary for step 2 at step 1, and can therefore issue top-level queries (which can then be optimized) in preparation for step 2.

This is simply a bug and we need to see how to fix it and avoid it in the future. But let's see if there is a compiler time coupling problem

I mostly agree with @flater's answer. I am starting it in a different language to provide a different view

In your design order and order lines are entities. Order is also the aggregate. So in domain driven design lingo what it means is

Your repositories will NOT be able get order line by id
When ever you want to modify an order line, you have to get the associated order from the repository and modify and order line associated with it, and then save the order.
Your repositories should have methods to get order and save orders
Your repository should get the full order when the order is got.

Now you rightly have a repository that produces an order. However if you miss to code the Include(o => o.OrderLines) (or don't use lazy loading for OrderLines) in the repository, then you have repository that creates an aggregate that is incorrect. (You have created an Order without any OrderLines when it had OrderLines)

Now I would interpret your question this way: "How to code the repositories without bugs and ensure bug is not introduced by mistake?". My suggestion would be to unit test the repository. Depending on the database(SqlSever, PGSql) you are using, you can use nuget packages for creating mock database servers. Example https://www.nuget.org/packages/Postgres2Go/

Another point to consider is Your domain should not depend on repository at compile time, but you domain will depend on repositories at run time. So if you have a bug in your repository, your domain object might not be produced correctly as in your case.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange