Which layer do DDD Repositories belong to?

https://softwareengineering.stackexchange.com/questions/396151

01-03-2021
|

سؤال

In his DDD book Evans promotes the idea of layered architecture, and in particular that the business logic should be confined to domain layer and separated from UI/persistence/other concerns. He also introduces Repository pattern as a mean to abstract access and persistent storage to Entities and Value Objects. To me the following left unclear:

Which layer Repositories belong: the Domain Layer, Persistence Layer or something in the middle? (It seems that if it were below Domain Layer it would violate Layered Architecture principle, because it depends on a domain object which it stores)
Can Entities, Value Objects or Domain Services call Repositories?
Should Repositories be abstracted from storage technology (which would be implied if they belong to domain layer) or can they leverage those storage technologies?
Can Repositories contain business logic?
Have you applied these constraints in practice and what was the effect on the quality of your project?

(I am mostly interested in DDD perspective)

المحلول

Repositories and their placement in the code structure is a matter of intense debate in DDD circles. It is also a matter of preference, and often a decision taken based on the specific abilities of your framework and ORM.

The issue is also muddied when you consider other design philosophies like Clean Architecture, which advocate using an abstract repository in the domain layer while providing concrete implementations in the infrastructure layer.

But here's what I have discovered, and what has worked for me, after trying out different permutation/combinations.

From a DDD perspective, Repositories sit between Application Services and Domain Objects.
Domain Objects encapsulate behavior and contain the bulk of business logic, enforcing invariants at the aggregate level.
Application services receive calls from UI/API/Controllers/Channels (external facing), initial repositories to load aggregates (if needed), invoke domain model for the necessary changes, then use the repositories again to persist aggregates

To your questions:

Which layer Repositories belong: the Domain Layer, Persistence Layer or something in the middle?

I would say there are three distinct layers in DDD applications - the inner domain layer, the outer application layer, and the external world (includes the API/UI).

The domain layer contains aggregates, entities, value objects, domain services, and domain events. These elements are only dependent on each other and actively avoid dependencies any outer layers.

The application layer contains Application Services, Repositories, Message Brokers, and whatever else you need to make your application practically possible. This layer is where most of the persistence, authorization, pub-sub processing, etc. happens. Application Layer depends and knows about the domain layer elements, but follows DDD guidelines when accessing the domain model. For example, it will only invoke methods on Aggregates and Domain Services.

The outermost layer includes API Controllers, serializers, authentication, logging, etc., whatever is not related to business logic or your domain, but very much part of your application.

Can Entities, Value Objects or Domain Services call Repositories?

No. The domain layer should preferably remain agnostic to repositories. Application services should take on the responsibility of transactions and repository interactions.

Should Repositories be abstracted from storage technology (which would be implied if they belong to domain layer) or can they leverage those storage technologies?

Repositories lean towards the domain side, meaning they contain methods that are meaningful from a domain point of view (like GetAdults() or GetMinors()). But the concrete implementation can be done in a couple of ways:
- You could use an abstract repository to declare the necessary methods, and then create concrete implementations for different databases. The database implementation can be chosen at the beginning of application startup, based on your configurations. Note that even in this case, Domain layer has nothing to do with repositories
- Repositories could act like wrappers and make use of underlying DAO objects (one per table/document) that implement the actual logic for interacting with the database. The DAO objects are initialized usually with dependency injection if your framework/language supports it, or they could be initialized manually based on active configuration.

Can Repositories contain business logic?

Repositories represent domain concepts, with meaningful method names, but seldom contain any business logic. They encapsulate the database query and give it a conceptual name that usually is derived directly from the ubiquitous language. It is so much better to have a method called GetAdults() instead of .filter(age > 21).

Have you applied these constraints in practice and what was the effect on the quality of your project?

If you restrict yourself to using repositories only in application services, and control transactions at one place (usually with Unit of Work pattern), Repositories are pretty easy to work with. In my past projects, I have found it extremely useful to restrict all database interaction to repositories instead of sprinkling lifecycle methods in the domain layer.

When I called lifecycle methods (like save, update, etc.) from the aggregate layer, I found it to be extremely complex and difficult to reliably control ACID transactions.

نصائح أخرى

Which layer Repositories belong: the Domain Layer, Persistence Layer or something in the middle?

If you believe in dependency inversion^*, then repositories are contracts defined by the application layer and implemented by the persistence layer. The contract itself has an implicit dependency on the domain model (via the appearance of the domain entities in the signature), and the implementation of a repository will often include a direct dependency on the factory responsible for creating instances.

Can Entities, Value Objects or Domain Services call Repositories?

Ideally, they should not need to. The domain model sits at the core of the onion; it's a pure in memory representation of (a part of the) domain.

You may sometimes see a domain service that acts as a facade provides a read-only view of cached data. But writing to a repository from the domain model is normally out of bounds (the dependency arrows don't point that direction).

Should Repositories be abstracted from storage technology

Ideally, the repository contract explicitly describes the capabilities required by the application, without necessarily coupling you to a specific technology.

Can Repositories contain business logic?

In its ideal form, a repository is a facade for a domain agnostic collection. So no. This is somewhat connected to the previous question; if the implementation of the repository includes important business logic, it becomes that much harder to find the logic, and to recognize the implications that logic has on your choice of storage.

A real portion of the motivation for these patterns is an attempt to make them easier to maintain -- which includes making it easier to guess where in the code base some important concept is going to be realized.

Have you applied these constraints in practice and what was the effect on the quality of your project?

This is pretty subjective, but I haven't found much joy in the repository pattern as described by Evans. Part of the problem is that the patterns are very much based in the Kingdom of Nouns; another is that they appear to be coupled to the idioms of Java circa 2003.

The idea is right -- the application shouldn't need to know the details of where mutable state is managed. We can correctly place a boundary between those concepts. Similarly, the domain model should be independent from its application host, so that the host can be changed.

Where things tend to break down is save. How do you get the data you need out of the object and into your storage? The domain model has it, the repository needs it, so you end up with some persistence pollution entering your "pure" domain model. The pollution can take a number of forms -- ORM hooks, public members, serialization APIs....

Let's assume that Entities have to be immutable. By definition repository is an emulation of in-memory collection of entities, sort of Map

Very close. It's probably a better fit, semantically, if you think in terms of messages, rather than entities. Both the domain model and persistence need to know how to extract information from the messages to get their job done. See Boundaries, by Gary Bernhardt

Alternatively, you can keep the idea of the repository that stores mutable entities (somewhere), by separating those entities from the business logic. In that approach, the entity API is consumed by the domain model, but implemented by the persistence component. This keeps the mutable state out at the boundary of your solution, where it belongs.

*_{The term "Dependency Inversion" is as ill-defined as all of the other SOLID principles, and articles on the internet all parrot the same tired phrases to define it. For purposes of this answer, dependency inversion can be defined as "handing a class its concrete dependencies through constructor parameters that are defined by interfaces." See here for an example.}

So, if you disregard all the "aggregate design rules" talk you can find on the net, and focus on how aggregates are described in the book, they are meant to be these bundles of objects that form some sort of a graph, with an explicit design boundary, and one object selected as the root. The root acts as an interface (facade) to the aggregate, it encapsulates it, and is responsible for maintaining consistency within that boundary. Your domain model is then, more or less, composed of a number such aggregates that can be reached by traversal. Another important role the aggregates have is to simplify the web of dependencies between objects, so aggregate relationships should be strategically designed.

The question is, how do you obtain the initial aggregates - the ones you start the traversal from?

If you take a closer look at how Evans defined repositories, the emphasis is not on persistence (although generally speaking they do eventually end up going to the database) - instead they are a means to obtain the initial "surface level" aggregate roots, letting you start there and reach the others via traversal.

To quote the Blue Book:

"Whether to provide a traversal or depend on a [database] search becomes a design decision, trading off the decoupling of the search against the cohesiveness of the association. [...] The right combination of search and association makes the design comprehensible."

That means that the dependency structure is such that (1) when an execution path is entered, a Use Case¹ that needs an aggregate must ask a repository to obtain one, and (2) the the repository knows about certain "surface level" aggregates, but the core domain doesn't know about repositories.

There is some room for variation when it comes to how you organize that into layers, but you can see that the above structure imposes some constraints on the dependency directions across and within layers. So to answer you question.

Which layer Repositories belong: the Domain Layer, Persistence Layer or something in the middle?

A repository as conceptualized by DDD is an abstraction - it is not itself in the persistence layer, it encapsulates database access - meaning that it forwards reconstitution request to some data access gateway that is in the persistence layer (either via composition, or via dynamic dispatch).

"For each type of object that needs global access [...] set up access through a well-known global interface. [Repositories encapsulate] the actual storage and query technology. Provide REPOSITORIES only for AGGREGATE roots that actually need direct access."

They are meant to be accessible throughout the application layer, so that seems to be a natural place for them to reside, but the actual implementation will go across layers. The reason for the encapsulation is to keep application service layer code simple and decoupled from persistence-related boilerplate.

(It seems that if it were below Domain Layer it would violate Layered Architecture principle, because it depends on a domain object which it stores)

The repository abstraction is not below the Domain Layer, but it does depend on the domain objects - that's the whole point. You ask it for an aggregate root using the language of the domain model (e.g., you pass it an ID), it then figures out how to turn that into a database query; the database returns its representation, and the repository then uses it to reconstitute the requested aggregate, and returns that. Beyond that, it doesn't contain any significant business logic, and it certainly doesn't contain any problem-specific (core) domain logic. In practice, though, you want to try and arrange things in a way that avoids elaborate and costly translations between representations.

Another thing that is discussed in the book is that it is desirable to find a way to express aggregate boundaries in the code itself, but that it can be a bit tricky to do so. The most common thing you'll find on the web is the idea popularized by Vaughn Vernon, that states that aggregates should reference other aggregates exclusively via their identity. This can work well in many cases, and has the advantage of clearly delineating aggregate boundaries in code, but IMO it's too restrictive and somewhat database-centric in thinking for it to be elevated to the status of a rule that you enforce or even default to when it comes to aggregate design. So don't think of it as the only option - you could design a different scheme.

If you do go down that path, then you have to figure out how to traverse across aggregates given that specific restriction. Vaughn Vernon recommends to "use a repository or domain service to look up dependent objects ahead of invoking the aggregate behavior" (and not use a repository from within the aggregate, to prevent the aggregates from having a dependency on one or more repositories).

There's a bit of a tradeoff here (as with every design decision): A certain amount of knowledge about the inner workings of the aggregate is being pushed elsewhere, and the signatures of the aggregates' methods become more verbose (and some of the dependencies less obvious as they aren't visible in the constructor).

This is a viable approach, but it should be noted that it's a little different than what's described in Evans' book - where the idea was to be very deliberate about which aggregate roots should be obtained through repositories, and which via traversal, and under what circumstances - based on your understanding of the domain, business processes, access patterns, etc.

¹ _{The two layers of interest here, in DDD terminology, are the Application Layer and the Domain Layer. The Application Layer ("defines the jobs the software is supposed to do and directs the expressive domain objects to work out problems") roughly corresponds to the Application Business Rules (Use Cases) layer in Clean Architecture ("use cases [...] direct [the] entities to use their enterprise wide business rules to achieve the goals of the use case").}

The repository interface belongs in the domain layer. The repository implementation should be elsewhere, but the literature is not clear. Options include: infrastructure layer (probably the most consonant with DDD precepts); application layer; persistence layer; repository layer or sub-layer.
No. That would be a violation of the DDD-driven layered architecture, and would likely create a cyclic dependency.
The recommended practice today is to define a repository interface that is abstracted from the persistence technology. The repo impl OTOH is not. Evidently, application services should use the repo interface.
Repositories should not contain business logic. However, they may contain query logic. Some may see query logic as business logic, especially if it's a complex query. If you have complex repo queries, maybe your aggregate is larger than it should or maybe you're overusing the query language power. Using repositories for reporting needs is an antipattern.
In our organization we place repository interfaces alongside entities in the domain layer. App services directly use entities, VOs and repositories (via the interface). We use the JpaRepository interface. With respect to answer (1), this alternative has the benefit that you don't need to provide the repository implementation--it's automagically given to you by the Spring framework.

My answers are primarily influenced by Patterns, Principles, and Practices of DDD by Millet and Tune.

According to dddsample you can have a Repository as interface(CargoRepository) in domain layer and actual implementation(CargoRepositoryHibernate) in Infrastructure layer.

Layered Architecture picture (also from DDD blue book):

مرخصة بموجب: CC-BY-SA مع الإسناد

لا تنتمي إلى softwareengineering.stackexchange