What data should a repository return

https://softwareengineering.stackexchange.com/questions/376447

07-02-2021

Question

I have a simple project where the controller calls the service, and the service calls the repository in order to get the needed data.

Assuming that we have these domain models:

// this model has a RepositoryA built for it
class A {
   public string Name { get; set; }
   public int B_Id { get; set; }
}

// this model has a RepositoryB built for it
class B {
   public string Age { get; set; }
}

And this DTO:

class MyDto {
   public string Name { get; set; }
   public string Age { get; set; }
}

And knowing (correct me if I'm wrong) that a repository shouldn't return a DTO, and that the query must be done in a RepositoryA where we have to select both models A & B in order to build a return that contains a mix of properties (complex request with multiple joins on different tables in the real project), what model should the repository return if not the DTO, given that the DTO is already prepared to receive the mix of properties?

Solution

And knowing (correct me if I'm wrong) that a repository shouldn't return a DTO

Theoretically, every layer (= project in your solution) should have its own DTO objects. In that sense, your repositories should return a DTO, but this is not the same DTO as the "business logic DTO".

However, in reality, we don't need that much separation. The benefits do not outweigh the effort. In most cases, it suffices to have a single entity-DTO conversion, which tends to happen in the business layer (BLL), which is one layer above the repository layer (DAL).
In that sense, your repositories should not return a DTO. But your business classes should.
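As a minimal sketch of that single entity-to-DTO conversion in the business layer, using the models from the question: the names `IRepositoryA`, `IRepositoryB`, `MyService`, the in-memory repository classes, and the `B.Id` property are all assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Domain models from the question. B.Id is an assumption, added so A.B_Id has a key to match.
public class A { public string Name { get; set; } public int B_Id { get; set; } }
public class B { public int Id { get; set; } public string Age { get; set; } }

public class MyDto { public string Name { get; set; } public string Age { get; set; } }

// Hypothetical repository contracts (DAL): they return entities, never DTOs.
public interface IRepositoryA { IEnumerable<A> GetAll(); }
public interface IRepositoryB { B GetById(int id); }

// In-memory stand-ins so the sketch runs without a database.
public class InMemoryRepositoryA : IRepositoryA
{
    private readonly List<A> _items;
    public InMemoryRepositoryA(IEnumerable<A> items) { _items = items.ToList(); }
    public IEnumerable<A> GetAll() => _items;
}

public class InMemoryRepositoryB : IRepositoryB
{
    private readonly List<B> _items;
    public InMemoryRepositoryB(IEnumerable<B> items) { _items = items.ToList(); }
    public B GetById(int id) => _items.FirstOrDefault(b => b.Id == id);
}

// Business layer (BLL): the only place where entities are converted to a DTO.
public class MyService
{
    private readonly IRepositoryA _repositoryA;
    private readonly IRepositoryB _repositoryB;

    public MyService(IRepositoryA repositoryA, IRepositoryB repositoryB)
    {
        _repositoryA = repositoryA;
        _repositoryB = repositoryB;
    }

    public List<MyDto> GetDtos()
    {
        // One lookup per A: simple, but costly against a real database.
        return _repositoryA.GetAll()
            .Select(a => new MyDto
            {
                Name = a.Name,
                Age = _repositoryB.GetById(a.B_Id)?.Age // null when no matching B exists
            })
            .ToList();
    }
}
```

The repositories here know nothing about `MyDto`; only the service does, which is the separation described above.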

and that the query must be done in a RepositoryA where we have to select both models A & B in order to build a return that contains a mix of properties (complex request with multiple joins on different tables in the real project)

The issue here is one of realism. The theory simply can't be put into practice.

Theoretically, repositories are designed to be entity-type-specific data providers. In other words, you get A from the ARepository and B from the BRepository.

However, when dealing with external storage resources (database server), the effort needed to retrieve data is non-negligible. This creates an issue for us. For every entity type we wish to retrieve, we need a separate call via its own repository.

Sidenote: If you're either not dealing with an external data resource or you're not trying to run single queries which will return objects of multiple types, the continuation of this answer is irrelevant to you, as you'll be perfectly happy with getting each type from its own repository.

Once you're dealing with more than one entity type in a request, separating these into separate retrieval queries is counterproductive. While it does create a clean code structure, it dramatically impacts performance. This becomes doubly egregious when you realize that a SQL database server is specifically optimized for joining (by using indexes), but the code that calls the database (repository) is somehow incapable of implementing that same graceful data mixing approach.

Using repositories has become an antipattern here. The obstacles you encounter are caused by the decision to use repositories. And a "solution" that creates a bigger obstacle than the one it aims to solve is not a solution.

A unit of work solves a lot of the problems that repositories introduce in terms of transactional safety (by having all repositories use the same context).
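The shared-context idea can be sketched roughly as follows. Everything here is hypothetical: `InMemoryContext` stands in for whatever database context or session the project really uses, and the simplified `A` and `B` classes, the repository classes, and `UnitOfWork` are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

public class A { public string Name { get; set; } }
public class B { public string Age { get; set; } }

// Hypothetical stand-in for a database context: changes are staged, then applied together.
public class InMemoryContext
{
    private readonly List<object> _pending = new List<object>();
    public List<object> Saved { get; } = new List<object>();

    public void Stage(object entity) => _pending.Add(entity);

    // Applies every staged change in one go and reports how many were applied.
    public int SaveChanges()
    {
        int count = _pending.Count;
        Saved.AddRange(_pending);
        _pending.Clear();
        return count;
    }

    public void Discard() => _pending.Clear();
}

public class ARepository
{
    private readonly InMemoryContext _context;
    public ARepository(InMemoryContext context) { _context = context; }
    public void Add(A entity) => _context.Stage(entity);
}

public class BRepository
{
    private readonly InMemoryContext _context;
    public BRepository(InMemoryContext context) { _context = context; }
    public void Add(B entity) => _context.Stage(entity);
}

// The unit of work owns the single context and hands it to every repository.
public class UnitOfWork
{
    private readonly InMemoryContext _context = new InMemoryContext();

    public ARepository As { get; }
    public BRepository Bs { get; }

    public UnitOfWork()
    {
        As = new ARepository(_context);
        Bs = new BRepository(_context);
    }

    public int Commit() => _context.SaveChanges();
    public void Rollback() => _context.Discard();
    public int SavedCount => _context.Saved.Count;
}
```

Because both repositories stage their changes on the same context, a single `Commit()` either applies the A and the B together or, after `Rollback()`, applies neither.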

However, a unit of work does not help with deciding where to write a complex data query (ARepository? BRepository? ...)

This is a tough choice. In a way, repositories are really nice, especially in one-entity-type contexts (e.g. simple CRUD functionality). On the other hand, they massively complicate complex data retrieval.

I haven't encountered a universally agreed-upon solution for this. But I have worked on several projects where one (or more) agreements were struck to at least keep the location of the code (especially for complex data queries) somewhat sensible.

It's always good to implement a unit of work.

This is just a list of agreements I've encountered over several projects I've worked on. I can't really rank one over the other.

1. Complex data queries are put into the repository of their main entity type.
- A query which retrieves a list of cars and includes their owners belongs to the CarRepository.
- A query which retrieves a list of people and includes the cars they own belongs to the PersonRepository.
- Pro: You can keep the old repository structure but at least make it less of a guessing game where to put the code.
- Con: There are fringe cases where there is no clear "main" entity type.

2. Complex data queries are put into their own repository. "Old" repositories that focus on a single entity type should only be used in CRUD operations.
- If you have a method that saves A and B objects, you should have an ABRepository which internally uses ARepository and BRepository (a unit of work is incredibly important here!).
- Pro: It separates the CRUD logic from the reporting logic.
- Con: If you have many combinations (AB, AC, ABC, ACD, ...) the list of repositories is going to grow out of bounds.
- The suggestion to prevent the "con" is to name these repositories after their function (YearlyReportsRepository) and not just their aggregate list of entity types (PersonCarRepository).

what model should the repository return if not the DTO?

If you choose option 1, that question has an easy answer.

A query which retrieves a list of cars and includes their owners belongs to the CarRepository.

In other words, it returns an IEnumerable<Car> and therefore still follows the idea that "a CarRepository returns cars" (and possibly some related entities, but they are not explicitly part of the return type).

A query which retrieves a list of people and includes the cars they own belongs to the PersonRepository.

In other words, it returns an IEnumerable<Person> and therefore still follows the idea that "a PersonRepository returns people" (and possibly some related entities, but they are not explicitly part of the return type).
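A rough sketch of the first option, with in-memory lists standing in for database tables. The `Car` and `Person` properties, the constructor, and the method name `GetCarsWithOwners` are assumptions for illustration; in a real project the LINQ join below would be a single SQL join or an ORM eager load.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Car
{
    public int Id { get; set; }
    public string Model { get; set; }
    public int OwnerId { get; set; }
    public Person Owner { get; set; } // related entity, filled in by the repository
}

public class CarRepository
{
    // Hypothetical in-memory "tables".
    private readonly List<Car> _cars;
    private readonly List<Person> _people;

    public CarRepository(List<Car> cars, List<Person> people)
    {
        _cars = cars;
        _people = people;
    }

    // The return type is still IEnumerable<Car>; owners ride along on each car.
    public IEnumerable<Car> GetCarsWithOwners()
    {
        // Inner join: cars without a matching owner are left out of the result.
        return _cars
            .Join(_people,
                  car => car.OwnerId,
                  person => person.Id,
                  (car, person) => new Car
                  {
                      Id = car.Id,
                      Model = car.Model,
                      OwnerId = car.OwnerId,
                      Owner = person
                  })
            .ToList();
    }
}
```

The caller receives cars and reaches the owner through `car.Owner`, so the repository keeps its "returns cars" contract even though two entity types were queried.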

If you choose option 2, then you're implicitly arguing that a complex data report is something different from a simple CRUD operation. This means that you'd probably end up returning custom classes (CarOwnerResult) from the custom repositories (= not bound to a single entity type).

Note that, if you want to, simple repositories (bound to a single entity type) can still be expected to return only their bound entity type, just like before.
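The second option could look roughly like this. `CarOwnerResult` is the class name used above; its properties, the repository name `CarOwnershipReportRepository`, the method `GetCarOwners`, and the in-memory lists are assumptions made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Car
{
    public int Id { get; set; }
    public string Model { get; set; }
    public int OwnerId { get; set; }
}

// Custom result class: a flat projection that is not bound to any single entity type.
public class CarOwnerResult
{
    public string CarModel { get; set; }
    public string OwnerName { get; set; }
}

// Named after its function rather than after the entity types it touches.
public class CarOwnershipReportRepository
{
    // Hypothetical in-memory "tables"; a real implementation would run one joined query.
    private readonly List<Car> _cars;
    private readonly List<Person> _people;

    public CarOwnershipReportRepository(List<Car> cars, List<Person> people)
    {
        _cars = cars;
        _people = people;
    }

    public IReadOnlyList<CarOwnerResult> GetCarOwners()
    {
        var query =
            from car in _cars
            join person in _people on car.OwnerId equals person.Id
            orderby person.Name, car.Model
            select new CarOwnerResult { CarModel = car.Model, OwnerName = person.Name };

        return query.ToList();
    }
}
```

A simple `CarRepository` and `PersonRepository` can keep handling CRUD next to this class, while the report repository only reads.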

OTHER TIPS

The repository should return a Domain Model.

You should have a single repository per database, rather than one per table.

In your case, if you want both domain models in a single query, you could either:

- Change your Domain Model so that A has a child object B.
- Keep two separate methods, but cache the results so that the second method called doesn't have to re-run the query.
- Return a Tuple<A,B> or Dictionary<A,B>.

Returning a Tuple is my least favorite of the options. If you end up doing that, you should probably be making a complex object instead.

In some cases a Dictionary return can make sense, but again it's hard-coding a relationship that you might not always want. The danger is that you end up with a million methods: GetAwithB, GetBwithC, GetXwithQandR, etc.

Licensed under: CC-BY-SA with attribution
