Does it make sense to use the repository pattern with a document database?

https://stackoverflow.com/questions/9255315

29-04-2021

Question

I'm currently experimenting with MongoDB. I'm moving from an NHibernate/SQL mindset, and so initially I implemented a repository pattern for data access.

This was all looking fine until I started using nested documents. Now it's starting to seem like there's a bit of a mismatch. However, I'm comfortable with repositories, and like the abstraction, separation of concerns, and testability they provide.

Are people successfully using the repository pattern with document databases? If not, what data access methodology do you use? What about abstraction/SoC?

Solution

It's an interesting question. In my usage of MongoDB, I chose not to have a repository. This was primarily because the document database was used as a read store (which simplified the data that was stored there).

I guess you have to strip the consideration back to what a repository is and what advantages you get from the extra layer of abstraction. A good design will have the fewest layers possible.

A repository gives you some persistence ignorance and the ability to use a unit of work over a common context of data. It can also make it easier to test queries against the data in isolation (because these are usually abstracted as queryables or specifications).
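To make those two benefits concrete, here is a minimal sketch in plain Java. Every name in it (Customer, CustomerRepository, InMemoryCustomerRepository, CustomerSpecs) is hypothetical and invented for illustration. The specification is modelled as a simple Predicate, which is an assumption that works for an in-memory store; a real MongoDB-backed implementation would have to translate it into a query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical domain type, used only for illustration.
final class Customer {
    final String id;
    final String name;
    final boolean active;

    Customer(String id, String name, boolean active) {
        this.id = id;
        this.name = name;
        this.active = active;
    }
}

// The abstraction the business code depends on; nothing here mentions a database.
interface CustomerRepository {
    void save(Customer customer);
    Optional<Customer> findById(String id);
    List<Customer> findAll(Predicate<Customer> specification);
}

// In-memory implementation for tests; a database-backed class would implement the same interface.
final class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<String, Customer> store = new LinkedHashMap<>();

    @Override
    public void save(Customer customer) {
        store.put(customer.id, customer);
    }

    @Override
    public Optional<Customer> findById(String id) {
        return Optional.ofNullable(store.get(id));
    }

    @Override
    public List<Customer> findAll(Predicate<Customer> specification) {
        List<Customer> matches = new ArrayList<>();
        for (Customer c : store.values()) {
            if (specification.test(c)) {
                matches.add(c);
            }
        }
        return matches;
    }
}

// A named specification that can be unit tested without any data store.
final class CustomerSpecs {
    private CustomerSpecs() { }

    static Predicate<Customer> isActive() {
        return c -> c.active;
    }
}
```

Business code would only ever see CustomerRepository, so whether the extra layer is worth it comes down to how much you value swapping the implementation and testing the specifications on their own.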

Also, some document databases already provide a repository pattern (RavenDB etc), so there is no need to have yet another layer.

So, it seems to me that using a repository is not so much about whether your data is stored as a relational table or a document, but more about what you gain from the abstraction.

OTHER TIPS

I don't know if this will help you, but I was listening to a podcast a while back with Ayende Rahien talking about RavenDB. He was suggesting that a document actually maps fairly well to an aggregate in the DDD philosophy. If you're using nested documents to represent an aggregate and running into design issues, perhaps nesting is not the best way to go?
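As a rough illustration of the "one document per aggregate" idea, here is a sketch in plain Java. Order and OrderLine are hypothetical names, and a nested Map stands in for the stored document so that the example does not depend on any driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity that lives only inside the Order aggregate.
final class OrderLine {
    final String sku;
    final int quantity;

    OrderLine(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }
}

// Hypothetical aggregate root: every change to the lines goes through Order.
final class Order {
    private final String id;
    private final List<OrderLine> lines = new ArrayList<>();

    Order(String id) {
        this.id = id;
    }

    void addLine(String sku, int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        lines.add(new OrderLine(sku, quantity));
    }

    // The whole aggregate becomes one document; the lines are nested, not stored separately.
    Map<String, Object> toDocument() {
        List<Map<String, Object>> nested = new ArrayList<>();
        for (OrderLine line : lines) {
            Map<String, Object> lineDoc = new LinkedHashMap<>();
            lineDoc.put("sku", line.sku);
            lineDoc.put("quantity", line.quantity);
            nested.add(lineDoc);
        }
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("lines", nested);
        return doc;
    }
}
```

If a nested part needs to be loaded, queried, or changed independently of its parent, that is usually a hint that it belongs to a different aggregate and therefore to a different document.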

Thinking about "Should I use X or not?" is not as productive as focusing on "What should I use?"

What are the alternatives to the repository pattern, what are the tradeoffs, and how do they relate to your domain and implementation?

Repositories are good for enforcing a predefined set of patterns over a general-purpose store (such as SQL). For a document store, my impression is that the document schema will determine the access patterns to a greater extent than you would typically see in a SQL-based store. Implementing a repository in this case may lead to very leaky abstractions, where changes to the underlying document structure have a 1:1 impact on the relevant business code. In that case the repository provides very little value. To me, document stores naturally lend themselves well to Unit-of-Work (UoW) paradigms where the unit of work is a document (or doc+nested subdocs, or sets of documents).

Another strength of the repository pattern is, as you mentioned, abstraction over the storage mechanism. The tradeoff is usually loss of access to low-level implementation-specific features of Mongo. Is that a worthwhile tradeoff for you? NHibernate is very tightly coupled to SQL, and hence has richly functional abstractions over all the important features of an RDBMS. I'm not aware of any similar framework for Mongo, so you would really be raising the level of abstraction quite a bit.

Do you need to support multiple concurrent data stores? For example, will you be writing some types of data to SQL and others to Mongo, through the same data layer abstraction? If so then a repository is a good option.

If you can provide some more details of your domain and implementation, then we can drill down further into the specific tradeoffs you may want to consider.

I have been using MongoDB in production code with the Repository Pattern for over 2 years now, and I can say that it has really helped me over time. The abstractions serve well for testing (in-memory) and production (MongoDB).

I use Java and the code looks something like this (roughly):

public interface MyStorage {

    boolean add(MyDoc doc);

    boolean update(MyDoc doc);

    boolean remove(String docId);

    boolean commit();

    MyDoc get(String docId);

    MyStorageQuery newQuery();

    List<MyDoc> query(MyStorageQuery q);

}

I have a factory for each storage implementation which creates new instances of the MyDoc object. I interchange the implementation between MongoDB and my own hand-rolled mock for testing.

The MongoDB implementation uses a MyDoc class which extends the BasicDBObject like so:

public interface MyDoc {
  Data getData(); // let's assume this is a nested doc
  void setData(Data d);
  String getId();
  long getTimestamp();
  void setTimestamp(long time);
}

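import com.mongodb.BasicDBObject;

public class MongoDbMyDoc extends BasicDBObject implements MyDoc {
  public MongoDbMyDoc() { }
  public void setId(String id) {
    this.put("_id", id);
  }
  public String getId() {
    // get() returns Object, so cast back to the type stored by setId()
    return (String) super.get("_id");
  }
  public void setData(Data d) {
    // store the nested document as its own BasicDBObject
    BasicDBObject dataObj = new BasicDBObject();
    dataObj.put("someField", d.someField);
    dataObj.put("someField2", d.someField2);
    super.put("data", dataObj);
  }
  ...
}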

I then in the actual storage implementation use the MongoDB Java client to return instances of my implementation from the DB. Here is the constructor for my MongoDB storage implementation:

public MongoDbMyStorage(DB db, String collection) {
  // DB is a MongoDB object (from the client library) that was instantiated elsewhere
  this.dbCollection = db.getCollection(collection);
  // have the driver return documents as MongoDbMyDoc instances
  this.dbCollection.setObjectClass(MongoDbMyDoc.class);
  this.factory = new MongoDbMyDocFactory();
}

There are 2 more interfaces here. MyStorageQuery is also implemented as a BasicDBObject in the MongoDB implementation and is created using the newQuery() method of the storage interface. MyDocFactory isn't presented here, but it is basically a document factory that knows what the storage implementation is and generates the MyDoc instances accordingly.
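For comparison, a hand-rolled in-memory implementation for tests could look roughly like the sketch below. This is not the code from this answer, only an illustration under stated assumptions: the interfaces are trimmed copies (commit(), newQuery() and the nested Data are left out so the sketch compiles on its own), MyStorageQuery is reduced to a single matches() method, and InMemoryMyDoc and InMemoryMyStorage are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Trimmed copy of the document interface (assumption: nested Data omitted).
interface MyDoc {
    String getId();
    long getTimestamp();
    void setTimestamp(long time);
}

// Assumption: a query is reduced to one predicate-style method for illustration.
interface MyStorageQuery {
    boolean matches(MyDoc doc);
}

// Trimmed copy of the storage interface (assumption: commit() and newQuery() omitted).
interface MyStorage {
    boolean add(MyDoc doc);
    boolean update(MyDoc doc);
    boolean remove(String docId);
    MyDoc get(String docId);
    List<MyDoc> query(MyStorageQuery q);
}

// Plain-Java document used by the in-memory storage.
class InMemoryMyDoc implements MyDoc {
    private final String id;
    private long timestamp;

    InMemoryMyDoc(String id, long timestamp) {
        this.id = id;
        this.timestamp = timestamp;
    }

    @Override public String getId() { return id; }
    @Override public long getTimestamp() { return timestamp; }
    @Override public void setTimestamp(long time) { this.timestamp = time; }
}

// Test double: keeps documents in a map keyed by id instead of a collection.
class InMemoryMyStorage implements MyStorage {
    private final Map<String, MyDoc> docs = new LinkedHashMap<>();

    @Override
    public boolean add(MyDoc doc) {
        // Returns false when a document with the same id already exists.
        return docs.putIfAbsent(doc.getId(), doc) == null;
    }

    @Override
    public boolean update(MyDoc doc) {
        // Returns false when there is nothing to update.
        return docs.replace(doc.getId(), doc) != null;
    }

    @Override
    public boolean remove(String docId) {
        return docs.remove(docId) != null;
    }

    @Override
    public MyDoc get(String docId) {
        return docs.get(docId);
    }

    @Override
    public List<MyDoc> query(MyStorageQuery q) {
        List<MyDoc> result = new ArrayList<>();
        for (MyDoc doc : docs.values()) {
            if (q.matches(doc)) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

Code under test receives a MyStorage and never learns which implementation it got, which is the point of swapping the two behind a factory.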

Caveats:
One place where the abstraction doesn't make much sense is in defining the indexes used by the MongoDB storage. I put all my ensureIndex(...) calls in the constructor. That is not very generic, but defining indexes per collection is a MongoDB-specific optimization, so I can live with it.
Another is that commit is implemented using the getLastError() command, which in my experience didn't work so well. It isn't a problem for me since I almost never explicitly commit a change.

Eric Evans, in his Domain-Driven Design book, has a very complex and very good explanation of the repository pattern. His definition is what a repository should be and how it should be used (in my personal opinion). You can find a short description here: Eric Evans on Repositories

Basically, if you keep your repositories as just an intermediary between client code and factories, they will be perfect for what I understand you need. The repositories should offer the query/construction/validation interfaces and do all the data acquisition stuff (like connecting to and querying the database), and then you should have one or more (as needed) factories which will build the objects and pass them back to the client via the repository.
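A minimal sketch of that split, in plain Java, might look like this. Account, AccountFactory and AccountRepository are hypothetical names, and a Map of raw records stands in for the database so the example runs without a driver.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical domain object.
final class Account {
    final String id;
    final long balanceCents;

    Account(String id, long balanceCents) {
        this.id = id;
        this.balanceCents = balanceCents;
    }
}

// Factory: knows how to build a valid Account from raw stored data.
final class AccountFactory {
    Account fromRaw(Map<String, Object> raw) {
        String id = (String) raw.get("_id");
        Number balance = (Number) raw.get("balanceCents");
        if (id == null || balance == null) {
            throw new IllegalArgumentException("incomplete account record");
        }
        return new Account(id, balance.longValue());
    }
}

// Repository: does the data acquisition and delegates object construction to the factory.
final class AccountRepository {
    private final Map<String, Map<String, Object>> rawStore; // stand-in for a database
    private final AccountFactory factory;

    AccountRepository(Map<String, Map<String, Object>> rawStore, AccountFactory factory) {
        this.rawStore = rawStore;
        this.factory = factory;
    }

    Optional<Account> findById(String id) {
        Map<String, Object> raw = rawStore.get(id);
        if (raw == null) {
            return Optional.empty();
        }
        return Optional.of(factory.fromRaw(raw));
    }
}
```

Client code talks only to AccountRepository; the raw record format and the construction rules stay hidden behind it.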

It makes more sense to use the Repository pattern with NoSQL databases than with relational databases because in proper DDD you need one repository per aggregate root, but due to the limitations of a relational datastore, devs usually create one repository object per entity/table. NoSQL databases allow you to implement DDD and the Repository pattern properly.

Licensed under: CC-BY-SA with attribution
