Representing a many-to-many relationship in CouchDB

https://stackoverflow.com/questions/1822444

10-07-2019

Question

Let's say I'm writing a log analysis application. The main domain object would be a LogEntry. In addition, users of the application define LogTopics, each of which describes the log entries they are interested in. As the application receives log entries it adds them to CouchDB, and also checks each one against all the LogTopics in the system to see whether it matches the criteria in the topic. If it does, the system should record that the entry matches the topic. Thus, there is a many-to-many relationship between LogEntries and LogTopics.

If I were storing this in a RDBMS I would do something like:

CREATE TABLE Entry (
 id int,
 ...
)

CREATE TABLE Topic (
 id int,
 ...
)

CREATE TABLE TopicEntryMap (
 entry_id int,
 topic_id int
)

Using CouchDB I first tried having just two document types. I'd have a LogEntry type, looking something like this:

{
  'type': 'LogEntry',
  'severity': 'DEBUG',
  ...
}

and I'd have a LogTopic type, looking something like this:

{
  'type': 'LogTopic',
  'matching_entries': ['log_entry_1','log_entry_12','log_entry_34',....],
  ...
}

You can see that I represent the relationship by using a matching_entries field in each LogTopic document to store a list of LogEntry document ids. This works fine up to a point, but I run into problems when multiple clients attempt to add a matching entry to the same topic at the same time. Both attempt optimistic updates, and one fails. The solution I'm using now is to essentially reproduce the RDBMS approach and add a third document type, something like:

{
  'type':'LogTopicToLogEntryMap',
  'topic_id':'topic_12',
  'entry_id':'entry_15'
}
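
To make the concurrency issue concrete, here is a minimal sketch of the two designs. It uses a toy in-memory store, not the real CouchDB API: createStore, the integer revisions, and the map_topic_12_entry_15 style document ids are all assumptions made for illustration (real CouchDB revisions are strings). The only behaviour it tries to imitate is that a write carrying a stale _rev is rejected with a 409.

```javascript
// Toy in-memory stand-in for CouchDB's revision check (NOT the real API).
// A write succeeds only if the caller's _rev matches the stored revision.
function createStore() {
  const docs = new Map();
  return {
    get(id) {
      const doc = docs.get(id);
      return doc ? JSON.parse(JSON.stringify(doc)) : undefined; // hand out a copy
    },
    put(doc) {
      const stored = docs.get(doc._id);
      const storedRev = stored ? stored._rev : undefined;
      if (doc._rev !== storedRev) {
        return { ok: false, status: 409 }; // conflict: caller holds a stale revision
      }
      const rev = (storedRev || 0) + 1; // simplified integer revision (assumption)
      docs.set(doc._id, { ...JSON.parse(JSON.stringify(doc)), _rev: rev });
      return { ok: true, rev };
    },
  };
}

const store = createStore();
store.put({ _id: "topic_12", type: "LogTopic", matching_entries: [] });

// Design 1: two clients read the same topic revision, then both append to it.
const clientA = store.get("topic_12");
const clientB = store.get("topic_12");
clientA.matching_entries.push("entry_15");
clientB.matching_entries.push("entry_16");
const resultA = store.put(clientA); // accepted, the topic's revision moves on
const resultB = store.put(clientB); // rejected, clientB must re-read and retry

// Design 2: one mapping document per (topic, entry) pair. The two writes
// touch different documents, so neither can invalidate the other.
const mapA = store.put({
  _id: "map_topic_12_entry_15", // hypothetical id scheme
  type: "LogTopicToLogEntryMap",
  topic_id: "topic_12",
  entry_id: "entry_15",
});
const mapB = store.put({
  _id: "map_topic_12_entry_16", // hypothetical id scheme
  type: "LogTopicToLogEntryMap",
  topic_id: "topic_12",
  entry_id: "entry_16",
});

console.log(resultA, resultB, mapA, mapB);
```

In the first design the losing client has to fetch the topic again and retry; in the second, every match is an independent insert.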

This works, and gets past the concurrent update issues, but I have two reservations:

1. I worry that I'm just using this approach because it's what I'd do in a relational DB. I wonder if there's a more couchDB-like (relaxful?) solution.
2. My views can no longer retrieve all the entries for a specific topic in one call. My previous solution allowed that (if I used the include_docs parameter).

Anyone have a better solution for me? Would it help if I also posted the views I'm using?

Solution

Your approach is fine. Using CouchDB doesn't mean you have to abandon relational modeling. You will need to run two queries, but that's because this is a "join". SQL queries with joins are also slow; the SQL syntax just lets you express the query in one statement.
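
As a rough sketch of the view over the mapping documents, the map function below keys each row by topic_id and emits a value of the form { _id: entry_id }. CouchDB's view documentation describes this shape as a "linked document": when the view is queried with include_docs=true, the document named in the value's _id is returned instead of the emitting document. Whether that saves the second query in your setup depends on your CouchDB version, so treat it as something to verify rather than a given. The emit function is normally supplied by CouchDB's view server; here it is replaced by a local stand-in, and mapEntriesByTopic and the sample documents are made up for illustration.

```javascript
// Local stand-in for the emit() that CouchDB's view server provides.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function for the mapping documents: key by topic, point at the entry.
function mapEntriesByTopic(doc) {
  if (doc.type === "LogTopicToLogEntryMap") {
    emit(doc.topic_id, { _id: doc.entry_id });
  }
}

// Hypothetical sample data; only the mapping documents should produce rows.
const sampleDocs = [
  { _id: "m1", type: "LogTopicToLogEntryMap", topic_id: "topic_12", entry_id: "entry_15" },
  { _id: "m2", type: "LogTopicToLogEntryMap", topic_id: "topic_12", entry_id: "entry_34" },
  { _id: "m3", type: "LogTopicToLogEntryMap", topic_id: "topic_7", entry_id: "entry_15" },
  { _id: "entry_15", type: "LogEntry", severity: "DEBUG" },
];
sampleDocs.forEach(mapEntriesByTopic);

// Roughly what a ?key="topic_12" query would select: the linked entry ids.
const entryIdsForTopic12 = rows
  .filter((row) => row.key === "topic_12")
  .map((row) => row.value._id);

console.log(entryIdsForTopic12);
```

Without linked documents, the same rows still give you the entry ids to fetch in a second request.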

In my few months of experience with CouchDB this is what I've discovered:

No schema, so designing the application models is fast and flexible
CRUD is there, so developing your application is fast and flexible
Goodbye SQL injection
What would be a SQL join takes a little bit more work in CouchDB

Depending on your needs I've found that couchdb-lucene is also useful for building more complex queries.

OTHER TIPS

I cross-posted this question to the CouchDB users mailing list, and Nathan Stott pointed me to a very helpful blog post by Christopher Lenz.

I'd try setting up the relation so that each LogEntry knows which LogTopics it belongs to. That way, inserting a LogEntry won't produce conflicts, as the LogTopics won't need to be changed.

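Then, a simple map function would emit the LogEntry once for each LogTopic it belongs to, essentially building up your TopicEntryMap on the fly:

"map": function (doc) {
    // Only LogEntry documents carry a topics array; skip everything else.
    if (doc.type === 'LogEntry' && Array.isArray(doc.topics)) {
        // Emit one row per topic, keyed by the topic.
        doc.topics.forEach(function (topic) {
            emit(topic, doc);
        });
    }
}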

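This way, querying the view with a ?key=<topic> argument will give you all the entries that belong to a topic. Note that the key has to be JSON-encoded, so a string topic id is passed with its quotes, e.g. ?key="topic_12".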
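
A small local simulation of this view may help. It assumes each LogEntry stores its topic ids in a topics array, and it uses a stand-in emit function plus made-up names: the sample documents, the "logs" database, the "topics" design document, and the "by_topic" view are all hypothetical. The viewUrl helper only shows how the key ends up JSON-encoded in the query string.

```javascript
// Local stand-in for the emit() that CouchDB's view server provides.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Same idea as the map function above: one row per (topic, entry) pair.
function mapByTopic(doc) {
  if (doc.type === "LogEntry" && Array.isArray(doc.topics)) {
    doc.topics.forEach(function (topic) {
      emit(topic, doc);
    });
  }
}

// Hypothetical sample data.
const docs = [
  { _id: "entry_15", type: "LogEntry", severity: "DEBUG", topics: ["topic_12", "topic_7"] },
  { _id: "entry_16", type: "LogEntry", severity: "ERROR", topics: ["topic_12"] },
  { _id: "topic_12", type: "LogTopic" },
];
docs.forEach(mapByTopic);

// Roughly what querying the view with a single key selects.
function entryIdsFor(topic) {
  return rows.filter((row) => row.key === topic).map((row) => row.value._id);
}

// Build the request URL; JSON.stringify adds the quotes the key needs.
function viewUrl(baseUrl, topic) {
  const params = new URLSearchParams({ key: JSON.stringify(topic) });
  return `${baseUrl}/logs/_design/topics/_view/by_topic?${params}`;
}

console.log(entryIdsFor("topic_12"));
console.log(viewUrl("http://localhost:5984", "topic_12"));
```

Emitting the whole document as the value keeps the query to one call at the cost of a larger index; emitting null and querying with include_docs=true is the usual trade-off in the other direction.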

Licensed under: CC-BY-SA with attribution
