Question

What is the best way to propagate updates when you have a denormalized schema? Should it all be done in the same function?

I have a schema like so:

var Authors = new Schema({
    ...
    name: {type: String, required:true},
    period: {type: Schema.Types.ObjectId, ref:'Periods'},
    quotes: [{type: Schema.Types.ObjectId, ref: 'Quotes'}],
    active: Boolean,
    ...
})

Then:

var Periods = new Schema({
    ...
    name: {type: String, required:true},
    authors: [{type: Schema.Types.ObjectId, ref:'Authors'}],
    active: Boolean,
    ...
})

Now say I want to denormalize Authors, since the period field will always just use the name of the period (which is unique, there can't be two periods with the same name). Say then that I turn my schema into this:

var Authors = new Schema({
    ...
    name: {type: String, required:true},
    period: String, //no longer a ref
    active: Boolean,
    ...
})

Now Mongoose no longer knows that the period field is connected to the Period schema, so it's up to me to update the field whenever the name of a period changes. I created a service module that offers an interface like this:

exports.updatePeriod = function(id, changes) {...}

Within this function I go through the changes and update the period document accordingly. So here's my question: should I also update all authors within this method? If so, the method would have to know about the Author schema and any other schema that uses period, creating a lot of coupling between these entities. Is there a better way?
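To make the coupling concrete, here is a minimal sketch of the "update everything in one function" option in plain JavaScript. In-memory arrays stand in for the two collections, and the data is made up for illustration. The key detail is that, because `period` now stores the name, the cascade has to match authors on the *old* name before it is overwritten.

```javascript
// Hypothetical in-memory stand-ins for the Periods and Authors collections.
const periods = [{ _id: 1, name: 'Renaissance', active: true }];
const authors = [
  { _id: 10, name: 'Author A', period: 'Renaissance', active: true },
  { _id: 11, name: 'Author B', period: 'Enlightenment', active: true },
];

// Coupled version: the period service also rewrites every author whose
// denormalized `period` string still holds the old period name.
function updatePeriod(id, changes) {
  const period = periods.find((p) => p._id === id);
  if (!period) throw new Error('Period not found: ' + id);

  const oldName = period.name;
  Object.assign(period, changes);

  // This loop is the coupling: it needs to know the Author document shape.
  if (period.name !== oldName) {
    for (const author of authors) {
      if (author.period === oldName) author.period = period.name;
    }
  }
  return period;
}

updatePeriod(1, { name: 'Italian Renaissance' });
```

With Mongoose, the loop would become a single multi-document update on the Authors model (`updateMany`, or `update` with `{multi: true}` in older versions), but the dependency on the Author schema stays the same.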

Perhaps I could emit an event that a period has been updated, so that all the schemas with denormalized period references can observe it. Is that a better solution? I'm not quite sure how to approach this issue.


Solution

Ok, while I wait for a better answer than my own, I will try to post what I have been doing so far.

Pre/Post Middleware

The first thing I tried was to use pre/post middleware to synchronize documents that reference each other. For instance, if you have Author and Quote, and an Author has an array of the type quotes: [{type: Schema.Types.ObjectId, ref:'Quotes'}], then whenever a Quote is deleted you have to remove its _id from that array. Or, if the Author is removed, you may want all of his quotes removed as well.

This approach has an important advantage: if you define each Schema in its own file, you can define the middleware there and have it all neatly organized. Whenever you look at the schema, right below you can see what it does, how its changes affect other entities, etc:

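var Quote = new Schema({
    //fields in schema
})
// it's quite clear what happens when you remove an entity
Quote.pre('remove', function(next) {
    // pull this quote's _id out of every Author's quotes array
    // (Author is the model compiled from the Authors schema)
    Author.update(
        {quotes: this._id},
        {$pull: {quotes: this._id}},
        {multi: true},
        function(err) { next(err) } // continue the removal only once Authors are updated
    )
})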

The main disadvantage, however, is that these hooks are not executed when you call update or any of the Model's static update/remove functions. Instead, you need to retrieve the document and then call save() or remove() on it. The reason is that the static functions send atomic queries directly to the database. While this is nice, I hate the inconsistency between using save() and Model.update(...). Somebody else, or you in the future, may accidentally use the static update functions; your middleware isn't triggered, and you end up with bugs that are hard to track down.

Another, smaller disadvantage is that Quote now needs to be aware of everything that references it, so that it can update those documents whenever a Quote is updated or removed. So if a Period has a list of quotes and an Author has a list of quotes as well, Quote will need to know about both in order to update them.
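The pitfall of document-level hooks being skipped by static functions can be reproduced without a database. The sketch below is a toy model, not Mongoose: `ToyModel`, `create` and `removeWhere` are hypothetical names, and the hooks are synchronous for brevity. Removing through the document runs the `pre('remove')` hook; removing through the static-style method does not, which leaves a dangling reference in the author.

```javascript
// Toy model (NOT Mongoose) with document-level pre('remove') hooks.
class ToyModel {
  constructor() {
    this.docs = new Map();
    this.preRemoveHooks = [];
  }

  pre(event, fn) {
    if (event === 'remove') this.preRemoveHooks.push(fn);
  }

  // Stores a plain object and gives it a document-level remove().
  create(doc) {
    const model = this;
    doc.remove = function () {
      // Document path: run every pre('remove') hook, then delete.
      for (const hook of model.preRemoveHooks) hook.call(this);
      model.docs.delete(this._id);
    };
    this.docs.set(doc._id, doc);
    return doc;
  }

  // Static-style removal by condition: goes straight to the store, no hooks.
  removeWhere(predicate) {
    for (const [id, doc] of this.docs) {
      if (predicate(doc)) this.docs.delete(id);
    }
  }
}

const author = { _id: 'a1', quotes: ['q1', 'q2'] };
const Quote = new ToyModel();

// Keep author.quotes in sync when a quote is removed.
Quote.pre('remove', function () {
  author.quotes = author.quotes.filter((id) => id !== this._id);
});

const q1 = Quote.create({ _id: 'q1' });
Quote.create({ _id: 'q2' });

q1.remove();                              // hook runs: 'q1' is pulled from author.quotes
Quote.removeWhere((q) => q._id === 'q2'); // no hook: 'q2' is left dangling
```

After both calls the quote store is empty, but `author.quotes` still contains `'q2'`.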

NodeJS Event Mechanisms

What I am currently doing is not really optimal, but it offers me enough benefits to outweigh the cons (or so I believe; if anyone cares to give me some feedback, that'd be great). I created a service that wraps around a model, say AuthorService, which extends events.EventEmitter and is a constructor function that looks roughly like this:

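var util = require('util')
var events = require('events')

function AuthorService() {
    var self = this
    events.EventEmitter.call(this) // set up the emitter's internal state

    this.create = function() {...}
    this.update = function() {
        ...
        // tell listeners what the author looked like before and after the change
        self.emit('AuthorUpdated', before, after)
        ...
    }
}

util.inherits(AuthorService, events.EventEmitter)
module.exports = new AuthorService()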

The advantages:

  • Any interested function can subscribe to the service's events and be notified. That way, for instance, when a Quote is updated, the AuthorService can listen for it and update the Authors accordingly. (Note 1)
  • Quote doesn't need to be aware of all the documents that reference it, the Service simply triggers the QuoteUpdated event and all the documents that need to perform operations when this happens will do so.

Note 1: This only works as long as everyone goes through this service whenever they need to interact with Mongoose.
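Applied to the period rename from the question, the event approach could look like the sketch below. It uses Node's built-in `EventEmitter` with ES2015 classes instead of `util.inherits`; `PeriodService`, `AuthorService`, the `PeriodUpdated` event and the in-memory data are all hypothetical. The period service only emits `before`/`after` snapshots, and only the author service knows about the denormalized `period` field.

```javascript
const { EventEmitter } = require('events');

// Hypothetical in-memory stand-ins for the two collections.
const periods = new Map([[1, { _id: 1, name: 'Renaissance' }]]);
const authors = [
  { _id: 10, name: 'Author A', period: 'Renaissance' },
  { _id: 11, name: 'Author B', period: 'Baroque' },
];

// Knows only about periods; announces every change as (before, after).
class PeriodService extends EventEmitter {
  updatePeriod(id, changes) {
    const period = periods.get(id);
    if (!period) throw new Error('Period not found: ' + id);
    const before = { ...period };
    Object.assign(period, changes);
    this.emit('PeriodUpdated', before, { ...period });
    return period;
  }
}

// Owns the denormalized `period` field and reacts to period renames.
class AuthorService {
  constructor(periodService) {
    periodService.on('PeriodUpdated', (before, after) => {
      if (before.name === after.name) return; // nothing to propagate
      for (const author of authors) {
        if (author.period === before.name) author.period = after.name;
      }
    });
  }
}

const periodService = new PeriodService();
const authorService = new AuthorService(periodService); // registers the listener
periodService.updatePeriod(1, { name: 'High Renaissance' });
```

One design consequence worth noting: `emit()` calls listeners synchronously, but it does not wait for any asynchronous work they start. A listener that runs a Mongoose update would finish after `updatePeriod` returns, and its errors would not reach the caller, so they need to be handled inside the listener.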

The disadvantages:

  • Added boilerplate code, using a service instead of mongoose directly.
  • Now it isn't exactly obvious what functions get called when you trigger the event.
  • You decouple producer and consumer at the cost of legibility (since you just emit('EventName', args), it's not immediately obvious which Services are listening to this event)

Another disadvantage is that someone can retrieve a model from the service and call save() on it, in which case the events won't be triggered, though I'm sure this could be addressed with some kind of hybrid between these two solutions.
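One possible shape for such a hybrid, sketched here as an assumption rather than something spelled out above, is to emit from the document's own save path and have the service reuse that path, so both routes announce the change on one shared emitter. `AuthorDoc`, `bus` and `authorService` are hypothetical names; with Mongoose, the emit would live in a document hook such as `AuthorSchema.post('save', function (doc) { ... })`.

```javascript
const { EventEmitter } = require('events');

// One shared emitter that every code path reports to (hypothetical).
const bus = new EventEmitter();

// Toy document whose save() mimics a post('save') hook by emitting on
// the shared bus, so even direct saves are announced.
class AuthorDoc {
  constructor(fields) {
    Object.assign(this, fields);
  }
  save() {
    // ...persisting to the database would happen here...
    bus.emit('AuthorUpdated', this);
    return this;
  }
}

// The service path reuses the document's save(), so it emits too.
const authorService = {
  update(doc, changes) {
    Object.assign(doc, changes);
    return doc.save();
  },
};

const seen = [];
bus.on('AuthorUpdated', (doc) => seen.push(doc.name));

const doc = new AuthorDoc({ _id: 10, name: 'Author A' });
authorService.update(doc, { name: 'Author A (revised)' }); // via the service
doc.name = 'Author A (direct)';
doc.save(); // bypasses the service, but still emits
```

This only closes the `save()` gap: as noted for middleware, static calls like `Model.update(...)` still skip document hooks, so they would still bypass the events.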

I am very open to suggestions in this field (which is why I posted this question in the first place).

OTHER TIPS

I'm gonna speak more from an architectural point of view than a coding point of view, since when it comes right down to it, you can pretty much achieve anything with enough lines of code.

As far as I've been able to understand, your main concern has been keeping consistency across your database, mainly removing documents when their references are removed and vice-versa.

So in this case, rather than wrapping the whole functionality in extra code, I'd suggest going for atomic Actions, where an Action is a method you define yourself that performs a complete removal of an entity from the DB (both document and reference).

So, for example, when you wanna remove an author's quote, you remove the Quote document from the DB and then remove the reference from the Author document.

This sort of architecture ensures that each of these Actions performs a single task and performs it well, without having to tap into events (emitting, consuming) or any other stuff. It's a self-contained method for performing its own unique task.
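A minimal sketch of such an Action for the quote example, assuming in-memory `Map`s in place of the Quote and Author collections (the `db` object and the `removeAuthorQuote` name are hypothetical). It does both steps in one place, in the order described: delete the quote document, then remove the reference from the author.

```javascript
// Hypothetical in-memory "collections"; with Mongoose each step below
// would be a query on the corresponding model.
const db = {
  quotes: new Map([
    ['q1', { _id: 'q1', text: 'First quote' }],
    ['q2', { _id: 'q2', text: 'Second quote' }],
  ]),
  authors: new Map([
    ['a1', { _id: 'a1', name: 'Author A', quotes: ['q1', 'q2'] }],
  ]),
};

// Action: completely remove one of an author's quotes
// (the quote document AND the author's reference to it).
function removeAuthorQuote(authorId, quoteId) {
  const author = db.authors.get(authorId);
  if (!author) throw new Error('Author not found: ' + authorId);

  // Step 1: remove the Quote document itself.
  if (!db.quotes.delete(quoteId)) throw new Error('Quote not found: ' + quoteId);

  // Step 2: remove the reference from the Author document.
  author.quotes = author.quotes.filter((id) => id !== quoteId);
}

removeAuthorQuote('a1', 'q1');
```

Here "atomic" means self-contained rather than transactional: with two separate database queries, nothing rolls back the first step if the second one fails. If that matters, MongoDB 4.0 and later support multi-document transactions on replica sets.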

Licensed under: CC-BY-SA with attribution