Question

I have an Angular 8 application, with a Python + MongoDB API on the backend.

At present, I have 4 collections, namely: Users, Tasks, Companies and Groups.

All of these resource types are retrievable via the API on their own routes.

In all collections except Groups, the objects reference at least one object in another collection. For example:

Users

{
    'email': 'joe.bloggs@example.com',
    'firstname': 'Joe',
    'lastname': 'Bloggs',
    'groups': ['Administrators', 'Finance']
}

Groups

{
    'name': 'Administrators'
}

Tasks

{
    'creator': 'joe.bloggs@example.com',
    'title': 'example title',
    'description': 'Lorem Ipsum',
    'company': 'ABC Trading Ltd',
    'comments': [
        {
            'creator': 'joe.bloggs@example.com',
            'message': 'lorem ipsum',
            'created': '2018-09-12 08:14'
        }
    ]
}

Companies

{
    'name': 'ABC Trading Ltd',
    'address': '123 Example Street',
    'creator': 'joe.bloggs@example.com',
    'comments': [
        {
            'creator': 'joe.bloggs@example.com',
            'message': 'lorem ipsum',
            'created': '2018-09-12 08:14'
        }
    ],
    'events': [
        {
            'user': 'joe.bloggs@example.com',
            'event': 'created',
            'created': '2019-09-12 05:14'
        }
    ]
}

Now, let's take the "Task Detail" page as an example, which lists the information about a task. It calls the task endpoint and retrieves a given task.

All of the referenced objects need to be resolved, as they're used in the front end. For example, the creator is displayed as "Creator: Joe Bloggs", and hovering over the user's name (in the format firstname lastname) needs to show a little profile card with their email address, groups, etc.

The company is also resolved, so that hovering over its name shows the address of the company. Comments are also resolved, so that the user object is included in each comment object.

Now this is becoming quite complicated, with massive mongo aggregate queries using $lookup. For example, this is the current query to retrieve a company from the database:

company = list(self.company_coll.aggregate([
            {'$match': {'_id': ObjectId(id)}},
            {'$unwind': '$comments'},
            {"$lookup": {
                "from": "users",
                "localField": "comments.user",
                "foreignField": "email",
                "as": "user"}
            },
            {'$addFields': {'comments.user': {"$arrayElemAt": ["$user", 0]}}},
            {'$project': {'user': 0}},
            {'$group': {'_id': '$_id', 'comments': {'$push': '$comments'}, 'data': {'$first': '$$ROOT'}}},
            {'$addFields': {'data.comments': '$comments'}},
            {'$replaceRoot': {'newRoot': '$data'}},
            # Resolve users on each event
            {'$unwind': '$events'},
            {"$lookup": {
                "from": "users",
                "localField": "events.user",
                "foreignField": "email",
                "as": "user"}
            },
            {'$addFields': {'events.user': {"$arrayElemAt": ["$user", 0]}}},
            {'$project': {'user': 0}},
            {'$group': {'_id': '$_id', 'events': {'$push': '$events'}, 'data': {'$first': '$$ROOT'}}},
            {'$addFields': {'data.events': '$events'}},
            {'$replaceRoot': {'newRoot': '$data'}}]))[0]

And there's still more data to resolve, such as the creator of the company, and as the application grows, these queries are going to become ridiculous. Another problem I'm facing is that, say for example we retrieve a company, and resolve the 'creator' attribute to populate the user object. This doesn't resolve the 'groups' attribute within the user object, so that would involve even more lookups and a bigger query.

The alternative option I can think of is to simply retrieve the company using find_one({}) and then, in Python, pick out the fields I need to resolve, such as creator, look each one up with a separate call to the database, and inject the result into the document after it's been retrieved. For example:

def get_company(self, name):
    company = self.db.companies.find_one({'name': name})
    self.__inject_users(company)
    return company        

def __inject_users(self, company):
    comments = company['comments']
    for comment in comments:
        user = comment['user']
        user_obj = self.user_service.get(email=user)
        comment['user'] = user_obj

    creator = company['creator']
    creator_obj = self.user_service.get(email=creator)
    company['creator'] = creator_obj

With the user service looking like this:

class UserService():
    def get(self, email):
        # find_one returns a single document (or None); find would return a cursor
        user = self.db.users.find_one({'email': email})
        self._inject_groups(user)  # Same thing here, lookup groups in DB and populate.
        return user

I feel like this is the better option long term, as it'll avoid massive, complex queries for basic information retrieval, but I'm not sure whether it will cause performance issues because of the number of individual queries needed simply to load one page in the system. Retrieving a company with its resolved objects could end up requiring the following database calls:

  1. The company itself
  2. The user who created the company in the system
  3. Each user who's commented
  4. Each user who's triggered an event
  5. Each group of each user for all of the above

This could be hundreds of database calls to load one page. However, I'm not sure which option is better: complex, hard-to-read code, or inefficient code that calls the database many times.

Is there a better way to go about this altogether?


Solution

Well, making inner joins across multiple objects is not what mongo is for. What you are trying to do is force a SQL object model into a NoSQL database.

You are left with the following options:

  • Heavily denormalizing your model, e.g. embedding the necessary details of a user inside each comment and the group details inside every user. This has drawbacks: when you rename a user, for example, you have to run a migration over every comment. But overall this would be using mongo the way it's meant to be used: no joins.
  • Keeping the complex lookup queries and letting them be. They will sometimes be necessary for the parts you can't denormalize without making migrations too heavy.
  • Caching objects and joining inside a Python data manager layer. This is bad, this is ugly, but it complements options 1 and 2 when there are frequently accessed objects that you need to render fast.
  • Migrating to a SQL database. Judging by your requirements, it's possible mongo is the wrong technology. Migrating would probably speed up query development and reduce overall database complexity.
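
As a rough illustration of the third option, here is a minimal sketch of joining in the Python layer. Instead of one database call per referenced user, it collects every email a company document mentions, fetches the missing users in a single batched lookup, and keeps them in a plain dict cache. The helper names (collect_emails, resolve_company, fake_find_users) and the second sample user are hypothetical, and the field names follow the sample documents in the question (comments reference users via 'creator', events via 'user'). The users collection is replaced by an in-memory list so the sketch runs without MongoDB; with pymongo the lookup would be roughly db.users.find({'email': {'$in': emails}}).

```python
from typing import Callable, Dict, Iterable, List, Optional

User = Dict[str, object]
# Hypothetical signature: takes a list of emails, returns the matching user documents.
FindUsers = Callable[[List[str]], Iterable[User]]


def collect_emails(company: dict) -> List[str]:
    """Gather every distinct user email referenced by a company document."""
    emails = {company["creator"]}
    emails.update(c["creator"] for c in company.get("comments", []))
    emails.update(e["user"] for e in company.get("events", []))
    return sorted(emails)


def resolve_company(company: dict, find_users: FindUsers,
                    cache: Optional[Dict[str, User]] = None) -> dict:
    """Return a copy of the company with email references replaced by user documents."""
    cache = {} if cache is None else cache
    missing = [e for e in collect_emails(company) if e not in cache]
    if missing:
        # One batched lookup for all uncached users, instead of one call per user.
        for user in find_users(missing):
            cache[user["email"]] = user

    resolved = dict(company)  # shallow copy, the input document is left untouched
    resolved["creator"] = cache.get(company["creator"])
    resolved["comments"] = [
        {**c, "creator": cache.get(c["creator"])} for c in company.get("comments", [])
    ]
    resolved["events"] = [
        {**e, "user": cache.get(e["user"])} for e in company.get("events", [])
    ]
    return resolved


# In-memory stand-in for the users collection (sample data, partly invented).
_USERS = [
    {"email": "joe.bloggs@example.com", "firstname": "Joe", "lastname": "Bloggs",
     "groups": ["Administrators", "Finance"]},
    {"email": "jane.doe@example.com", "firstname": "Jane", "lastname": "Doe",
     "groups": ["Finance"]},
]
calls: List[List[str]] = []  # records each simulated database call


def fake_find_users(emails: List[str]) -> List[User]:
    """Stand-in for db.users.find({'email': {'$in': emails}})."""
    calls.append(list(emails))
    return [u for u in _USERS if u["email"] in emails]


company = {
    "name": "ABC Trading Ltd",
    "creator": "joe.bloggs@example.com",
    "comments": [{"creator": "jane.doe@example.com", "message": "lorem ipsum"}],
    "events": [{"user": "joe.bloggs@example.com", "event": "created"}],
}
cache: Dict[str, User] = {}
resolved = resolve_company(company, fake_find_users, cache)
print(resolved["creator"]["firstname"], len(calls))
```

With this shape, loading one company costs one query for the company plus at most one for its users, however many comments and events it has. The trade-off is the one already noted above: the cache can serve stale users unless it is scoped to a single request or invalidated when a user changes.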
Licensed under: CC-BY-SA with attribution