Question

Using Mongoid, how would you query for objects that don't include specific attributes?

Specifically, I am looking for all Course objects where Course.prerequisites does not include any courses that are not in Student.courses_taken.

For example:

class Student
  include Mongoid::Document

  field :courses_taken,    type: Array # an array of course IDs
end

class Course
  include Mongoid::Document

  field :prerequisites,    type: Array # an array of course IDs
end

student_1.courses_taken = [a, b]

course_1.prerequisites = [a]

course_2.prerequisites = [a, b]

course_3.prerequisites = [a, c]

So that student_1 would be admitted to course_1 and course_2 but not course_3

The two objects are unrelated

Note that in this case, there could be hundreds of course.prerequisites and student.courses_taken, and that I intend to have this be just one of several chained methods on my query.

Is there an elegant (or at least relatively inexpensive) way to do this with a mongoid query?


Solution

I generally prefer the Moped query form, as it works at a slightly lower level and lets you take advantage of the full feature set of MongoDB query operators. It may not seem very "railsy" to some, but there are advantages, particularly when the solution involves the use of .aggregate().

To find the courses whose prerequisites are all covered by the courses a student has taken, you would build up a statement like this:

Course.collection.aggregate([
    # Filter the documents; not an exact match, but a start
    { "$match" => { 
        "prerequisites" => { "$in" => [ "a", "b" ] },
    }},

    # Unwind the array
    { "$unwind" => "$prerequisites" },

    # Tag only the matching entries
    { "$project" => {
        "prerequisites" => 1,
        "matching" => { "$or" => [
            { "$eq" => [ "$prerequisites", "a" ] },
            { "$eq" => [ "$prerequisites", "b" ] },
        ]}
    }},

    # Group back to the course _id
    { "$group" => {
        "_id" => "$_id",
        "prerequisites" => { "$push" => "$prerequisites" },
        "matching" => { "$min" => "$matching" }
    }},

    # Match only the true values (all prerequisites met)
    { "$match" => { "matching" => true } },

    # Project only the wanted fields
    { "$project" => { "prerequisites" => 1 } }
])

Each element of "courses_taken" is added to the $in operator, so only the courses whose prerequisites contain at least one of those values will match initially. This does not fully enforce the condition that the student must meet all of the prerequisite courses; the point here is to reduce the number of documents to the ones that are a possible match. Note that a course with an empty prerequisites array never passes this first $match, so such courses would have to be queried separately.

After the array is unwound, each value can be compared individually. This is what the $project does: it builds a statement from the student's course IDs to test whether the current prerequisite is among them. Under that $or condition, any prerequisite that is not matched yields false for the "matching" field.
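The tagging condition does not have to be typed out by hand. A minimal sketch, assuming the student's course IDs are available as a plain Ruby array; the helper name `matching_condition` is hypothetical and not part of Mongoid or Moped:

```ruby
# Hypothetical helper: builds the "$or" expression for the $project stage,
# with one "$eq" test per course the student has taken.
def matching_condition(courses_taken)
  { "$or" => courses_taken.map { |id| { "$eq" => [ "$prerequisites", id ] } } }
end

condition = matching_condition([ "a", "b" ])
```

The resulting hash can be dropped into the pipeline as the value of "matching".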

In the later $group stage, as the documents are put back into their original form, the $min value of that "matching" test is stored against the document. Because false sorts below true, if any element of the prerequisites array was a false match, the value for the whole document is false.

The next $match filters out any course that contains a prerequisite not among the courses taken by the student used as input. You are now left with only the courses that can be taken, and the final $project simply removes the "matching" field (by omission), so the documents are back in their original form.
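To see what the stages compute, the same logic can be traced in plain Ruby on the sample data from the question. This is only an in-memory illustration of the semantics, assuming string IDs; it is not a replacement for running the pipeline on the server:

```ruby
# Sample data from the question; IDs are plain strings for illustration.
courses_taken = [ "a", "b" ]
courses = {
  "course_1" => [ "a" ],
  "course_2" => [ "a", "b" ],
  "course_3" => [ "a", "c" ]
}

# Tag each prerequisite as true/false (the $project step), then keep a course
# only if no tag is false (the role "$min" plays in the $group step).
eligible = courses.select { |_name, prerequisites|
  prerequisites.map { |id| courses_taken.include?(id) }.all?
}.keys
```

Here `eligible` ends up holding course_1 and course_2, matching the outcome described in the question.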

If you actually have MongoDB version 2.6 (just released as of writing) or upwards, then there are new operators for aggregation that make the statement much simpler:

Course.collection.aggregate([
    { "$match" => { 
        "prerequisites" => { "$in" => [ "a", "b" ] }
    }},
    { "$project" => {
        "prerequisites" => 1,
        "diff" => { "$size" => {"$setDifference" => [ 
            "$prerequisites", 
            [ "a", "b" ] 
        ]}}
    }},
    { "$match" => { "diff" => 0  } },
    { "$project" => { "prerequisites" => 1 } }
])

This makes use of the new $setDifference operator, which can directly compare the arrays to find elements that are not in the set, along with $size, which returns the length of the resulting array. Since any course that contains prerequisite elements that are not in the student's courses-taken array will have those elements returned by $setDifference, any result with a "diff" size other than 0 can be excluded from the overall result.
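Ruby's own array difference gives a rough feel for what the "diff" field holds. A small sketch using the sample IDs from the question; note that `Array#-` is only an analogy here, since unlike $setDifference it does not remove duplicate entries from the result:

```ruby
courses_taken = [ "a", "b" ]

# Array#- keeps the elements of the left array that are absent from the
# right one, comparable to what "$setDifference" returns for each course.
unmet_for_course_2 = [ "a", "b" ] - courses_taken
unmet_for_course_3 = [ "a", "c" ] - courses_taken

# A size of 0 means every prerequisite is covered.
course_2_open = unmet_for_course_2.size == 0
course_3_open = unmet_for_course_3.size == 0
```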

Aside from being a lot simpler and having some speed advantages, you also avoid the complexity in generation by being able to pass the courses array from the student directly into construction of the pipeline query, and do not have to mess around with constructing the "equality" testing statement used in the first example.
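As a sketch of that direct construction, the pipeline can be wrapped in a small method that takes the student's array as its only input. The method name `eligible_courses_pipeline` is hypothetical, and the code only builds the pipeline; it does not touch the database:

```ruby
# Hypothetical helper: returns the MongoDB 2.6+ pipeline for a given list of
# course IDs. The same array is used in both "$in" and "$setDifference".
def eligible_courses_pipeline(courses_taken)
  [
    { "$match" => { "prerequisites" => { "$in" => courses_taken } } },
    { "$project" => {
        "prerequisites" => 1,
        "diff" => { "$size" => { "$setDifference" => [ "$prerequisites", courses_taken ] } }
    }},
    { "$match" => { "diff" => 0 } },
    { "$project" => { "prerequisites" => 1 } }
  ]
end

pipeline = eligible_courses_pipeline([ "a", "b" ])

# Assumed usage, with `student` being a loaded Student document:
#   Course.collection.aggregate(eligible_courses_pipeline(student.courses_taken))
```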

Either way, this gives you a fairly powerful way to do this sort of matching without resorting to looping over results in code. It also shows that the aggregation framework is not just for grouping results, but is a very powerful query tool.

Licensed under: CC-BY-SA with attribution