Pergunta

I'm trying to sort a collection of documents based on a field which is a field of a subdocument's subdocument.

Here's a very simplified version of my documents:

{
  "_id": ObjectId("536900cdb4f805efff8b075b"),
  "name": "el1",
  "versions": [{
    "releases": [{
      "rd": ISODate("2064-05-05T15:36:10.098Z")
    }, {
      "rd": ISODate("2014-05-01T16:00:00Z")
    }]
  }, {
    "releases": [{
      "rd": ISODate("2064-05-04T15:36:10.098Z")
    }, {
      "rd": ISODate("2014-05-01T14:00:00Z")
    }]
  }]
}, {
  "_id": ObjectId("536900f2b4f805efff8b075c"),
  "name": "el2",
  "versions": [{
    "releases": [{
      "rd": ISODate("2064-05-05T15:36:10.098Z")
    }, {
      "rd": ISODate("2014-05-01T17:00:00Z")
    }]
  }]
}

As you can see each document might have subdocuments named version and each version might have multiple subdocuments named release. I'd like to sort the main documents based on the rd field while excluding from the sort calculation all dates which are greater than a year from now. I don't care about sorting the subdocuments within the main document.

i.e. ISODate("2064-05-05T15:36:10.098Z") should be ignored because too far away while ISODate("2014-05-01T16:00:00Z") is good. By "ignore" I mean: don't use that value in the sort calculation, and not: remove that document from the result.

I have tried several ways including map-reduce and aggregation framework but failed miserably.

This should be the output of a successful sort:

{
  "_id": ObjectId("536900f2b4f805efff8b075c"),
  "name": "el2",
  "versions": [{
    "releases": [{
      "rd": ISODate("2064-05-05T15:36:10.098Z")
    }, {
      "rd": ISODate("2014-05-01T17:00:00Z")
    }]
  }]
}, {
  "_id": ObjectId("536900cdb4f805efff8b075b"),
  "name": "el1",
  "versions": [{
    "releases": [{
      "rd": ISODate("2064-05-05T15:36:10.098Z")
    }, {
      "rd": ISODate("2014-05-01T16:00:00Z")
    }]
  }, {
    "releases": [{
      "rd": ISODate("2064-05-04T15:36:10.098Z")
    }, {
      "rd": ISODate("2014-05-01T14:00:00Z")
    }]
  }]
}
Foi útil?

Solução

Please find two solutions to your question in the test case below. The first solution uses the MongoDB aggregation framework. For each document, sort keys are projected out of the rd values based on your time limit. The nested sort-key structure is reduced by unwinding twice followed by grouping for the max sort key. After sorting the docs, the last 'project' stage removes the sort keys. The second solution sorts on the client side. For efficiency, it processes each doc to determine the sort key and merges it in. After sorting the docs, it deletes the sort key out of each doc. You can eliminate the removal of the sort keys if their presence is tolerable.

A major strength of MongoDB is that documents map well to programming language data structures. So I would recommend using Ruby for a first attempt at a solution before looking for a DB solution. Note that in the Ruby solution, the rd_sort_key method while straightforward is non-trivial, suggesting that what you attempting with a condition and nested arrays is fairly complex, even without trying to do it in MongoDB's aggregation framework.

The client-side solution is OK if you are fetching the whole result set without a limit. If you use a limit, the server-side solution may save you some transfer time. But as always, you should measure and compare.

I hope that this helps, and that it is interesting and perhaps illuminating.

test.rb

require 'mongo'
require 'date'
require 'test/unit'

def iso_date_to_time(s)
  DateTime.parse(s).to_time
end

class MyTest < Test::Unit::TestCase
  def setup
    @pipeline = [
        {'$project' => {
            'name' => '$name',
            'versions' => '$versions',
            'rd_sort_key' => {
                '$map' => {
                    'input' => '$versions', 'as' => 'version', 'in' => {
                        '$map' => {
                            'input' => '$$version.releases', 'as' => 'release', 'in' => {
                                '$cond' => [
                                    {'$lt' => ['$$release.rd', @year_from_now]},
                                    '$$release.rd',
                                    nil
                                ]}}}}}}},
        {'$unwind' => '$rd_sort_key'},
        {'$unwind' => '$rd_sort_key'},
        {'$group' => {
            '_id' => '$_id',
            'name' => {'$first' => '$name'},
            'versions' => {'$first' => '$versions'},
            'rd_sort_key' => {'$max' => '$rd_sort_key'}}},
        {'$sort' => {'rd_sort_key' => -1}},
        {'$project' => {
            '_id' => '$_id',
            'name' => '$name',
            'versions' => '$versions'}}
    ]
    @coll = Mongo::MongoClient.new['test']['events_h']
    @docs = [
        {"_id" => BSON::ObjectId("536900cdb4f805efff8b075b"),
         "name" => "el1",
         "versions" => [{"releases" => [{"rd" => iso_date_to_time("2064-05-05T15:36:10.098Z")},
                                        {"rd" => iso_date_to_time("2014-05-01T16:00:00Z")}]},
                        {"releases" => [{"rd" => iso_date_to_time("2064-05-04T15:36:10.098Z")},
                                        {"rd" => iso_date_to_time("2014-05-01T14:00:00Z")}]}]
        },
        {"_id" => BSON::ObjectId("536900f2b4f805efff8b075c"),
         "name" => "el2",
         "versions" => [{"releases" => [{"rd" => iso_date_to_time("2064-05-05T15:36:10.098Z")},
                                        {"rd" => iso_date_to_time("2014-05-01T17:00:00Z")}]}]
        }]
    @expected_names = [@docs.last['name'], @docs.first['name']]
    @coll.remove
    @coll.insert(@docs)
    @year_from_now = Time.now + 60*60*24*365
  end

  test "aggregation sort with map and conditional" do
    result = @coll.aggregate(@pipeline)
    assert_equal(@expected_names, result.collect{|doc| doc['name']})
  end

  def rd_sort_key(doc, future_time_limit)
    sort_key = nil
    doc['versions'].each do |version|
      version['releases'].each do |release|
        rd = release['rd']
        sort_key = sort_key ? [sort_key, rd].max : rd if rd < future_time_limit
      end
    end
    sort_key
  end

  test "client sort with conditional" do
    result = @coll.find.to_a
    result.each{|doc| doc['rd_sort_key'] = rd_sort_key(doc, @year_from_now)}
    result = result.sort{|a, b| b['rd_sort_key'] ? b['rd_sort_key'] <=> a['rd_sort_key'] : -1}
    result.each{|doc| doc.delete('rd_sort_key')}
    assert_equal(@expected_names, result.collect{|doc| doc['name']})
  end
end

$ ruby test.rb

Loaded suite test
Started
..

Finished in 0.008794 seconds.

2 tests, 2 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed

227.43 tests/s, 227.43 assertions/s
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top