Question

I noticed that if I store the value 'seasons' in a string field of a collection that has full-text search enabled, MongoDB finds it when I search for 'season'. But if I store something more complex, e.g. 'mice' or 'criteria', MongoDB does not find these values when I search for 'mouse' or 'criterion' respectively. Is that normal, and are there any clear rules for what MongoDB can and cannot stem?

[test] 2014-03-30 18:25:09.551 >>> db.TestFullText7.find();
{
        "_id" : ObjectId("53389720063ab25d2d55c94c"),
        "dt" : ISODate("2014-03-30T22:13:52.717Z"),
        "title" : "mice",
        "txt" : "mice"
}
{
        "_id" : ObjectId("5338994c063ab25d2d55c94d"),
        "dt" : ISODate("2014-03-30T22:23:08.259Z"),
        "title" : "criteria",
        "txt" : "criteria"
}
{
        "_id" : ObjectId("533899c5063ab25d2d55c94e"),
        "dt" : ISODate("2014-03-30T22:25:09.551Z"),
        "title" : "seasons",
        "txt" : "seasons"
}
[test] 2014-03-30 18:25:13.295 >>> db.runCommand({"text" : "TestFullText7", "search" : "season"});
{
        "queryDebugString" : "season||||||",
        "language" : "english",
        "results" : [
                {
                        "score" : 2,
                        "obj" : {
                                "_id" : ObjectId("533899c5063ab25d2d55c94e"),
                                "dt" : ISODate("2014-03-30T22:25:09.551Z"),
                                "title" : "seasons",
                                "txt" : "seasons"
                        }
                }
        ],
        "stats" : {
                "nscanned" : 1,
                "nscannedObjects" : 0,
                "n" : 1,
                "nfound" : 1,
                "timeMicros" : 148
        },
        "ok" : 1
}
[test] 2014-03-30 18:25:22.406 >>> db.runCommand({"text" : "TestFullText7", "search" : "mouse"});
{
        "queryDebugString" : "mous||||||",
        "language" : "english",
        "results" : [ ],
        "stats" : {
                "nscanned" : 0,
                "nscannedObjects" : 0,
                "n" : 0,
                "nfound" : 0,
                "timeMicros" : 110
        },
        "ok" : 1
}
[test] 2014-03-30 18:25:30.986 >>> db.TestFullText7.getIndexes();
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "test.TestFullText7",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                },
                "ns" : "test.TestFullText7",
                "name" : "$**_text",
                "weights" : {
                        "$**" : 1
                },
                "default_language" : "english",
                "language_override" : "language",
                "textIndexVersion" : 1
        }
]
[test] 2014-03-30 18:25:45.228 >>>

Answer

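MongoDB uses the Snowball stemming library. Unfortunately, this looks to be one of that library's limitations: Snowball stems by stripping suffixes according to rules, so it does not connect an irregular plural to its singular form.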

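The Snowball project publishes pages for its English stemmer. On the page that lists the sample vocabulary next to its stemmed equivalents, you can see "mouse" becoming "mous" while "mice" remains "mice". This matches the "queryDebugString" of "mous||||||" in your output: the query term is reduced to "mous", the stored value stays "mice", and the two never match.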
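To make the principle concrete, here is a deliberately simplified sketch. It is not the Snowball algorithm: the hypothetical `toyStem` function only strips a plural "s", whereas the real English stemmer applies many more rules (for instance, it also turns "mouse" into "mous"). It only illustrates that two words match when their stems are identical, and that suffix stripping alone cannot bring "mice" and "mouse" together.

```javascript
// Toy suffix stripper (NOT the Snowball English algorithm): it only
// removes a plural "s", which is enough to show the principle.
function toyStem(word) {
  const w = word.toLowerCase();
  // Strip a trailing "s" unless the word is very short or ends in "ss".
  if (w.length > 3 && w.endsWith("s") && !w.endsWith("ss")) {
    return w.slice(0, -1);
  }
  return w;
}

// Two words can match in a stemmed index only if their stems are identical.
function sameStem(a, b) {
  return toyStem(a) === toyStem(b);
}

console.log(sameStem("seasons", "season")); // regular plural
console.log(sameStem("mice", "mouse"));     // irregular plural
```

The first comparison succeeds because both words reduce to the same stem. The second fails because no suffix rule relates the two spellings.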

You can see MongoDB's use of Snowball in its codebase.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow