I noticed that if I enter the value 'seasons' in a full text search enabled string field of some collection, then MongoDB finds this value when I query for 'season'. But if I enter something more complex like e.g. 'mice' or 'criteria', it does not find these values when I query for 'mouse' or 'criterion' respectively. Is that normal and are there any clear rules what MongoDB is able to stem and what not?

[test] 2014-03-30 18:25:09.551 >>> db.TestFullText7.find();
{
        "_id" : ObjectId("53389720063ab25d2d55c94c"),
        "dt" : ISODate("2014-03-30T22:13:52.717Z"),
        "title" : "mice",
        "txt" : "mice"
}
{
        "_id" : ObjectId("5338994c063ab25d2d55c94d"),
        "dt" : ISODate("2014-03-30T22:23:08.259Z"),
        "title" : "criteria",
        "txt" : "criteria"
}
{
        "_id" : ObjectId("533899c5063ab25d2d55c94e"),
        "dt" : ISODate("2014-03-30T22:25:09.551Z"),
        "title" : "seasons",
        "txt" : "seasons"
}
[test] 2014-03-30 18:25:13.295 >>> db.runCommand({"text" : "TestFullText7", "search" : "season"});
{
        "queryDebugString" : "season||||||",
        "language" : "english",
        "results" : [
                {
                        "score" : 2,
                        "obj" : {
                                "_id" : ObjectId("533899c5063ab25d2d55c94e"),
                                "dt" : ISODate("2014-03-30T22:25:09.551Z"),
                                "title" : "seasons",
                                "txt" : "seasons"
                        }
                }
        ],
        "stats" : {
                "nscanned" : 1,
                "nscannedObjects" : 0,
                "n" : 1,
                "nfound" : 1,
                "timeMicros" : 148
        },
        "ok" : 1
}
[test] 2014-03-30 18:25:22.406 >>> db.runCommand({"text" : "TestFullText7", "search" : "mouse"});
{
        "queryDebugString" : "mous||||||",
        "language" : "english",
        "results" : [ ],
        "stats" : {
                "nscanned" : 0,
                "nscannedObjects" : 0,
                "n" : 0,
                "nfound" : 0,
                "timeMicros" : 110
        },
        "ok" : 1
}
[test] 2014-03-30 18:25:30.986 >>> db.TestFullText7.getIndexes();
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "test.TestFullText7",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "_fts" : "text",
                        "_ftsx" : 1
                },
                "ns" : "test.TestFullText7",
                "name" : "$**_text",
                "weights" : {
                        "$**" : 1
                },
                "default_language" : "english",
                "language_override" : "language",
                "textIndexVersion" : 1
        }
]
[test] 2014-03-30 18:25:45.228 >>>
有帮助吗?

解决方案

MongoDB uses the Snowball stemming library. Unfortunately, this looks to be one of the limitations of this library.

You can see the pages for the english stemmer here. Compare the vocabulary + stemmed equivalent page and you can see "Mouse" becoming "Mous" and "Mice" still remaining "Mice".

You can see MongoDB's use of Snowball in their codebase here and here

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top