MongoDB query: how to check if a string stored in a document is contained in another string
25-10-2019
Question
I have a collection with 8k+ strings and I need to check if a particular string is contained in another string. For example:
StringInDb = "this is a string"
TheOtherString = "this is a long string that contains this is a string"
With LINQ I used something like:
from s in DB.Table
where TheOtherString.IndexOf(s.StringInDb) > -1
select s.StringInDb;
How can I do this (efficiently) in mongodb (even better using the c# .net driver)?
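For reference, the same substring test written in plain JavaScript (the language MongoDB uses for server-side code) is sketched below. It is a minimal in-memory sketch over hypothetical sample documents, not a MongoDB query; `containedIn` and `docs` are illustrative names. With only 8k strings, loading them and filtering on the client this way may be an option worth measuring.

```javascript
// Keeps every document whose StringInDb occurs somewhere in theOtherString,
// mirroring TheOtherString.IndexOf(s.StringInDb) > -1 from the LINQ query.
function containedIn(theOtherString, docs) {
  return docs
    .filter(d => theOtherString.indexOf(d.StringInDb) > -1)
    .map(d => d.StringInDb);
}

// Hypothetical sample documents standing in for the collection.
const docs = [
  { StringInDb: "this is a string" },
  { StringInDb: "not in there" },
  { StringInDb: "long string" }
];

console.log(containedIn("this is a long string that contains this is a string", docs));
// -> [ 'this is a string', 'long string' ]
```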
Solution
To me this sounds like you need to use map/reduce: map out all the strings from the DB and reduce them to the ones contained in your long string. I can't remember the C# off the top of my head, but I can find it later if you want.
Update: MongoDB's native language is JavaScript, and map/reduce runs "inside the MongoDB engine", so the map and reduce functions must be written in JavaScript, not C#. They can still be called from C#, as this example from the official MongoDB C# driver documentation shows (http://www.mongodb.org/display/DOCS/CSharp+Driver+Tutorial#CSharpDriverTutorial-MapReducemethod). The example counts how many times each key occurs in a collection:
var map =
"function() {" +
" for (var key in this) {" +
" emit(key, { count : 1 });" +
" }" +
"}";
var reduce =
"function(key, emits) {" +
" var total = 0;" + // 'var' keeps total local instead of creating a global
" for (var i in emits) {" +
" total += emits[i].count;" +
" }" +
" return { count : total };" +
"}";
var mr = collection.MapReduce(map, reduce);
foreach (var document in mr.GetResults()) {
Console.WriteLine(document.ToJson());
}
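Adapted to the question, the map function only emits strings that occur in the long string, and reduce has nothing to combine. The sketch below assumes the long string is passed to the server through the mapReduce `scope` option as a variable named `longString`, and that the field is called `StringInDb` as in the question. Because it cannot talk to a server, it includes a small hypothetical `simulateMapReduce` helper that provides `emit` and the scope variables in memory so the functions can be tried out; it is not part of any driver. Note that map-reduce is deprecated in recent MongoDB versions (5.0 and later).

```javascript
// Map: runs once per document with `this` bound to it. `emit` is supplied by
// MongoDB; `longString` is assumed to come from the mapReduce `scope` option.
function map() {
  if (longString.indexOf(this.StringInDb) > -1) {
    emit(this._id, this.StringInDb);
  }
}

// Reduce: each _id is emitted at most once, so just return the value.
function reduce(key, values) {
  return values[0];
}

// Hypothetical in-memory stand-in for the server, only for trying the functions.
function simulateMapReduce(docs, mapFn, reduceFn, scope) {
  const groups = new Map();
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  Object.assign(globalThis, scope); // expose scope variables as globals
  docs.forEach(doc => mapFn.call(doc));
  const results = [];
  for (const [key, values] of groups) {
    // Like MongoDB, only call reduce when a key has more than one value.
    results.push({ _id: key, value: values.length > 1 ? reduceFn(key, values) : values[0] });
  }
  return results;
}

const docs = [
  { _id: 1, StringInDb: "this is a string" },
  { _id: 2, StringInDb: "something else" }
];
console.log(simulateMapReduce(docs, map, reduce,
  { longString: "this is a long string that contains this is a string" }));
// -> [ { _id: 1, value: 'this is a string' } ]

// In the mongo shell the real call would look roughly like this
// ("strings" is a hypothetical collection name):
// db.strings.mapReduce(map, reduce, { out: { inline: 1 }, scope: { longString: "..." } })
```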
OTHER TIPS
In MongoDB a "contains" check needs a regular expression, so the C# query will be:
var query = Query.Matches("StringParamName",
BsonRegularExpression.Create(".*this is a string.*", "i"));
Once the query is built, pass it to the Collection.FindAs<DocType>(query) method.
The "i" option means ignore case.
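Regex queries in MongoDB can be slow: a case-insensitive regex, or one that is not anchored to the start of the string, cannot use an index efficiently, so MongoDB ends up scanning the whole index or collection. For a collection of 8k documents it should still be quick.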
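The sketch below builds the same kind of case-insensitive "contains" pattern as the Query.Matches example, in JavaScript, and escapes the search text so characters such as "." are matched literally. The surrounding `.*` is not needed, because an unanchored regex already matches anywhere in the string. Keep in mind that this finds documents whose field contains the search text, which is the reverse of the question (where the stored string must be contained in the other string). `escapeRegExp` and `containsPattern` are illustrative names; in the shell the equivalent filter would be something like `{ StringParamName: { $regex: escapeRegExp(text), $options: "i" } }`.

```javascript
// Escapes regex metacharacters so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive "contains" pattern; no leading/trailing .* required.
function containsPattern(text) {
  return new RegExp(escapeRegExp(text), "i");
}

const pattern = containsPattern("this is a string");
console.log(pattern.test("This Is A String, and more")); // true
console.log(pattern.test("this is a long string"));      // false
console.log(containsPattern("a.b").test("axb"));         // false: "." is literal
```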
This is a wrapper I use in my production system. GetBsonValue() does all the work: it returns the plain value when there is no wildcard, and a case-insensitive regular expression otherwise.
/// <summary>
/// Makes a Bson object for current value object
/// </summary>
/// <returns>The Bson object for current value object</returns>
private BsonValue GetBsonValue()
{
if (!_value.Contains(_wildCard))
return _value;
string pattern = ApplyWildCard();
return BsonRegularExpression.Create(new Regex(pattern, RegexOptions.IgnoreCase));
}
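/// <summary>
/// Escapes the value, then replaces each wildcard with ".*"
/// </summary>
/// <returns>An anchored regex pattern in which each wildcard matches any run of characters</returns>
private string ApplyWildCard()
{
// Regex.Escape keeps characters such as "." or "(" in the value literal;
// the escaped wildcard is then turned into ".*"
return string.Format("^{0}$", Regex.Escape(_value).Replace(Regex.Escape(_wildCard), ".*"));
}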
From the outside you call the following method, so there is no way to forget:
public QueryComplete BuildQuery()
{
return Query.EQ(_key, GetBsonValue());
}
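A JavaScript sketch of the same wildcard-to-regex idea follows, assuming "*" as the wildcard character. It escapes the rest of the value so characters like "." match literally, anchors the pattern with ^ and $, and falls back to the plain value when there is no wildcard. `toQueryValue` is a hypothetical name; with the Node.js driver the result could be used as a filter value, e.g. `{ [key]: toQueryValue(value) }`, since a RegExp value is treated as a regex match.

```javascript
// Escapes regex metacharacters so they are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Plain value for exact matches; otherwise an anchored, case-insensitive
// regex in which each wildcard matches any run of characters.
function toQueryValue(value, wildCard = "*") {
  if (!value.includes(wildCard)) return value;
  const pattern = value.split(wildCard).map(escapeRegExp).join(".*");
  return new RegExp("^" + pattern + "$", "i");
}

console.log(toQueryValue("abc"));                             // abc
console.log(toQueryValue("this*string").test("This is a string")); // true
console.log(toQueryValue("a.b*").test("axb c"));               // false
```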
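You can also run the check as JavaScript on the server with $where (indexOf compares plain text, whereas match would treat the stored value as a regular expression):
"$where" : "\"this is a long string that contains this is a string\".indexOf(this.YourFieldName) > -1"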
Is this what you want?
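If the long string comes from user input, building the $where expression by hand breaks as soon as it contains a quote. A minimal sketch, assuming the field name is a plain identifier: `JSON.stringify` turns the long string into a valid JavaScript string literal, and `indexOf` compares plain text. `whereContainedIn` is a hypothetical helper; the last lines evaluate the expression locally with `this` bound to a sample document, which is how $where sees each document, so the logic can be checked without a server.

```javascript
// Builds a $where filter that keeps documents whose field occurs in longString.
// JSON.stringify escapes quotes and backslashes in longString.
function whereContainedIn(longString, fieldName) {
  return {
    $where: JSON.stringify(longString) + ".indexOf(this." + fieldName + ") > -1"
  };
}

const filter = whereContainedIn('this is a long string that contains "this is a string"', "YourFieldName");
console.log(filter.$where);
// -> "this is a long string that contains \"this is a string\"".indexOf(this.YourFieldName) > -1

// Local check only: evaluate the expression with `this` bound to a document.
const evaluate = new Function("return " + filter.$where);
console.log(evaluate.call({ YourFieldName: "this is a string" })); // true
console.log(evaluate.call({ YourFieldName: "not there" }));        // false
```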