Question

I have a Solr MoreLikeThis query that is producing some decidedly unrelated results. When I look at the debug output for the query, I can see that it is matching on newline characters.

Here's the query:

mlt?q=is_lesson_id:49029&start=0&rows=3&fl=*,score&wt=json&fq={!tag=sites}sm_sitename:(FCM OR BCM OR CCM)&mlt.interestingTerms=details&mlt.match.include=false&mlt.match.offset=0&mlt.fl=title, body&mlt.mintf=2&mlt.mindf=1&mlt.minwl=4&mlt.boost=true&mlt.qf=title^1000 body&indent=on&debugQuery=on
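The query above is written the way you would type it into a browser. If you send it from code instead, the spaces, braces, "^" and parentheses in values like fq and mlt.qf must be URL-encoded. Here is a minimal Python sketch that uses only the standard library. It keeps the relative "mlt" path from the question; the Solr host and core are not given, so you would prepend your own base URL.

```python
from urllib.parse import urlencode

# Parameters copied verbatim from the MoreLikeThis query above.
params = [
    ("q", "is_lesson_id:49029"),
    ("start", "0"),
    ("rows", "3"),
    ("fl", "*,score"),
    ("wt", "json"),
    ("fq", "{!tag=sites}sm_sitename:(FCM OR BCM OR CCM)"),
    ("mlt.interestingTerms", "details"),
    ("mlt.match.include", "false"),
    ("mlt.match.offset", "0"),
    ("mlt.fl", "title, body"),
    ("mlt.mintf", "2"),
    ("mlt.mindf", "1"),
    ("mlt.minwl", "4"),
    ("mlt.boost", "true"),
    ("mlt.qf", "title^1000 body"),
    ("indent", "on"),
    ("debugQuery", "on"),
]

# urlencode percent-encodes braces, '^', ':' and parentheses, and turns
# spaces into '+', so every value reaches Solr unchanged.
query_string = urlencode(params)
url = "mlt?" + query_string
print(url)
```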

Here's the relevant part of the response (interesting terms and debug output):

 "interestingTerms":[
    "body:rabbit",1.0,
    "body:bunni",0.8582874,
    "body:easter",0.7999738,
    "body: ",0.5719101,
    "body:ampampnbsp",0.51804715,
    "body:nbsp",0.36014518],
 "debug":{
    "rawquerystring":"is_lesson_id:49029",
    "querystring":"is_lesson_id:49029",
    "parsedquery":"body:rabbit body:bunni^0.8582874 
                   body:easter^0.7999738              
                   body: ^0.5719101 
                   body:ampampnbsp^0.51804715 
                   body:nbsp^0.36014518",
    "parsedquery_toString":"body:rabbit 
                            body:bunni^0.8582874 
                            body:easter^0.7999738 
                            body: ^0.5719101 
                            body:ampampnbsp^0.51804715 
                            body:nbsp^0.36014518",
    "explain":{
"p5zqzz/node/681":"\n0.14956066 = (MATCH) product of:\n  0.44868195 = (MATCH) sum of:\n    0.20911716 = (MATCH) weight(body:bunni^0.8582874 in 327), product of:\n      0.5523649 = queryWeight(body:bunni^0.8582874), product of:\n        0.8582874 = boost\n        6.9227004 = idf(docFreq=116, maxDocs=43690)\n        0.09296464 = queryNorm\n      0.3785852 = (MATCH) fieldWeight(body:bunni in 327), product of:\n        1.0 = tf(termFreq(body:bunni)=1)\n        6.9227004 = idf(docFreq=116, maxDocs=43690)\n        0.0546875 = fieldNorm(field=body, doc=327)\n    0.2395648 = (MATCH) weight(body:easter^0.7999738 in 327), product of:\n      0.4799619 = queryWeight(body:easter^0.7999738), product of:\n        0.7999738 = boost\n        6.453766 = idf(docFreq=186, maxDocs=43690)\n        0.09296464 = queryNorm\n      0.49913296 = (MATCH) fieldWeight(body:easter in 327), product of:\n        1.4142135 = tf(termFreq(body:easter)=2)\n        6.453766 = idf(docFreq=186, maxDocs=43690)\n        0.0546875 = fieldNorm(field=body, doc=327)\n  0.33333334 = coord(2/6)\n",
"p5zqzz/node/621":"\n0.14027193 = (MATCH) product of:\n  0.42081577 = (MATCH) sum of:\n    0.21124022 = (MATCH) weight(body:bunni^0.8582874 in 328), product of:\n      0.5523649 = queryWeight(body:bunni^0.8582874), product of:\n        0.8582874 = boost\n        6.9227004 = idf(docFreq=116, maxDocs=43690)\n        0.09296464 = queryNorm\n      0.38242877 = (MATCH) fieldWeight(body:bunni in 328), product of:\n        1.4142135 = tf(termFreq(body:bunni)=2)\n        6.9227004 = idf(docFreq=116, maxDocs=43690)\n        0.0390625 = fieldNorm(field=body, doc=328)\n    0.20957555 = (MATCH) weight(body:easter^0.7999738 in 328), product of:\n      0.4799619 = queryWeight(body:easter^0.7999738), product of:\n        0.7999738 = boost\n        6.453766 = idf(docFreq=186, maxDocs=43690)\n        0.09296464 = queryNorm\n      0.4366504 = (MATCH) fieldWeight(body:easter in 328), product of:\n        1.7320508 = tf(termFreq(body:easter)=3)\n        6.453766 = idf(docFreq=186, maxDocs=43690)\n        0.0390625 = fieldNorm(field=body, doc=328)\n  0.33333334 = coord(2/6)\n",
"p5zqzz/node/1204":"\n0.10955032 = (MATCH) product of:\n  0.32865095 = (MATCH) sum of:\n    0.10455858 = (MATCH) weight(body:bunni^0.8582874 in 432), product of:\n      0.5523649 = queryWeight(body:bunni^0.8582874), product of:\n        0.8582874 = boost\n        6.9227004 = idf(docFreq=116, maxDocs=43690)\n        0.09296464 = queryNorm\n      0.1892926 = (MATCH) fieldWeight(body:bunni in 432), product of:\n        1.0 = tf(termFreq(body:bunni)=1)\n        6.9227004 = idf(docFreq=116, maxDocs=43690)\n        0.02734375 = fieldNorm(field=body, doc=432)\n    0.22409238 = (MATCH) weight(body:easter^0.7999738 in 432), product of:\n      0.4799619 = queryWeight(body:easter^0.7999738), product of:\n        0.7999738 = boost\n        6.453766 = idf(docFreq=186, maxDocs=43690)\n        0.09296464 = queryNorm\n      0.46689618 = (MATCH) fieldWeight(body:easter in 432), product of:\n        2.6457512 = tf(termFreq(body:easter)=7)\n        6.453766 = idf(docFreq=186, maxDocs=43690)\n        0.02734375 = fieldNorm(field=body, doc=432)\n  0.33333334 = coord(2/6)\n"},
    "filter_queries":["{!tag=sites}sm_sitename:(FCM OR BCM OR CCM)"],
    "parsed_filter_queries":["sm_sitename:FCM sm_sitename:BCM sm_sitename:CCM"]}}
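The explain output uses Lucene's classic TF-IDF scoring, where tf is the square root of the term frequency and idf = 1 + ln(maxDocs / (docFreq + 1)). Working through the first hit (p5zqzz/node/681) with the numbers shown above:

```latex
\begin{aligned}
\mathrm{idf}(\text{bunni}) &= 1 + \ln\!\left(\tfrac{43690}{116 + 1}\right) \approx 6.9227004 \\
\mathrm{queryWeight}(\text{bunni}) &= \mathrm{boost} \times \mathrm{idf} \times \mathrm{queryNorm}
  = 0.8582874 \times 6.9227004 \times 0.09296464 \approx 0.5523649 \\
\mathrm{fieldWeight}(\text{bunni}) &= \sqrt{1} \times \mathrm{idf} \times \mathrm{fieldNorm}
  = 1.0 \times 6.9227004 \times 0.0546875 \approx 0.3785852 \\
\mathrm{weight}(\text{bunni}) &= 0.5523649 \times 0.3785852 \approx 0.2091172 \\
\mathrm{score} &= (0.20911716 + 0.2395648) \times \mathrm{coord}(2/6)
  \approx 0.44868195 \times 0.33333334 \approx 0.14956066
\end{aligned}
```

coord(2/6) means that only 2 of the 6 query terms (bunni and easter) matched this document. The whitespace, ampampnbsp and nbsp terms still count toward the denominator, so they lower the score of documents that do not contain them.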

Is this indicative of a misconfiguration on the server, or is the content being indexed improperly, or does the query need to be changed?


Solution

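Are you indexing HTML? Terms such as "ampampnbsp" and "nbsp" suggest that HTML entities are making it into the index. You may want to strip the HTML markup out of the text at the start of your analysis chain, before tokenization, with a char filter. See HTMLStripCharFilter on this page for more info: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory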
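To show how entity residue turns into terms like "ampampnbsp", and why stripping markup before tokenization helps, here is a minimal Python sketch. It is not Solr code. crude_tokens is a hypothetical stand-in for an analyzer chain, strip_html only approximates what an HTML-stripping char filter does (decode entities, remove tags), and the sample body text is made up. The body in the question appears to be double-escaped ("&amp;amp;nbsp;"), so the sketch decodes repeatedly. A single decoding pass would leave "&amp;nbsp;" behind, so if the stored content really is double-escaped, it may also be worth cleaning it before it is sent to Solr.

```python
import html
import re


def crude_tokens(text):
    # Hypothetical stand-in for an analyzer chain: split on whitespace,
    # lowercase, and drop every character that is not a-z or 0-9.
    # This is how "&amp;amp;nbsp;" can collapse into the term "ampampnbsp".
    tokens = []
    for raw in text.split():
        tok = re.sub(r"[^0-9a-z]", "", raw.lower())
        if tok:
            tokens.append(tok)
    return tokens


def strip_html(text):
    # Decode entities until the text stops changing, so double-escaped input
    # is fully resolved; "&nbsp;" ends up as U+00A0, which str.split()
    # treats as whitespace.
    previous = None
    while previous != text:
        previous, text = text, html.unescape(text)
    # Replace tags with a space so neighbouring words are not glued together.
    return re.sub(r"<[^>]+>", " ", text)


raw_body = "<p>Easter &amp;amp;nbsp; bunny</p>"
print(crude_tokens(raw_body))              # ['peaster', 'ampampnbsp', 'bunnyp']
print(crude_tokens(strip_html(raw_body)))  # ['easter', 'bunny']
```

The ordering is the point of the design: markup and entities are removed before the text is split into terms, which is the same reason a char filter sits ahead of the tokenizer in the field's analyzer.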

Licensed under: CC-BY-SA with attribution