Question

I want to check the Levenshtein distance between the query and the title of the document, and later filter results by their strdist score.

The schema is:

<fields>
     <field name="id" type="string" indexed="true" stored="true" required="true" /> 
     <field name="title" type="text_general" indexed="true" stored="true" required="true" />
     <field name="_version_" type="long" indexed="true" stored="true" multiValued="false" />
</fields>

In my index I have the following doc:

{
   "id":"1",
   "title":"iPhone 4S Battery Replacement"
}

When I send the following query:

http://localhost:8983/solr/collection1/query?q=title:iPhone+4S+Battery+Replacement&fl=*,score,lev_dist:strdist("iPhone+4S+Battery+Replacement",title,edit)

I get:

{
    "id":"1",
    "title":"iPhone 4S Battery Replacement",
    "_version_":1452659974334316549,
    "score":6.4907703,
    "lev_dist":0.37931037
}

But I was expecting lev_dist=1.0. Why is it 0.379? What am I doing wrong?


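Solution

According to the docs, the strdist function takes two strings and compares them. It behaves differently on analyzed fields such as title (type text_general), because there it does not operate on the original, untokenized string.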

Calculate the distance between two strings. Uses the Lucene spell checker StringDistance interface and supports all of the implementations available in that package, plus allows applications to plug in their own via Solr's resource loading capabilities. strdist takes (string1, string2, distance measure)
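
To see where a value like 0.379 can come from, here is a minimal Python sketch of a normalized edit-distance score. It assumes the edit measure reports 1 - (edit distance / length of the longer string), which is my understanding of Lucene's implementation; the function names are made up for illustration. It also assumes that, on the analyzed field, the comparison ran against the single lowercased token "replacement" rather than the full title. That second assumption is only an inference that happens to reproduce the number from the question, not something the docs confirm.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


query = "iPhone 4S Battery Replacement"

# Compared against the raw, untokenized title: identical strings.
print(normalized_similarity(query, query))          # 1.0

# Compared against one analyzed token (assumption): 18 edits over 29 characters.
print(normalized_similarity(query, "replacement"))  # ≈ 0.3793
```

Under these assumptions the score against the whole title is 1.0, while the score against a single token lands at 11/29, which matches the lev_dist in the question.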

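After some experimenting, and after reading a Grokbase thread from a user with a similar issue, I found that you need to add a non-analyzed field such as title_raw to your schema (see below), fill it with the same text as title (for example via a copyField), and reindex.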

<fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" /> 
    <field name="title" type="text_general" indexed="true" stored="true" required="true" />
    <field name="title_raw" type="string" indexed="true" stored="true"  />
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false" />
</fields>

Then you would query like this:

query?q=title:iPhone+4S+Battery+Replacement&fl=*,score,lev_dist:strdist("iPhone 4S Battery Replacement",title_raw,edit)

As you can see, I removed the + signs from the first string, since they would otherwise be counted when calculating the distance.
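
Since the question also asks about filtering by the score later, here is a hedged Python sketch that builds such a request URL. The helper name build_query_url and the min_score parameter are hypothetical. The fq clause uses Solr's frange query parser with a lower bound, and I have not verified it against this exact schema, so treat it as a starting point rather than a confirmed recipe.

```python
from urllib.parse import urlencode


def build_query_url(base_url, text, raw_field="title_raw", min_score=None):
    """Build a Solr query URL that returns strdist as a pseudo-field.

    base_url, raw_field and min_score are illustrative parameters.
    """
    # Escape characters that would break the quoted function argument.
    literal = text.replace("\\", "\\\\").replace('"', '\\"')
    func = f'strdist("{literal}",{raw_field},edit)'

    params = [
        ("q", f"title:{text}"),
        ("fl", f"*,score,lev_dist:{func}"),
    ]
    if min_score is not None:
        # Assumption: keep only documents whose strdist score is >= min_score.
        params.append(("fq", f"{{!frange l={min_score}}}{func}"))

    # urlencode handles the percent-encoding of quotes, braces and spaces.
    return f"{base_url}?{urlencode(params)}"


url = build_query_url(
    "http://localhost:8983/solr/collection1/query",
    "iPhone 4S Battery Replacement",
    min_score=0.8,
)
print(url)
```

Letting the HTTP library do the encoding avoids having to hand-place + or %20 inside the quoted string.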

Licensed under: CC-BY-SA with attribution