Question

I have three fields in my Solr index, and I run two queries that are identical except for the field they search.

Indexed data

employeeid : 220232

pskills : JSP, Servlets, HTML, Java

oskills : DB2, Oracle, JDBC, JNI, JSP, VC++, C, C++, Java, SQL, XML, Palm OS, UNIX, PALM OS, AIX, Linux, Solaris, Windows 2000, TCP/IP, IP, IDS, Asset Liability Management, Enterprise Application Integration

schema.xml

<field name="employeeid" type="string" indexed="true" stored="true" required="true" /> 
<field name="pskills" type="text" indexed="true" stored="false" required="false" />  
<field name="oskills" type="text" indexed="true" stored="false" required="false" />

Query 1 = employeeid : 220232 AND (pskills : ( ( "java" ) )^3000.00)

Score: 0.6169528

Query 2 = employeeid : 220232 AND (oskills : ( ( "java" ) )^3000.00)

Score: 0.32307756

My question: both fields contain the keyword "Java", so why do the two queries return different scores?

Solution

A number of reasons! Particularly:

  • If the fields are different lengths, the score is affected: matches in shorter fields are weighted more heavily. (Definitely a factor here, since pskills holds 4 terms and oskills holds far more.)
  • More than one match may be found in one of the fields, giving that field a higher tf (say java appears once in oskills but twice in pskills). (Doesn't appear to be the case here, but bears stating.)
  • The term java may be more common across all documents in one field than in the other. If, for instance, "java" appears in oskills in 1000 documents but in pskills in only 100 documents, then the match in pskills scores higher due to idf. (I can't tell whether this has an effect, since I don't know what's in the rest of the documents.)
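
To make the three factors concrete, here is a minimal Python sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), lengthNorm = 1 / sqrt(numTerms)). The tokenizer is a naive comma/whitespace split, not the real "text" analyzer, and the doc_freq and num_docs values are made up for illustration. It leaves out the query norm, coord, boost, and the lossy norm encoding, so it will not reproduce the scores in the question; it only shows which way each factor pushes.

```python
import math
import re

PSKILLS = "JSP, Servlets, HTML, Java"
OSKILLS = ("DB2, Oracle, JDBC, JNI, JSP, VC++, C, C++, Java, SQL, XML, Palm OS, "
           "UNIX, PALM OS, AIX, Linux, Solaris, Windows 2000, TCP/IP, IP, IDS, "
           "Asset Liability Management, Enterprise Application Integration")

def tokenize(text):
    # Assumption: naive split on commas/whitespace; the real analyzer differs.
    return [t.lower() for t in re.split(r"[,\s]+", text) if t]

def tf(freq):
    # Term frequency factor: more occurrences in the field score higher.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms in this field score higher.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def length_norm(num_terms):
    # Length normalization: shorter fields score higher.
    return 1.0 / math.sqrt(num_terms)

def field_weight(term, text, doc_freq, num_docs):
    tokens = tokenize(text)
    freq = tokens.count(term)
    return tf(freq) * idf(doc_freq, num_docs) * length_norm(len(tokens))

# Hypothetical index statistics, identical for both fields so that only
# the field length differs between the two calls.
p_weight = field_weight("java", PSKILLS, doc_freq=10, num_docs=100)
o_weight = field_weight("java", OSKILLS, doc_freq=10, num_docs=100)
print("pskills:", p_weight)
print("oskills:", o_weight)
```

With tf and idf held equal, the pskills match comes out ahead purely because the field is shorter, which is the same direction as the scores in the question.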

For some documentation on Lucene scoring, see TFIDFSimilarity.

The scores you get are specific to the query and the state of the index at the time it was run. They aren't intended to be compared to the scores of other queries.
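
If you want to see how a given score was assembled, Solr can return a per-document explanation when the request carries debugQuery=true. The sketch below only builds such a request URL in Python; the host, port, and core name ("employees") are assumptions, so substitute your own.

```python
from urllib.parse import urlencode

# Assumption: hypothetical local Solr instance and core name.
SOLR_SELECT = "http://localhost:8983/solr/employees/select"

def explain_url(query):
    # debugQuery=true asks Solr to include a scoring explanation
    # for each returned document in the response.
    params = {"q": query, "fl": "employeeid,score", "debugQuery": "true"}
    return SOLR_SELECT + "?" + urlencode(params)

url_pskills = explain_url('employeeid:220232 AND pskills:"java"^3000')
url_oskills = explain_url('employeeid:220232 AND oskills:"java"^3000')
print(url_pskills)
print(url_oskills)
```

Comparing the explanation sections of the two responses shows which of the tf, idf, and field norm factors differ between pskills and oskills.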

License: CC-BY-SA with attribution
Not affiliated with StackOverflow