質問

To index my website, I have a Ruby script that in turn generates a shell script that uploads every file in my document root to Solr. The shell script has many lines that look like this:

  curl -s \
 "http://localhost:8983/solr/update/extract?literal.id=/about/core-team/&commit=false" \
 -F "myfile=@/extra/www/docroot/about/core-team/index.html"

...and ends with:

curl -s http://localhost:8983/solr/update --data-binary \
'<commit/>' -H 'Content-type:text/xml; charset=utf-8'

This uploads all documents in my document root to Solr. I use tika and ExtractingRequestHandler to upload documents in various formats (primarily PDF and HTML) to Solr.

In the script that generates this shell script, I would like to boost certain documents based on whether their id field (a/k/a url) matches certain regular expressions.

Let's say that these are the boosting rules (pseudocode):

boost = 2 if url =~ /cool/
boost = 3 if url =~ /verycool/
# otherwise we do not specify a boost

What's the simplest way to add that index-time boost to my http request?

I tried:

curl -s \
 "http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
 -F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
 -F boost=3

and:

curl -s \
 "http://localhost:8983/solr/update/extract?literal.id=/verycool/core-team/&commit=false" \
 -F "myfile=@/extra/www/docroot/verycool/core-team/index.html" \
 -F boost.id=3

Neither made a difference in the ordering of search results. What I want is for the boosted results to come first in search results, regardless of what the user searched for (provided of course that the document contains their query).

I understand that if I POST in XML format I can specify the boost value for either the entire document or a specific field. But If I do that, it isn't clear how to specify a file as the document contents. Actually, the tika page provides a partial example:

curl "http://localhost:8983/solr/update/extract?literal.id=doc5&defaultField=text" \
--data-binary @tutorial.html -H 'Content-type:text/html'

But again it isn't clear where/how to specify my boost. I tried:

curl \ 
"http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost=3"\
--data-binary @mydoc.html -H 'Content-type:text/html'

and

curl \ 
"http://localhost:8983/solr/update/extract?literal.id=mydocid&defaultField=text&boost.id=3"\
--data-binary @mydoc.html -H 'Content-type:text/html'

Neither of which altered search results.

Is there a way to update just the boost attribute of a document (not a specific field) without altering the document contents? If so, I could accomplish my goal in two steps: 1) Upload/index document as I have been doing 2) Specify boost for certain documents

正しい解決策はありません

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top