Question

I have a Solr index with about 2.5M items in it and I am trying to use an ExternalFileField to boost relevancy. Unfortunately, it's VERY slow when I try to do this, despite it being a beefy machine and Solr having lots of memory available.

In the external file I have contents like:

747501=3.8294805903e-07
747500=3.8294805903e-07
1718770=4.03292174724e-07
1534562=3.8294805903e-07
1956010=3.8294805903e-07
747509=3.8294805903e-07
747508=3.8294805903e-07
1718772=3.8294805903e-07
1391385=3.8294805903e-07
2089652=3.8294805903e-07
1948271=3.8294805903e-07
108368=3.84404072186e-06

Each line is a document ID and its corresponding boosting factor.

In my query I'm using edismax, and I am using the boost parameter, setting it to pagerank. The entire query is here.

In my schema I have:

<!-- External File Field Type-->
<fieldType name="pagerank"
           keyField="id"
           stored="false"
           indexed="true"
           omitNorms="false"
           class="solr.ExternalFileField"
           valType="float"/>

and

   <field name="pagerank"
          type="pagerank"
          indexed="true"
          stored="true"
          omitNorms="false"/>

But the performance is just plain bad. Am I missing a setting or something?


Solution

According to the Javadoc for ExternalFileField:

The external file may be sorted or unsorted by the key field, but it will be substantially slower (untested) if it isn't sorted.

As far as I can see, the IDs in your file are not sorted. Can you sort the file by ID and test whether that helps?
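
If it helps, here is a minimal Python sketch for sorting the lines of such a file by key. The function name sort_external_file_lines and the numeric flag are my own inventions for illustration, not anything Solr provides. I am also assuming that every non-blank line has the key=value shape shown in the question. Which ordering Solr actually benefits from probably depends on the type of your id field (plain string order versus numeric order), so the sketch supports both and you would need to test which one applies to your schema.

```python
def sort_external_file_lines(lines, numeric=False):
    """Return the 'key=value' lines sorted by key.

    numeric=False sorts keys as plain strings (lexicographic order);
    numeric=True sorts keys as integers. Which order matches your
    index is an assumption you should verify against your key field.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first '=' only; the value is kept verbatim.
        key, _, value = line.partition("=")
        entries.append((key, value))

    if numeric:
        entries.sort(key=lambda entry: int(entry[0]))
    else:
        entries.sort(key=lambda entry: entry[0])

    return [f"{key}={value}" for key, value in entries]


sample = [
    "747501=3.8294805903e-07",
    "108368=3.84404072186e-06",
    "1718770=4.03292174724e-07",
]
for line in sort_external_file_lines(sample, numeric=True):
    print(line)
```

To apply it to a real file, read the lines in, pass them through the function, and write the result back to the external file before reloading it in Solr.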

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow