Question

I have a Solr index with about 2.5M items in it, and I'm trying to use an ExternalFileField to boost relevancy. Unfortunately, it's VERY slow when I do this, even though it's a beefy machine and Solr has plenty of memory available.

In the external file I have contents like:

747501=3.8294805903e-07
747500=3.8294805903e-07
1718770=4.03292174724e-07
1534562=3.8294805903e-07
1956010=3.8294805903e-07
747509=3.8294805903e-07
747508=3.8294805903e-07
1718772=3.8294805903e-07
1391385=3.8294805903e-07
2089652=3.8294805903e-07
1948271=3.8294805903e-07
108368=3.84404072186e-06

Each line is a document ID and its corresponding boosting factor.
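For reference, here is a minimal Python sketch (Python is my choice; the file itself is just plain text) that reads this `id=value` format into a dict. The function name `parse_external_file` is hypothetical and this is not Solr's own loader; keys are kept as strings because they are looked up against the `id` key field, not treated as numbers.

```python
def parse_external_file(text):
    """Parse ExternalFileField-style lines of the form 'id=value'.

    Hypothetical helper: returns a dict mapping each document id (kept as a
    string) to its float boost value. Blank lines are skipped.
    """
    values = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        # Split on the last '=' so everything before it is the key.
        key, sep, value = line.rpartition("=")
        if not sep:
            raise ValueError(f"line {line_no}: expected 'id=value', got {raw!r}")
        values[key] = float(value)
    return values


sample = """747501=3.8294805903e-07
1718770=4.03292174724e-07
108368=3.84404072186e-06
"""
boosts = parse_external_file(sample)
print(len(boosts))  # 3
```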

In my query I'm using edismax with the boost parameter set to pagerank. The entire query is here.
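A request of that general shape might be built roughly like this Python sketch. The host, the core name `mycore` and the search terms are placeholders, not the actual query from the link. `defType=edismax` selects the Extended DisMax parser, and `boost` takes a function query whose value multiplies the relevancy score; here the function is simply the field name `pagerank`.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host and core name are hypothetical.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

params = {
    "defType": "edismax",          # use the Extended DisMax query parser
    "q": "example search terms",   # hypothetical user query
    "boost": "pagerank",           # multiply the score by the external file value
}

# urlencode quotes values (spaces become '+') and joins them with '&'.
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```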

In my schema I have:

<!-- External File Field Type-->
<fieldType name="pagerank"
           keyField="id"
           stored="false"
           indexed="true"
           omitNorms="false"
           class="solr.ExternalFileField"
           valType="float"/>

and

   <field name="pagerank"
          type="pagerank"
          indexed="true"
          stored="true"
          omitNorms="false"/>

But the performance is just plain bad. Am I missing a setting or something?


Solution

According to the ExternalFileField javadoc:

The external file may be sorted or unsorted by the key field, but it will be substantially slower (untested) if it isn't sorted.

As far as I can see, the ids in your file are unsorted. Can you sort the file by id and test whether that helps?
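If it helps, here is a minimal Python sketch for sorting the file. It assumes the order that matters is the index's term order for the key field: for a string-typed `id`, Lucene orders terms by their UTF-8 bytes, which for plain digits is ordinary string order (so `108368` comes before `1391385`, which comes before `747500`). If your `id` field is numeric instead, pass `numeric_keys=True`. The script and function names are hypothetical.

```python
import sys


def sort_external_file(lines, numeric_keys=False):
    """Return the 'key=value' lines of an ExternalFileField file sorted by key.

    By default keys are compared as strings, which should match the term
    order of a string-typed id field. Pass numeric_keys=True if the id field
    is numeric. Blank lines and lines without '=' are dropped.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split on the last '=' so the key is everything before the value.
        key, sep, value = line.rpartition("=")
        if not sep:
            continue
        entries.append((key, value))

    if numeric_keys:
        entries.sort(key=lambda kv: int(kv[0]))
    else:
        entries.sort(key=lambda kv: kv[0])

    return [f"{key}={value}" for key, value in entries]


if __name__ == "__main__":
    # Hypothetical usage: python sort_external_file.py INPUT OUTPUT
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src:
            sorted_lines = sort_external_file(src)
        with open(sys.argv[2], "w") as dst:
            dst.write("\n".join(sorted_lines) + "\n")
```

Sorting once offline like this is cheap compared with the query-time cost, so it is worth regenerating the file in sorted order whenever the boost values change.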

Licensed under: CC-BY-SA with attribution