Question

I am trying to use Solr's langid UpdateRequestProcessor. Here is the config:

<updateRequestProcessorChain name="languages">
    <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="invariants">
            <str name="langid.fl">focus, expertise, platforms, partners, participation, additional</str>
            <str name="langid.whitelist">en,fr</str>
            <str name="langid.fallback">en</str>
            <str name="langid.langField">detectedlang</str>
            <bool name="langid.map">true</bool>
            <bool name="langid.map.keepOrig">false</bool>
        </lst>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

My fields look like this:

<fields>
    <field name="_root_" type="string" indexed="true" stored="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>

    <field name="id" type="string" indexed="true" stored="true" required="true" />

    <!-- raw fields from sql db -->
    <field name="expertise_id" type="int" indexed="true" stored="true" />
    <field name="person_id" type="int" indexed="true" stored="true" />
    <field name="mod_date" type="date" indexed="true" stored="true" />
    <field name="lang" type="string" indexed="true" stored="true" />
    <field name="focus" type="text_general" indexed="true" stored="true" />
    <field name="expertise" type="text_general" indexed="true" stored="true" />
    <field name="platforms" type="text_general" indexed="true" stored="true" />
    <field name="partners" type="text_general" indexed="true" stored="true" />
    <field name="participation" type="text_general" indexed="true" stored="true" />
    <field name="additional" type="text_general" indexed="true" stored="true" />
    <field name="tag" type="text_general" termVectors="true" multiValued="true" />      
    <field name="facet_tag" type="string" stored="false" indexed="false" docValues="true" multiValued="true" default=""/>

    <!-- language detected by solr -->
    <field name="detectedlang" type="string" indexed="true" stored="true" />

    <!-- defined locale fields -->
    <dynamicField name="*_en" type="text_en" indexed="true" stored="true" />
    <dynamicField name="*_fr" type="text_fr" indexed="true" stored="true" />

    <copyField source="tag" target="facet_tag"/>

</fields>

When I run an update or a dataimport I know that the "languages" update chain is used because focus is mapped to focus_en and detectedlang is set. However, none of the other fields in langid.fl are mapped. Why?

An example update query:

{
  "additional": "here is some other information about me.",
  "expertise_id": "10000",
  "id": "foo_10000",
  "focus": "this is my new focus. It is very exciting. When I am done I expect to be super experienced."
}

And here is the result of a query for expertise_id=10000. Note that additional has not been moved to additional_en:

  "response":{"numFound":1,"start":0,"docs":[
      {
        "additional":"here is some other information about me.",
        "expertise_id":10000,
        "id":"foo_10000",
        "detectedlang":"en",
        "focus_en":"this is my new focus. It is very exciting. When I am done I expect to be super experienced.",
        "_version_":1447088846110982144}]
  }

Solution

It turns out that the problem is a syntax error in the langid.fl field list. This line:

<str name="langid.fl">focus, expertise, platforms, partners, participation, additional</str>

must be

<str name="langid.fl">focus,expertise,platforms,partners,participation,additional</str>

The docs state that the field list should be comma- or space-separated. Evidently, a comma followed by a space breaks things, so that only the first field (focus) is mapped, even though the same syntax works fine in other Solr contexts such as fl in a requestHandler, which langid.fl is supposedly modelled on. I tried the space-separated syntax as well, but it did not fix my issue.
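
One plausible explanation for the symptom is sketched below. It is an assumption, not something confirmed from the Solr source here: if the processor splits langid.fl on commas without trimming whitespace, every name after the first keeps a leading space and no longer matches a field in the document. The function names naive_split and tolerant_split are hypothetical and only illustrate the idea in Python; they are not Solr APIs.

```python
import re
from typing import List


def naive_split(field_list: str) -> List[str]:
    # Hypothetical behaviour: split on commas only, without trimming.
    return field_list.split(",")


def tolerant_split(field_list: str) -> List[str]:
    # Accept commas, whitespace, or both as separators and drop empty names.
    return [name for name in re.split(r"[,\s]+", field_list.strip()) if name]


# Field list exactly as written in the original config.
configured = "focus, expertise, platforms, partners, participation, additional"

# Text fields present in the example update document.
doc_fields = {"focus", "additional"}

naive = naive_split(configured)
matched_naive = [name for name in naive if name in doc_fields]
matched_tolerant = [name for name in tolerant_split(configured) if name in doc_fields]

print(naive)             # names after the first keep a leading space
print(matched_naive)     # only 'focus' matches a document field
print(matched_tolerant)  # both 'focus' and 'additional' match
```

Under that assumption, only focus would be picked up, which is consistent with the query result shown in the question. Removing the spaces after the commas sidesteps the issue regardless of how the parameter is actually parsed.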

I hope this helps someone.

Licensed under: CC-BY-SA with attribution