Solr DIH regexTransformer seems to only know about one capturing parentheses group

StackOverflow https://stackoverflow.com/questions/14867592

  •  09-03-2022

Question

I am importing data using the DIH and need to parse a string, capture two numbers, then populate a field of type=location (which accepts a "lat,long" coordinate pair). The logical thing to do is:

  <field column="latLong" 
         regex="Latitude is ([-\d.]+)\s+ Longitude is ([-\d.]+)\s+" 
         replaceWith="$1,$2" />

It seems the DIH only knows about a single capture group, so $2 is never used.

Has anyone ever used more than one capture with the regexTransformer? Searching the documentation didn't provide any examples of $2 or $3. What gives, O ye priests of Solr?

Solution

It is not true that Solr DIH does not understand $2, $3, etc.

I just tried this. I added the following to my DIH data-config.xml:

<entity name="foo" 
        transformer="RegexTransformer" 
        query="SELECT list_id FROM lists WHERE list_id = ${Lists.id}">
    <field column="firstLastNum" 
           regex="^(\d).*?(\d)$" 
           replaceWith="$1:$2" 
           sourceColName="list_id"/>
</entity>

and then added the field in my schema.xml

<field name="firstLastNum" type="string" indexed="true" stored="true"/>

When I indexed a document with list_id = 390, firstLastNum was 3:0 which is indeed correct.
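
For anyone who wants to check the regex outside Solr: as far as I know, RegexTransformer passes regex and replaceWith to Java's java.util.regex, so a plain replaceAll call should behave the same way. That delegation is my assumption, not something I verified in the Solr source for every version. The class and method names below are made up for illustration.

```java
import java.util.regex.Pattern;

public class FirstLastDemo {
    // Same pattern as the DIH field above: first digit, anything, last digit.
    static final Pattern FIRST_LAST = Pattern.compile("^(\\d).*?(\\d)$");

    // Hypothetical helper: mimics what regex + replaceWith="$1:$2" is assumed to do.
    static String firstLast(String listId) {
        return FIRST_LAST.matcher(listId).replaceAll("$1:$2");
    }

    public static void main(String[] args) {
        System.out.println(firstLast("390")); // prints 3:0
    }
}
```

Both $1 and $2 are filled in here, which matches what I saw in the indexed document.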

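I suspect the real problem is the regex itself rather than the number of capture groups. As written, it requires at least one whitespace character followed by a literal space between the latitude value and "Longitude", plus at least one whitespace character after the longitude value. If the input lacks either, the pattern does not match at all and nothing is replaced. Maybe try this more forgiving regex: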

regex="Latitude is ([-\d.]+)\s*Longitude is ([-\d.]+)"
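
Here is a rough way to compare the two patterns in plain Java before re-running the import. The sample string is hypothetical (I don't know what your source data looks like), and the class and method names are made up; it also assumes DIH applies the pattern the same way replaceAll does.

```java
import java.util.regex.Pattern;

public class LatLongRegexDemo {
    // Pattern from the question: "\\s+ " needs whitespace plus a literal space,
    // and the trailing "\\s+" needs whitespace after the longitude value.
    static final Pattern ORIGINAL = Pattern.compile(
        "Latitude is ([-\\d.]+)\\s+ Longitude is ([-\\d.]+)\\s+");

    // Suggested pattern: optional whitespace in between, nothing required at the end.
    static final Pattern RELAXED = Pattern.compile(
        "Latitude is ([-\\d.]+)\\s*Longitude is ([-\\d.]+)");

    // Hypothetical helper: applies replaceWith="$1,$2" to the input.
    static String apply(Pattern p, String input) {
        return p.matcher(input).replaceAll("$1,$2");
    }

    public static void main(String[] args) {
        String sample = "Latitude is 12.5 Longitude is -45.25"; // hypothetical input
        System.out.println(apply(ORIGINAL, sample)); // no match, input comes back unchanged
        System.out.println(apply(RELAXED, sample));  // prints 12.5,-45.25
    }
}
```

If the original pattern returns your input unchanged, the regex never matched, which would explain why it looks as if $2 is ignored.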

Another possible cause is that latLong is a location-type field while the replacement $1,$2 produces a plain string, but I am not sure about that.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow