Question

Hi, I have a file with the following contents. The character '.' denotes a space.

abc-a-1..............lime..............red........
xyz..................peach.............yellow.....

I want to use the Data Import Handler to parse this data into three fields. This is what I have so far:

<entity name="iCode" processor="LineEntityProcessor" url="file.csv" 
                   dataSource="find_file"
                   transformer="RegexTransformer,TemplateTransformer">

  <field column="code" regex="^(\w*)"  sourceColName="rawLine" />
  <field column="fruit" regex="(\W)\b.*"  sourceColName="rawLine" />
  <field column="color" regex="(\w*)\s*$"  sourceColName="rawLine" />

</entity>

The import runs successfully, but no documents are created in Solr. I believe the regexes are not correct.

Any ideas how I can get this to work?

Solution

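Try

<field column="code" regex="^(\S+)" sourceColName="rawLine" />
<field column="fruit" regex="(\S+)(?=\s+\S+\s+$)" sourceColName="rawLine" />
<field column="color" regex="(\S+)(?=\s+$)" sourceColName="rawLine" />
  • The first captures the run of non-whitespace characters at the beginning of the line.
  • The second captures a run of non-whitespace characters that is followed by whitespace, one more run of non-whitespace characters, and trailing whitespace at the end of the line. The lookahead keeps everything after the fruit out of the result. The quantifier belongs inside the group, (\S+); written as (\S)+ the group would hold only the last character of the word.
  • The third captures the run of non-whitespace characters that is followed by whitespace at the end of the line, again leaving the whitespace out of the result.
  • Keep sourceColName="rawLine" on each field so that the regex is applied to the line read by LineEntityProcessor. Both lookaheads expect trailing whitespace, as in the sample data; if the lines may be trimmed, use \s*$ in place of \s+$.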
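One way to sanity-check patterns like these before running an import is to try them on the sample lines outside Solr. The sketch below uses Python's `re` module, which treats `\S`, `\s`, lookaheads, and `$` the same way as Java's regex engine for these patterns. The names `LINES`, `PATTERNS`, and `parse_line` are made up for this illustration, and taking group 1 of the first match is an assumption about how the transformer picks its value.

```python
import re

# The two sample lines from the question, with '.' replaced by real spaces.
LINES = [
    "abc-a-1" + " " * 14 + "lime" + " " * 14 + "red" + " " * 8,
    "xyz" + " " * 18 + "peach" + " " * 13 + "yellow" + " " * 5,
]

# One pattern per column; the quantifier sits inside each capture group.
PATTERNS = {
    "code": r"^(\S+)",
    "fruit": r"(\S+)(?=\s+\S+\s+$)",
    "color": r"(\S+)(?=\s+$)",
}


def parse_line(raw_line):
    """Return {column: group 1 of the first match}, or None if no match."""
    row = {}
    for column, pattern in PATTERNS.items():
        match = re.search(pattern, raw_line)
        row[column] = match.group(1) if match else None
    return row


rows = [parse_line(line) for line in LINES]
for row in rows:
    print(row)

# With the quantifier outside the group, only the last character is kept.
last_char_only = re.search(r"(\S)+(?=\s+\S+\s+$)", LINES[0]).group(1)
print(last_char_only)  # prints "e", the final letter of "lime"
```

If every column comes back with the expected word here, the remaining differences are on the Solr side (field mapping, required fields, and so on) rather than in the regexes.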
Licensed under: CC-BY-SA with attribution