Question

I have a text file that consists of the following data:

andy~1234;M~64365113~2P3VWU3H10~~
mike~4152;M~64365113~2P3VWU3H10~0.6~MG
lesa~4512;F,PM~~N/A~16~MG
riky~7845;M,PM2~~N/A~3.99~MG

I want to convert it into Solr documents as follows:

  1. Each row becomes one <doc> in Solr.
  2. '~' is the delimiter that separates the <field> values of a document.

Do I need to use a DataImportHandler to handle this kind of file? If so, which kind of DataImportHandler setup is useful? I've gone through LineEntityProcessor, but I didn't understand how to use it for my problem.


Solution

Assuming that you know the field names (the lines contain only the values), here's an example of how you can do that using a FileDataSource + LineEntityProcessor + ScriptTransformer:

<dataConfig>  
    <dataSource encoding="UTF-8" type="FileDataSource" name="file-datasource"/>
    <script><![CDATA[
        function parse(row)    
        {
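            // LineEntityProcessor puts each line of the file in the "rawLine" field
            var rawLine = row.get("rawLine");

            // Split rawLine on the '~' delimiter,
            // and for each resulting value:

            // row.put('fieldName', fieldValue);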

            return row;
        }
    ]]></script>        
    <document>
        <entity name="jc"
            processor="LineEntityProcessor"
            url="file:///your.path.file.here"
            dataSource="file-datasource"
            transformer="script:parse"/>
    </document>
</dataConfig>   
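
The placeholder comments in parse could be filled in roughly as follows. This is only a sketch: the field names in FIELD_NAMES are hypothetical (the question does not say what each column means) and would have to match fields defined in your Solr schema. The String(...) coercion is there on the assumption that, depending on the script engine, row.get may hand back a Java string, whose split takes a regex and drops trailing empty values.

```javascript
// Hypothetical field names, one per '~'-separated column.
// Replace them with the names defined in your own schema.
var FIELD_NAMES = ["name", "code", "account", "ref", "amount", "unit"];

// Split one '~'-delimited line into an object of fieldName -> value.
// Empty columns (such as the trailing "~~" in the first sample row) are skipped.
function splitLine(rawLine) {
    // Coerce to a JavaScript string so that JavaScript split semantics apply.
    var values = String(rawLine).split("~");
    var fields = {};
    for (var i = 0; i < FIELD_NAMES.length && i < values.length; i++) {
        if (values[i] !== "") {
            fields[FIELD_NAMES[i]] = values[i];
        }
    }
    return fields;
}

// Entry point referenced by transformer="script:parse".
function parse(row) {
    var fields = splitLine(row.get("rawLine"));
    for (var name in fields) {
        row.put(name, fields[name]);
    }
    return row;
}
```

Keeping the splitting logic in its own function makes it possible to try it on sample lines outside Solr before pasting both functions into the <script> section. Columns that are empty in a given line are simply not added to the document.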
Licensed under: CC-BY-SA with attribution