Question

I have a text file that consists of the following data:

andy~1234;M~64365113~2P3VWU3H10~~
mike~4152;M~64365113~2P3VWU3H10~0.6~MG
lesa~4512;F,PM~~N/A~16~MG
riky~7845;M,PM2~~N/A~3.99~MG

I want to convert it into Solr documents as follows:

  1. Each row becomes one <doc> in Solr.
  2. '~' is the delimiter that separates the <field> values of a document.

Do I need to use the DataImportHandler for this kind of file? If so, which entity processor is suitable? I've looked at LineEntityProcessor, but I don't understand how to apply it to my problem.

Solution

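Assuming that you know the field names (the lines contain only the values), here's an example of how you can do that using a FileDataSource + LineEntityProcessor + ScriptTransformer. The LineEntityProcessor emits each line of the file in a field called rawLine, and the script function splits that line into the individual fields: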

<dataConfig>
    <dataSource encoding="UTF-8" type="FileDataSource" name="file-datasource"/>
    <script><![CDATA[
        // Called once per line of the file
        function parse(row)
        {
            var rawLine = row.get("rawLine");

            // Split the rawLine on '~'
            // and for each field:

            // row.put('fieldName', fieldValue);

            return row;
        }
    ]]></script>
    <document>
        <entity name="jc"
            processor="LineEntityProcessor"
            url="file:///your.path.file.here"
            dataSource="file-datasource"
            transformer="script:parse"/>
    </document>
</dataConfig>
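
The splitting step that the comments inside parse leave open could be filled in roughly as follows. This is only a sketch: the names field_1 to field_6 are hypothetical placeholders (the question never names its columns), and makeRow is a stand-in for the map that the DataImportHandler passes in, included only so the snippet can be run outside Solr. Inside the <script> block only splitLine and parse would be needed.

```javascript
// Hypothetical field names; replace them with the names from your schema.
var FIELD_NAMES = ["field_1", "field_2", "field_3", "field_4", "field_5", "field_6"];

// Split one raw line on '~' and pair each value with a field name.
// Empty values (as in "~~") are skipped, so no empty fields are created.
function splitLine(rawLine) {
    // String(...) coerces the value to a JavaScript string, in case the
    // script engine hands over a Java string whose split() would drop
    // trailing empty values.
    var values = String(rawLine).split("~");
    var fields = {};
    for (var i = 0; i < FIELD_NAMES.length && i < values.length; i++) {
        if (values[i] !== "") {
            fields[FIELD_NAMES[i]] = values[i];
        }
    }
    return fields;
}

// Same shape as the parse function in the config: read rawLine,
// put one entry per field into the row, and return the row.
function parse(row) {
    var fields = splitLine(row.get("rawLine"));
    for (var name in fields) {
        row.put(name, fields[name]);
    }
    return row;
}

// Stand-in for the row map, only for trying the code outside Solr.
function makeRow(line) {
    var data = { rawLine: line };
    return {
        get: function (key) { return data[key]; },
        put: function (key, value) { data[key] = value; },
        data: data
    };
}

var demo = parse(makeRow("mike~4152;M~64365113~2P3VWU3H10~0.6~MG"));
console.log(demo.data);
```

Skipping empty values is a design choice made here for illustration; if a field should still be sent to Solr as an empty string, drop the check inside the loop.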
License: CC-BY-SA with attribution