Question

I am trying to fill a Solr index from 2 different data-sources (xml and db) using the DataImportHandler.

First try: I created two data-config.xml files, one for the XML import and one for the DB import. The DB config reads id and, let's say, field A; the XML config reads id and field B.

That works for both (I could import from both data sources), but the index got overwritten each time (with clean=false, of course), so I ended up with either id and A or id and B.

So, for the second try, I merged the two files into one:

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
    <dataSource 
        name="cr-db"
        jndiName="xyz"
        type="JdbcDataSource" />
    <dataSource 
        name="cr-xml" 
        type="FileDataSource" 
        encoding="utf-8" />


    <document name="doc">
        <entity 
            dataSource="cr-xml" 
            name="f" 
            processor="FileListEntityProcessor" 
            baseDir="/path/to/xml" 
            filename="*.xml" 
            recursive="true" 
            rootEntity="false" 
            onError="skip">
            <entity
                name="xml-data" 
                dataSource="cr-xml" 
                processor="XPathEntityProcessor" 
                forEach="/root" 
                url="${f.fileAbsolutePath}" 
                transformer="DateFormatTransformer" 
                onError="skip">
                <field column="id" xpath="/root/id" /> 

                <field column="A" xpath="/root/a" />
            </entity>

            <entity 
                name="db-data" 
                dataSource="cr-db"
                query="
                    SELECT  
                        id, b
                    FROM 
                        a_table
                    WHERE 
                        id = '${f.file}'">
                <field column="B" name="b" />
            </entity>
        </entity>
    </document>
</dataConfig>

The id = '${f.file}' part is a bit odd, I guess, but that is the id that is used. The select statement is correctly formed, but I get an exception when I try to run that file in dataimport.jsp. The first part (XML) works fine, but when the import gets to the DB part it raises:

java.lang.RuntimeException: java.io.FileNotFoundException: 
Could not find file: SELECT id, b FROM a_table WHERE id = '12345678.xml'
at org.apache.solr.handler.dataimport.FileDataSource.getFile[..]

Any advice? Thanks in advance


EDIT: I found the cause of the FileNotFoundException: within the entity tags, the data source attribute needs to be camelCased (dataSource). Now the import runs through, but with the same outcome as in the first try: only field B ends up in the index. If I take the db entity out, the file contents (field A) are indexed.
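
To illustrate the casing fix described in the edit, here is a minimal before/after sketch of the db entity's opening tag. The all-lowercase spelling in the first tag is an assumption inferred from the wording of the edit; the original misspelled config is not shown in the question.

```xml
<!-- Assumed original spelling (not shown in the question): the lowercase
     attribute name is not recognised, so cr-db is never selected. -->
<entity
    name="db-data"
    datasource="cr-db"
    query="SELECT id, b FROM a_table WHERE id = '${f.file}'">
    <field column="B" name="b" />
</entity>

<!-- Corrected spelling: camelCase dataSource selects the JDBC data source. -->
<entity
    name="db-data"
    dataSource="cr-db"
    query="SELECT id, b FROM a_table WHERE id = '${f.file}'">
    <field column="B" name="b" />
</entity>
```

With the first form, the SQL string would be handed to a file-based data source, which matches the "Could not find file: SELECT ..." message in the stack trace above.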


Solution

Try:

<entity name="db-data" dataSource="cr-db"

The attribute names are case-sensitive, so an attribute written with the wrong case is ignored and the entity falls back to the default data source (which, in this config, apparently is the file one).
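
The edit to the question reports that, once the casing is fixed, only field B reaches the index. One possible explanation, not confirmed in this thread, is that with rootEntity="false" on the outer f entity, both xml-data and db-data are treated as root entities, so each produces its own document with the same id and the later one replaces the earlier one. A hedged sketch of one way to try around that is to nest db-data inside xml-data, so that its columns are added to the document that xml-data creates. The nesting and the reduced SELECT list are assumptions made for illustration; all names are taken from the question's config.

```xml
<entity
    name="xml-data"
    dataSource="cr-xml"
    processor="XPathEntityProcessor"
    forEach="/root"
    url="${f.fileAbsolutePath}"
    transformer="DateFormatTransformer"
    onError="skip">
    <field column="id" xpath="/root/id" />
    <field column="A" xpath="/root/a" />

    <!-- Child entity (assumption): its row is merged into the document
         created by xml-data instead of forming a second document.
         Only b is selected so that id is set once, by xml-data. -->
    <entity
        name="db-data"
        dataSource="cr-db"
        query="SELECT b FROM a_table WHERE id = '${f.file}'">
        <field column="B" name="b" />
    </entity>
</entity>
```

This fragment would replace the two sibling entities inside the f entity; the outer f entity and both dataSource definitions stay as they are in the question.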

Licensed under: CC-BY-SA with attribution