Question

I am trying to use TikaEntityProcessor to index the content of .html files, but the import fails. The error log shows the following:

SEVERE: Full Import failed:java.lang.RuntimeException:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:tika-test Processing Document # 1
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:273)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:382)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:448)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:429)
Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to load EntityProcessor implementation for entity:tika-test Processing Document # 1
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
    at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessorWrapper(DocBuilder.java:697)
    at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessorWrapper(DocBuilder.java:703)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:215)
    ... 3 more
Caused by: java.lang.ClassNotFoundException: Unable to load TikaEntityProcessor or org.apache.solr.handler.dataimport.TikaEntityProcessor
    at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:899)
    at org.apache.solr.handler.dataimport.DocBuilder.getEntityProcessorWrapper(DocBuilder.java:694)
    ... 5 more
Caused by: org.apache.solr.common.SolrException: Error loading class 'TikaEntityProcessor'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:436)
    at org.apache.solr.handler.dataimport.DocBuilder.loadClass(DocBuilder.java:889)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: TikaEntityProcessor
    at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

My data-config.xml file is as follows:

<dataConfig>
    <dataSource type="BinFileDataSource" />
    <document>
        <entity name="f" processor="FileListEntityProcessor" baseDir="path/to/basedir/" fileName=".*html" recursive="true" rootEntity="true" dataSource="null">
            <entity name="tika-test" processor="TikaEntityProcessor" url="path/tohtml/files/" format="text" onError="skip">
                <field column="product_id" name="product_id" meta="true"/>
                <field column="type" name="type" meta="true"/>
                <field column="title" name="title" meta="true"/>
            </entity>
        </entity>
    </document>
</dataConfig>

I have added the following to solrconfig.xml:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">/path/to/data-config.xml</str>
    </lst>
</requestHandler>

I have kept the default schema.xml file and added the following fields to it:

<field name="product_id" type="string" indexed="true" stored="true"/>
<field name="title" type="string" indexed="true" stored="true"/>
<field name="type" type="string" indexed="true" stored="true"/>

Can anyone tell me what I am missing here, why I am getting these errors, and how to fix them?


Solution 3

You need to add the jars (or lib paths) for apache-solr-dataimporthandler-4.0, apache-solr-dataimporthandler-extras-4.0 and apache-solr-cell-4.0, which are found in the dist folder, as well as the corresponding files in the contrib folder.

That is how my problem was solved.

OTHER TIPS

Did you add lib directives in solrconfig.xml to make sure that the Tika libraries are loaded? You need (I believe):

<lib dir="${user.dir}/../dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="${user.dir}/../contrib/extraction/lib" regex=".*\.jar" />

If you are using Solr 4.0 rather than Solr 4.1, you may need to write apache-solr-cell... instead of just solr-cell..., because the jar names in the 4.0 release carry the apache- prefix.
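
For Solr 4.0, the two directives above might look like the following. This is only a sketch: it assumes the same directory layout as above and changes nothing but the jar-name prefix, so check the actual file names in your dist folder.

```xml
<!-- Assumption: Solr 4.0 naming, where the Solr Cell jar is apache-solr-cell-<version>.jar -->
<lib dir="${user.dir}/../dist/" regex="apache-solr-cell-\d.*\.jar" />
<!-- Tika and its dependencies, same location as in the directives above -->
<lib dir="${user.dir}/../contrib/extraction/lib" regex=".*\.jar" />
```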

TikaEntityProcessor is available in the dataimporthandler-extras jar file.
You can use the pattern apache-solr-dataimporthandler-.*\.jar to load it.
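
As a rough sketch, a lib directive in solrconfig.xml using that pattern could look like the following. The dir value is an assumption borrowed from the other directives in this thread, so point it at wherever your Solr dist folder actually lives. The pattern should match both the dataimporthandler jar and the dataimporthandler-extras jar.

```xml
<!-- Assumption: the dist folder sits one level above the directory Solr was started from -->
<!-- Matches apache-solr-dataimporthandler-*.jar and apache-solr-dataimporthandler-extras-*.jar -->
<lib dir="${user.dir}/../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
```

TikaEntityProcessor also needs the Tika jars from contrib/extraction/lib on the classpath, so this directive would normally sit alongside the Solr Cell ones rather than replace them.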

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow