Question

Following the information in a related question, I set up an import of all documents stored in a MySQL database. You can find the original question here.

Thanks to the steps provided there, I was able to make it work with my MySQL DB. My config looks identical to the one in the linked question:

<dataConfig>
  <dataSource name="db"
    jndiName="java:jboss/datasources/somename"
    type="JdbcDataSource" 
    convertType="false" />
  <dataSource name="dastream" type="FieldStreamDataSource" />
  <dataSource name="dareader" type="FieldReaderDataSource" />
  <document name="docs">
    <entity name="doc" query="select * from document" dataSource="db">
      <field name="id" column="id" />
      <field name="name" column="descShort" />
      <entity name="comment" 
        transformer="HTMLStripTransformer" dataSource="db"
        query="select id, body, subject from comment where iddoc='${doc.id}'">
        <field name="idComm" column="id" />
        <field name="detail" column="body" stripHTML="true" />
        <field name="subject" column="subject" />
      </entity>
      <entity name="attachments" 
        query="select id, attName, attContent, attContentType from Attachment where iddoc='${doc.id}'"
        dataSource="db">
        <field name="attachment_name" column="attName" />
        <field name="idAttachment" column="id" />
        <field name="attContentType" column="attContentType" />
        <entity name="attachment" 
          dataSource="dastream"
          processor="TikaEntityProcessor"
          url="attContent"
          dataField="attachments.attContent"
          format="text"
          onError="continue">
          <field column="text" name="attachment_detail" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

I have a variety of attachments in the DB, such as JPEG, PDF, Excel, DOC and plain text. Everything works well for most of the binary data (JPEG, PDF, DOC and such), but the import fails for certain files. It appears that the data source throws an exception when it encounters a String instead of an InputStream.

I set onError="continue" on the "attachment" entity so that the data import completes despite this error, but the problem occurs for a number of files. The exception is given below. Any ideas?

Exception in entity : attachment:java.lang.RuntimeException: unsupported type : class java.lang.String 
at org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:89) 
at org.apache.solr.handler.dataimport.FieldStreamDataSource.getData(FieldStreamDataSource.java:48) 
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:103) 
at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) 
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:465) 
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) 
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:491) 
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404) 
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:319) 
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:227) 
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:422) 
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487) 
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)

Solution

I know this is an old question, but it appears to me that this exception is thrown when the BLOB is null (I work with Oracle). When I add a where clause such as "blob_column is not null", the problem disappears for me (Solr 4.10.1).
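
Applied to the config from the question, that suggestion might look roughly like the sketch below. It assumes attContent is the BLOB column (as the dataField attribute suggests) and that null values are what trigger the exception in that setup; neither is confirmed by the original poster, so treat it as something to try rather than a verified fix.

```xml
<!-- Sketch only: the "attachments" entity from the question, with rows that
     have no content filtered out before they reach TikaEntityProcessor.
     Assumes attContent is the BLOB column holding the file data. -->
<entity name="attachments"
  query="select id, attName, attContent, attContentType from Attachment where iddoc='${doc.id}' and attContent is not null"
  dataSource="db">
  <field name="attachment_name" column="attName" />
  <field name="idAttachment" column="id" />
  <field name="attContentType" column="attContentType" />
  <!-- Nested Tika entity unchanged from the question's config -->
  <entity name="attachment"
    dataSource="dastream"
    processor="TikaEntityProcessor"
    url="attContent"
    dataField="attachments.attContent"
    format="text"
    onError="continue">
    <field column="text" name="attachment_detail" />
  </entity>
</entity>
```

If some attachments still fail after this change, the remaining rows may hold something other than null in that column, and the where clause would need a different condition.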

Licensed under: CC-BY-SA with attribution