Question

I am trying to scan all PDF/DOC files in a directory. This works fine, and I am able to scan all documents.

The next thing I'm trying to do is also return the filename of each file in the search results. However, the filename never shows up. I tried a couple of things, but the documentation is not very helpful about how to do this.

I am using the Solr configuration found in the Solr distribution: apache-solr-3.1.0/example/example-DIH/solr/tika/conf

This is my dataConfig:

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" recursive="true" 
            rootEntity="false" dataSource="null" baseDir="C:/solrtestsmall"
            fileName=".*\.(DOC|PDF|pdf|doc)" onError="skip">

      <entity name="tika-test" processor="TikaEntityProcessor" 
              url="${f.fileAbsolutePath}" format="text" dataSource="bin" 
              onError="skip">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>

      <field column="fileName" name="fileName"/>
    </entity>
  </document>
</dataConfig>

I would like to know how to configure this correctly, and where else I can find more specific documentation.


Solution

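You should use file instead of fileName as the column. FileListEntityProcessor exposes the name of each file through its implicit field file (alongside fileDir, fileAbsolutePath, fileSize, and fileLastModified). fileName is only the entity attribute that holds the filter regex, so a column called fileName never receives a value.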

<field column="file" name="fileName"/>

Don't forget to add the fileName field to the fields section of schema.xml:

<field name="fileName" type="string" indexed="true" stored="true" />
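
Putting both changes together, the data config from the question might look roughly like the sketch below. It is only an illustration under a few assumptions: the target names filePath, fileSize, and lastModified are hypothetical and would each need a matching field declaration in schema.xml, and the fileName regex is regrouped so that the alternation applies to the extension only. The column values (file, fileAbsolutePath, fileSize, fileLastModified) are the implicit fields that FileListEntityProcessor generates.

```xml
<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" recursive="true"
            rootEntity="false" dataSource="null" baseDir="C:/solrtestsmall"
            fileName=".*\.(doc|pdf|DOC|PDF)" onError="skip">

      <!-- Implicit fields produced by FileListEntityProcessor.
           Target names other than fileName are hypothetical; each one
           needs a matching field declaration in schema.xml. -->
      <field column="file" name="fileName"/>
      <field column="fileAbsolutePath" name="filePath"/>
      <field column="fileSize" name="fileSize"/>
      <field column="fileLastModified" name="lastModified"/>

      <entity name="tika-test" processor="TikaEntityProcessor"
              url="${f.fileAbsolutePath}" format="text" dataSource="bin"
              onError="skip">
        <field column="Author" name="author" meta="true"/>
        <field column="title" name="title" meta="true"/>
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Because rootEntity="false" is set on the outer entity, each Solr document is created per tika-test row, and the fields mapped on f are merged into that document. After changing the config and schema, a full re-import is needed before fileName appears in search results.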
License: CC BY-SA with attribution
Not affiliated with StackOverflow