Index PDFs using Lucene, NoSuchFieldError

https://stackoverflow.com/questions/17842457

04-06-2022

Question

I'm trying to index PDF files using Apache Lucene 4.4.

I keep getting the following exception:

Exception in thread "main" java.lang.NoSuchFieldError: TOKENIZED
    at com.snowtide.pdf.lucene.LuceneInterface20.addField(SourceFile:18)
    at com.snowtide.pdf.lucene.PDFDocumentFactory.buildPDFDocument(SourceFile:174)
    at com.snowtide.pdf.lucene.PDFDocumentFactory.buildPDFDocument(SourceFile:84)
    at com.apache.lucene.search.EasyLuceneIntegration.addPDFToIndex(EasyLuceneIntegration.java:134)
    at com.apache.lucene.search.EasyLuceneIntegration.main(EasyLuceneIntegration.java:62)

I'm using PDFTextStream and following their Lucene integration example.

Solution

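The project you've referenced only supports up to Lucene 2.2. Its integration code was compiled against the old Field.Index.TOKENIZED constant, which later Lucene releases renamed to ANALYZED and then removed, so the lookup fails at runtime under 4.4 with NoSuchFieldError. I'd recommend looking into Apache Tika to extract the text from your PDFs in a form you can index, or you can just use PDFBox directly (which, I believe, is the library Tika uses for PDFs).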
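One way to confirm this kind of version mismatch is to ask the classpath directly whether the constant named in the stack trace still exists. The following is a minimal diagnostic sketch, not part of PDFTextStream or Lucene: the class name `FieldCheck` and the helper `hasStaticField` are hypothetical, and it assumes the missing constant lived on Lucene's nested `Field.Index` type (binary name `org.apache.lucene.document.Field$Index`).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class FieldCheck {

    // Hypothetical helper: returns true only if the named class can be
    // loaded and declares (or inherits) a public static field with that name.
    static boolean hasStaticField(String className, String fieldName) {
        try {
            // Load without initializing the class; we only inspect its members.
            Class<?> cls = Class.forName(className, false,
                    FieldCheck.class.getClassLoader());
            Field f = cls.getField(fieldName);
            return Modifier.isStatic(f.getModifiers());
        } catch (ClassNotFoundException | NoSuchFieldException e) {
            // Class is not on the classpath, or the field is not there.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the constant from the stack trace belongs to Field.Index.
        String indexClass = "org.apache.lucene.document.Field$Index";
        System.out.println("TOKENIZED present: "
                + hasStaticField(indexClass, "TOKENIZED"));
        System.out.println("ANALYZED present:  "
                + hasStaticField(indexClass, "ANALYZED"));
    }
}
```

Run it with the same classpath as the indexing program. If `TOKENIZED` is reported as absent while `ANALYZED` is present, the Lucene jar is newer than the one the integration code was built for. If both are reported absent, the Lucene jar is most likely not on the classpath at all.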

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow