I'm trying to index PDF files using Apache Lucene 4.4.

I keep getting the following exception:

Exception in thread "main" java.lang.NoSuchFieldError: TOKENIZED
    at com.snowtide.pdf.lucene.LuceneInterface20.addField(SourceFile:18)
    at com.snowtide.pdf.lucene.PDFDocumentFactory.buildPDFDocument(SourceFile:174)
    at com.snowtide.pdf.lucene.PDFDocumentFactory.buildPDFDocument(SourceFile:84)
    at com.apache.lucene.search.EasyLuceneIntegration.addPDFToIndex(EasyLuceneIntegration.java:134)
    at com.apache.lucene.search.EasyLuceneIntegration.main(EasyLuceneIntegration.java:62)

I'm using PDFTextStream and following their example.


Solution

The project you've referenced only supports up to Lucene 2.2. I'd recommend looking into , to get your PDFs into an acceptable format, or you can just use (which, I believe, is the package Tika uses for PDFs).
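A NoSuchFieldError like this generally means a class was compiled against one version of a library and is now running against another version in which the field no longer exists. If you want to confirm that on your own classpath, here is a minimal sketch using plain reflection. It assumes the missing field is the TOKENIZED constant on org.apache.lucene.document.Field.Index (the stack trace only names TOKENIZED, so the class name is my guess), and FieldCheck and hasStaticField are hypothetical names made up for this illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class FieldCheck {

    /**
     * Returns true only if the named class can be loaded and exposes a
     * public static field with the given name. Any lookup failure
     * (class missing, field missing, linkage problem) yields false.
     */
    static boolean hasStaticField(String className, String fieldName) {
        try {
            // Load without running static initializers; we only inspect members.
            Class<?> cls = Class.forName(className, false,
                    FieldCheck.class.getClassLoader());
            Field f = cls.getField(fieldName);
            return Modifier.isStatic(f.getModifiers());
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: this nested class is where TOKENIZED used to live.
        String cls = "org.apache.lucene.document.Field$Index";
        System.out.println("TOKENIZED present: " + hasStaticField(cls, "TOKENIZED"));
        System.out.println("ANALYZED present:  " + hasStaticField(cls, "ANALYZED"));
    }
}
```

Run it with the same classpath as your indexing program. If TOKENIZED is reported as absent, the integration classes in the stack trace were built for an older Lucene API than the jar you are running with, which matches the version mismatch described above.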

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow