Question

I'm trying to index PDF files using Apache Lucene 4.4.

I keep getting the following exception:

Exception in thread "main" java.lang.NoSuchFieldError: TOKENIZED
at com.snowtide.pdf.lucene.LuceneInterface20.addField(SourceFile:18)
at com.snowtide.pdf.lucene.PDFDocumentFactory.buildPDFDocument(SourceFile:174)
at com.snowtide.pdf.lucene.PDFDocumentFactory.buildPDFDocument(SourceFile:84)
at com.apache.lucene.search.EasyLuceneIntegration.addPDFToIndex(EasyLuceneIntegration.java:134)
at com.apache.lucene.search.EasyLuceneIntegration.main(EasyLuceneIntegration.java:62)

I'm using PDFTextStream and following their example.


Solution

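The project you've referenced only supports up to Lucene 2.2, which is why it still refers to Field.Index.TOKENIZED, a constant that Lucene 4.4 no longer has. I'd recommend looking into Tika to get your PDFs into an acceptable format, or you can just use PDFBox (which, I believe, is the package Tika uses for PDFs).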
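If you want to confirm the version mismatch yourself, here is a minimal sketch that uses plain reflection to ask whether the Lucene jar on your classpath still has the constant the integration expects. The class name LuceneFieldCheck and the helper describe are made up for illustration. The binary name org.apache.lucene.document.Field$Index is my assumption about where TOKENIZED lives, based on the stack trace and on Lucene's documented Field.Index type. As far as I recall, TOKENIZED was deprecated in favor of ANALYZED around Lucene 2.4 and dropped in 3.0, so treat the expected results as a guess until you run it.

```java
public class LuceneFieldCheck {

    /**
     * Reports whether a public field with the given name exists on the named class.
     * The class is looked up by name at runtime, so this compiles and runs
     * even when the library is not on the classpath.
     */
    public static String describe(String className, String fieldName) {
        try {
            // 'false' skips static initialization; we only need the class metadata.
            Class<?> cls = Class.forName(className, false,
                    LuceneFieldCheck.class.getClassLoader());
            cls.getField(fieldName);
            return "present";
        } catch (ClassNotFoundException e) {
            return "class not found";
        } catch (NoSuchFieldException e) {
            return "field missing";
        }
    }

    public static void main(String[] args) {
        // Assumed binary name of the nested type Field.Index in Lucene.
        String indexClass = "org.apache.lucene.document.Field$Index";
        System.out.println("TOKENIZED: " + describe(indexClass, "TOKENIZED"));
        System.out.println("ANALYZED:  " + describe(indexClass, "ANALYZED"));
    }
}
```

Run it with the same classpath as your indexer. If TOKENIZED comes back as "field missing" while ANALYZED is "present", the PDFTextStream integration was compiled against an older Lucene API than the one you are running. In that case, extracting the text separately (Tika or PDFBox) and building the Lucene document yourself is the way around it.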

Licensed under: CC-BY-SA with attribution