Index PDFs using Lucene, NoSuchFieldError

https://stackoverflow.com/questions/17842457

04-06-2022

Question

I'm trying to index PDF files using Apache Lucene 4.4.

I keep getting the following exception:

Exception in thread "main" java.lang.NoSuchFieldError: TOKENIZED
at com.snowtide.pdf.lucene.LuceneInterface20.addField(SourceFile:18)
at com.snowtide.pdf.lucene.PDFDocumentFactory.buildPDFDocument(SourceFile:174)
at com.snowtide.pdf.lucene.PDFDocumentFactory.buildPDFDocument(SourceFile:84)
at com.apache.lucene.search.EasyLuceneIntegration.addPDFToIndex(EasyLuceneIntegration.java:134)
at com.apache.lucene.search.EasyLuceneIntegration.main(EasyLuceneIntegration.java:62)

I'm using PDFTextStream and following their Lucene integration example.

Answer

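The project you've referenced only supports up to Lucene 2.2. That is why it fails on 4.4: the integration class in your stack trace (LuceneInterface20) was compiled against the old API and looks up a TOKENIZED constant that no longer exists in the Lucene jar on your classpath, so the JVM throws NoSuchFieldError at runtime. I'd recommend looking into Apache Tika to extract the text from your PDFs and then indexing that text with Lucene yourself, or you can use Apache PDFBox directly (which is the library Tika uses for PDFs).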
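To confirm this kind of version mismatch before changing libraries, a small reflection check can report whether the Lucene jar on the classpath still exposes the constant the old integration expects. This is only a sketch under assumptions: the class name LuceneFieldCheck and the helper hasPublicField are made up for illustration, and it assumes the missing constant is Field.Index.TOKENIZED from the Lucene 2.x API (the stack trace only names the field, not its owning class).

```java
public class LuceneFieldCheck {

    /**
     * Hypothetical helper: returns true if the named class can be loaded
     * and exposes a public field with the given name, false otherwise.
     */
    public static boolean hasPublicField(String className, String fieldName) {
        try {
            // initialize=false: only inspect the class, do not run its static initializers
            Class<?> type = Class.forName(className, false,
                    LuceneFieldCheck.class.getClassLoader());
            type.getField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            // Class not on the classpath, or the field is absent in this version
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the old integration refers to Field.Index.TOKENIZED.
        // '$' is the binary-name separator for the nested type Field.Index.
        String fieldIndex = "org.apache.lucene.document.Field$Index";
        System.out.println("TOKENIZED present: " + hasPublicField(fieldIndex, "TOKENIZED"));
        System.out.println("ANALYZED present:  " + hasPublicField(fieldIndex, "ANALYZED"));
    }
}
```

Run it with the same classpath as the indexing program. If TOKENIZED is reported as absent while the Lucene jar is present, the integration was most likely built for an older Lucene than the one being loaded, which matches the explanation above.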

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow