Indexing PDF documents

Question

What's the best way to index PDF documents? Should I convert them to plain text first, or is there a better way to index PDF files?

Solution

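Assuming you're talking about Solr, see the ExtractingRequestHandler (Solr Cell). It uses Apache Tika to extract text and metadata from PDFs and other rich documents, so you don't need to convert them to .txt yourself.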
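Here is a minimal Python sketch of sending a PDF to that handler. It assumes a Solr instance at `http://localhost:8983` with the ExtractingRequestHandler enabled and mapped to `/update/extract` (as in Solr's example configurations), a hypothetical core named `mycore`, and a schema whose unique key field is `id`. The raw PDF bytes are POSTed with `Content-Type: application/pdf`, the document ID is set through the `literal.id` parameter, and `commit=true` makes the document searchable right away. The function names are illustrative only and do not come from any Solr client library.

```python
import urllib.parse
import urllib.request


def build_extract_url(base_url: str, core: str, doc_id: str, commit: bool = True) -> str:
    """Build the /update/extract URL for one document.

    Assumes the handler is mapped to /update/extract and the schema's
    unique key field is "id" (hence literal.id).
    """
    params = {"literal.id": doc_id}
    if commit:
        # Commit immediately so the document becomes searchable.
        params["commit"] = "true"
    query = urllib.parse.urlencode(params)
    return f"{base_url.rstrip('/')}/solr/{core}/update/extract?{query}"


def index_pdf(path: str, base_url: str, core: str, doc_id: str) -> int:
    """POST the raw PDF bytes to Solr; Tika does the text extraction server-side."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        build_extract_url(base_url, core, doc_id),
        data=data,
        headers={"Content-Type": "application/pdf"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # HTTP status code returned by Solr (200 on success).
        return resp.status
```

Committing on every request keeps the sketch simple but is slow for bulk loads. When indexing many files, you would typically pass `commit=False` for each one and issue a single commit at the end.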

Licensed under: CC-BY-SA with attribution
