Question

How can I post, index and search for content within an odt file stored in my solr_home directory?

I have tried to understand and apply the pages listed below, and have added a body field to the schema:

Indexing text and html files

Simple Post Tool -Confluence

The resourcename field contains the file location, but the content field is blank. I still cannot search the file contents, even though the output shows that the file is indexed and the changes are committed. Is there any end-to-end documentation for this requirement? I am using Solr with Tomcat on a Linux machine. I'm a newbie at Solr and might be missing details not mentioned in the above pages.


Solution

Use Apache Tika to extract the content and send it to Solr:

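import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.Tika;
import org.apache.tika.metadata.Metadata;

Tika tika = new Tika();
InputStream fileInputStream = new FileInputStream("d:\\fileName.odt");
Metadata metadata = new Metadata();
// The resource name gives Tika a hint about the file type.
metadata.set(Metadata.RESOURCE_NAME_KEY, "fileName.odt");
// Extract the document's plain text (throws IOException, TikaException).
String content = tika.parseToString(fileInputStream, metadata);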
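The snippet above only covers extraction. As a rough sketch of the "send it to Solr" half, the extracted text could be wrapped in a Solr XML add message and posted to the update handler. The class name SolrXmlPoster, the id value, and the assumption that the schema has id and body fields are illustrative, not taken from the original answer; only JDK classes are used here, and SolrJ would be a reasonable alternative.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds and posts a Solr XML <add> message.
class SolrXmlPoster {

    // Escape the characters that are special inside XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build an <add> message with an id field and one text field.
    // The field names are assumed to exist in the schema.
    static String buildAddXml(String id, String textField, String text) {
        return "<add><doc>"
            + "<field name=\"id\">" + escapeXml(id) + "</field>"
            + "<field name=\"" + textField + "\">" + escapeXml(text) + "</field>"
            + "</doc></add>";
    }

    // POST the message to the update handler and return the HTTP status code.
    static int post(String updateUrl, String xml) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```

With the content string from Tika, a call might look like SolrXmlPoster.post(updateUrl, SolrXmlPoster.buildAddXml("fileName.odt", "body", content)), where updateUrl points at your core's /update handler with commit=true appended. The exact host, port, and core path depend on the Tomcat deployment. Extracted text can contain control characters that are not valid in XML, so real code may need to filter those as well.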

Alternatively, you can use Solr's ExtractingRequestHandler (Solr Cell), which runs Tika on the server side.
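A blank content field with this handler often comes down to field mapping: the handler puts the extracted text into a field named content unless told otherwise. The sketch below builds a request URL for the handler using its literal.id, fmap.content, and commit parameters. It assumes the handler is registered at /update/extract in solrconfig.xml and that the schema has a body field. The class name ExtractUrlBuilder and the base URL are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a request URL for the ExtractingRequestHandler.
class ExtractUrlBuilder {

    // URL-encode a single query parameter value.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // literal.id sets the document's unique key,
    // fmap.content maps the extracted text to the given schema field,
    // commit=true makes the document searchable right after the request.
    static String buildExtractUrl(String solrBase, String docId, String contentField) {
        return solrBase + "/update/extract"
            + "?literal.id=" + enc(docId)
            + "&fmap.content=" + enc(contentField)
            + "&commit=true";
    }
}
```

The .odt file itself is then sent as the body of a POST to that URL, for example as a multipart file upload. The target field needs to be indexed (and stored, if you want it returned in results) for searches to match the file contents.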

OTHER TIPS

Apache Tika was required. Found it at Apache Tika Download

Licensed under: CC-BY-SA with attribution