Question

I've seen this link: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika. What I get from Tika is plain text with no styling, which Solr then searches. Is it possible to get the text back from Solr with its original styling? In other words, we need to show search results with their original formatting after Solr finds them.


Solution

If you think about it, what is "original style" in a PDF? Which components of the "style" do you want to keep?

It's not just font and weight; it's stroke, fill, angle, path, graphics, tracking, transparency, transformations and more. Even if you got all of that, how would you display it in your UI or on the web?

You can't really reproduce the original styling in any way other than displaying the original PDF, so that is what people usually do when they want the original formatting.

Otherwise, they just use the pure text.
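A minimal sketch of that common pattern, assuming each Solr document stores Tika's plain text in a field called `content` and the location of the original file in a stored field called `source_url` (both field names, and the core URL in the usage note, are hypothetical). Solr searches and highlights the plain text, and the UI links each hit to the original PDF for display with full formatting. The `q`, `fl`, `hl`, `hl.fl` and `wt` parameters and the `response.docs` / `highlighting` JSON layout are standard Solr; everything else is illustrative.

```python
from urllib.parse import urlencode


def build_select_url(solr_core_url: str, user_query: str) -> str:
    """Build a Solr /select URL that returns the link to the original file
    plus a highlighted snippet of the extracted plain text."""
    params = {
        "q": user_query,
        "fl": "id,source_url",  # hypothetical stored field with the PDF's location
        "hl": "true",
        "hl.fl": "content",     # hypothetical field holding Tika's plain text
        "wt": "json",
    }
    return f"{solr_core_url}/select?{urlencode(params)}"


def to_results(solr_json: dict) -> list[dict]:
    """Pair each hit's plain-text snippet with a link to the original PDF."""
    highlighting = solr_json.get("highlighting", {})
    results = []
    for doc in solr_json["response"]["docs"]:
        # Highlighting is keyed by the document's unique id, then by field name.
        snippets = highlighting.get(doc["id"], {}).get("content", [])
        results.append({
            "snippet": snippets[0] if snippets else "",
            # The UI opens this link to show the document with its original styling.
            "original": doc["source_url"],
        })
    return results
```

For example, `build_select_url("http://localhost:8983/solr/docs", "tika")` produces the query URL, and `to_results` turns the parsed JSON response into snippet/link pairs for the results page. The design choice is simply that Solr never needs the styling: it only finds the document, and the original file handles presentation.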

Licensed under: CC-BY-SA with attribution