Question

I've seen this link: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika What I get from Tika is plain text, without any styling, for Solr to search. Is it possible to get the text back from Solr with its styling? In other words, we need to show the text in its original style after it has been found by a Solr search.


Solution

If you think about it, what is "original style" in a PDF? Which components of the "style" do you want to keep?

It's not just font and weight; it's stroke, fill, angle, path, graphics, tracking, transparency, transformations and more. Even if you got all that, how would you display it in your UI or web page?

You can't really replicate the original style any way other than displaying the original PDF. So that's the way people usually do it if they want the original formatting.

Otherwise, they just use the pure text.
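
To make the "display the original PDF" route concrete, here is a minimal sketch of how search hits might be turned into links back to the source files. It assumes each indexed document carries a stored field with the file's location; the field name `source_path`, the `title` field, the `/files/` base URL and the function `render_hits` are all hypothetical and not something Solr or Tika provides by default. The search itself still runs against the plain text.

```python
from html import escape
from urllib.parse import quote


def render_hits(docs, base_url="/files/"):
    """Turn Solr-style result documents (plain dicts) into HTML links
    that point at the original PDF instead of the extracted text."""
    items = []
    for doc in docs:
        # 'source_path' is an assumed stored field holding the PDF location.
        path = doc.get("source_path")
        if not path:
            continue  # nothing to link to, so skip this hit
        # 'title' is also an assumed field; fall back to the path itself.
        title = doc.get("title") or path
        href = base_url + quote(path)  # percent-encode spaces and the like
        items.append(
            f'<li><a href="{escape(href, quote=True)}">{escape(title)}</a></li>'
        )
    return "<ul>" + "".join(items) + "</ul>"
```

The user searches the text, then clicks through to the PDF, which the browser or a PDF viewer renders with all of its original formatting.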

License: CC-BY-SA with attribution