Question

I've seen this link: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Content-Extraction-Tika. What I get from Tika is plain text without any styling, which Solr then indexes for search. Is it possible to get the text back from Solr with its style? In other words, we need to show the text with its original style after it has been found by a Solr search.

Solution

If you think about it, what is "original style" in a PDF? What components of the "style" do you want to keep?

It's not just font and weight; it's stroke, fill, angle, path, graphics, tracking, transparency, transformations and more. Even if you captured all of that, how would you display it in your web UI?

You can't really replicate the original style any way other than displaying the original PDF. So that's the way people usually do it if they want the original formatting.

Otherwise, they just use the pure text.
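As a rough illustration of the "display the original PDF" approach, here is a minimal sketch that keeps the extracted plain text for searching and stores a link to the original file next to it, so a search hit can point the user back to the real document. The field names (`id`, `text`, `original_url`) and both helper functions are hypothetical and not part of Solr or Tika; sending the document to Solr and running the query are left out.

```python
from html import escape


def build_index_doc(doc_id, extracted_text, original_url):
    """Build the document to index: plain text for search, URL for display.

    The field names are assumptions for this sketch; use whatever your
    Solr schema defines.
    """
    return {"id": doc_id, "text": extracted_text, "original_url": original_url}


def render_hit(doc, snippet_len=80):
    """Render one search hit as an HTML link to the original file.

    The snippet is plain text only; the styling lives in the linked PDF.
    """
    snippet = doc["text"][:snippet_len]
    return '<a href="{}">{}</a>'.format(
        escape(doc["original_url"], quote=True), escape(snippet)
    )


doc = build_index_doc(
    "report-1", "Quarterly results for the team", "http://example.com/report.pdf"
)
print(render_hit(doc))
```

The search side never needs to know about fonts or layout; it only has to return the stored URL along with the matching text.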

Licensed under: CC-BY-SA with attribution