Parsing a PDF content stream to identify paragraph boundaries
Question
Is there a way to parse a PDF content stream and identify paragraph boundaries? I read ISO 32000-1:2008 but could not tell whether the content stream contains any operator that tells display software to start or end a paragraph. Can a text extractor such as pdfbox or xpdf provide that information?
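For context, as far as I can tell the text operators in ISO 32000-1 (Tj, TJ, ', " for showing text and Td, TD, Tm, T* for positioning) only place glyphs; none of them marks the start or end of a paragraph. A tagged PDF may wrap content in marked-content sequences whose structure type is P, but untagged files carry no such hint, so paragraphs usually have to be guessed from geometry. Below is a minimal sketch of that kind of guess, assuming an extractor has already produced one record per text line with a baseline position and a font size. The `Line` type, the `group_paragraphs` function, and the `gap_factor` threshold are hypothetical names for illustration, not part of pdfbox or xpdf.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One extracted text line (hypothetical record, not a library type)."""
    text: str
    y: float       # baseline y in PDF user space; y grows upward on the page
    height: float  # approximate font size of the line


def group_paragraphs(lines, gap_factor=1.5):
    """Group lines (given in reading order) into paragraphs.

    Heuristic assumption: a vertical gap larger than gap_factor times the
    previous line's font size, or a jump back up the page (new column or
    page), starts a new paragraph.
    """
    paragraphs = []
    current = []
    prev = None
    for line in lines:
        if prev is not None:
            gap = prev.y - line.y  # positive when moving down the page
            if gap > gap_factor * prev.height or gap < 0:
                paragraphs.append(" ".join(current))
                current = []
        current.append(line.text)
        prev = line
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    sample = [
        Line("First line", 700, 10),
        Line("of paragraph one.", 688, 10),
        Line("Second paragraph.", 660, 10),
    ]
    for paragraph in group_paragraphs(sample):
        print(paragraph)
```

The threshold is a guess and will need tuning per document; indentation of the first line or a change of font are other signals that could be added in the same way.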
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow