The case:

  • The server doesn't support exec/shell_exec (so pdftotext is ruled out)
  • Other libraries don't accept the PDF. pdftotext works (tested on the files locally)

Here are some excerpts from the PDF source:


5 0 obj
>
stream
Gat$ugPXc?%"6H'p]ofd'_qs00UX27?3p0*8m>KOQL4]:u"*$$^'f*q*SGMee*e$5&=alj\@GV7YPq9pg!Lr0>Y2n'&lmd4Br?V9N
P:_",WI.kJ\#'cs>77M9eTkA;,t#f)aaGuNS-6=Wp*uBg,Ft9Tcj#aI]nD[C6&m@9m?m!p6=IBt=o_LGHh!q>f$C.jdOXbSP/796HV`_Y]Y
l)M(]FZ9Ld-J_mMRe2q(D>`V@G`NM]crn@_V?sGC@W9^bnrY$.mqeVN^YEcqK)blO~>
endstream
endobj

About the creator:

%PDF-1.4
1 0 obj
>
endobj

I would like to get some suggestions about how to convert this to plain text in PHP, without using the exec/shell_exec functions.

Thank you.

(Other solutions, such as http://webcheatsheet.com/php/reading_clean_text_from_pdf.php, didn't work; I couldn't even get them to convert this stream into something resembling ASCII text.)


Solution

You cannot just parse this stream on its own: to turn it into text you then have to decode the data using a lot of other information in the file (such as the font encodings). You really want to use a library for this...
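
To illustrate why the stream alone is not enough, here is a minimal sketch in pure PHP (no exec) that only undoes the outer encoding layers. It rests on assumptions: the stream dictionary is missing from the excerpt, so the filters are a guess. The `~>` terminator suggests ASCII85Decode, and FlateDecode (zlib) is assumed as the second layer. The function names `ascii85_decode`, `ascii85_group` and `decode_pdf_stream` are made up for this example, and the zlib extension is assumed to be available.

```php
<?php
// Sketch only: assumes the stream uses /ASCII85Decode followed by /FlateDecode.
// Assumes 64-bit PHP (integers above 2^31) and the zlib extension.

/** Convert one group of five base-85 digits into up to four bytes. */
function ascii85_group(array $digits, int $keep): string
{
    $value = 0;
    foreach ($digits as $digit) {
        $value = $value * 85 + $digit;
    }
    // pack('N') writes the low 32 bits big-endian; keep only the real bytes.
    return substr(pack('N', $value), 0, $keep);
}

/** Decode Adobe-style ASCII85 data, with or without the "~>" end marker. */
function ascii85_decode(string $data): string
{
    // Whitespace is insignificant; everything after "~>" is ignored.
    $data = preg_replace('/\s+/', '', $data);
    $end = strpos($data, '~>');
    if ($end !== false) {
        $data = substr($data, 0, $end);
    }

    $out = '';
    $group = [];
    $len = strlen($data);
    for ($i = 0; $i < $len; $i++) {
        $c = $data[$i];
        if ($c === 'z' && count($group) === 0) {
            $out .= "\0\0\0\0"; // 'z' is shorthand for four zero bytes
            continue;
        }
        $v = ord($c) - 33;
        if ($v < 0 || $v > 84) {
            throw new InvalidArgumentException("Invalid ASCII85 character: $c");
        }
        $group[] = $v;
        if (count($group) === 5) {
            $out .= ascii85_group($group, 4);
            $group = [];
        }
    }

    // A final partial group of n digits is padded with 'u' (84) and yields n-1 bytes.
    $n = count($group);
    if ($n > 1) {
        $out .= ascii85_group(array_pad($group, 5, 84), $n - 1);
    }
    return $out;
}

/** Undo the two assumed filters and return the raw content stream. */
function decode_pdf_stream(string $raw): string
{
    $binary = ascii85_decode($raw);
    $inflated = @gzuncompress($binary); // FlateDecode data is zlib-wrapped
    if ($inflated === false) {
        throw new RuntimeException('Not zlib data; the stream probably uses other filters.');
    }
    return $inflated;
}
```

Calling `decode_pdf_stream()` on the text between `stream` and `endstream` would, if the assumed filters are right and the excerpt was not damaged when it was pasted, give the page's content stream rather than plain text. That output is a sequence of drawing operators, and the strings passed to the text-showing operators are character codes that must still be mapped through each font's encoding, which is the part the answer says needs the rest of the file.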

Licensed under: CC-BY-SA with attribution