Question

The case:

  • The server doesn't allow exec/shell_exec (so calling pdftotext is ruled out)
  • Other libraries I tried don't accept the PDF. pdftotext itself works (tested on the files locally)

Here are some excerpts from the PDF source:


5 0 obj
>
stream
Gat$ugPXc?%"6H'p]ofd'_qs00UX27?3p0*8m>KOQL4]:u"*$$^'f*q*SGMee*e$5&=alj\@GV7YPq9pg!Lr0>Y2n'&lmd4Br?V9N
P:_",WI.kJ\#'cs>77M9eTkA;,t#f)aaGuNS-6=Wp*uBg,Ft9Tcj#aI]nD[C6&m@9m?m!p6=IBt=o_LGHh!q>f$C.jdOXbSP/796HV`_Y]Y
l)M(]FZ9Ld-J_mMRe2q(D>`V@G`NM]crn@_V?sGC@W9^bnrY$.mqeVN^YEcqK)blO~>
endstream
endobj

About the creator:

%PDF-1.4
1 0 obj
>
endobj

I would like to get some suggestions about how to convert this to plain text in PHP, without using the exec/shell_exec functions.

Thank you.

(Other solutions, such as http://webcheatsheet.com/php/reading_clean_text_from_pdf.php, didn't work: I couldn't even get them to convert this stream into something resembling ASCII text.)

Solution

You cannot just parse this stream on its own: once the stream itself is decoded, you still need lots of other data from the file (such as the font encodings) to turn the result into readable text. You really want to use a library to do this.
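
To illustrate why, here is a minimal sketch of the first step only, in pure PHP with no exec. It rests on assumptions: the stream in the question ends in "~>", which is the ASCII85 end-of-data marker, so the stream is probably ASCII85-encoded and then Flate-compressed, but the stream dictionary is not shown, so the real filter chain may differ. The function names ascii85Decode, decodeGroup and decodePdfStream are made up for this example, and the code assumes 64-bit PHP with the zlib extension.

```php
<?php
// Sketch only. Assumes the (unseen) stream dictionary declares
// /ASCII85Decode followed by /FlateDecode. All function names are hypothetical.

/** Decode ASCII85 data as found in a PDF stream (ends with "~>"). */
function ascii85Decode(string $data): string
{
    // Cut at the end-of-data marker, then drop all whitespace.
    $end = strpos($data, '~>');
    if ($end !== false) {
        $data = substr($data, 0, $end);
    }
    $data = preg_replace('/\s+/', '', $data);

    $out = '';
    $group = [];
    $len = strlen($data);
    for ($i = 0; $i < $len; $i++) {
        $ch = $data[$i];
        if ($ch === 'z' && count($group) === 0) {
            $out .= "\0\0\0\0";           // "z" is shorthand for four zero bytes
            continue;
        }
        $v = ord($ch) - 33;               // digits are "!" (0) through "u" (84)
        if ($v < 0 || $v > 84) {
            throw new InvalidArgumentException("Invalid ASCII85 character: $ch");
        }
        $group[] = $v;
        if (count($group) === 5) {
            $out .= decodeGroup($group, 4);
            $group = [];
        }
    }

    // A trailing partial group of n digits yields n - 1 bytes.
    $n = count($group);
    if ($n === 1) {
        throw new InvalidArgumentException('Truncated ASCII85 data');
    }
    if ($n > 1) {
        $out .= decodeGroup(array_pad($group, 5, 84), $n - 1);
    }
    return $out;
}

/** Turn five base-85 digits into up to four bytes (big-endian). */
function decodeGroup(array $digits, int $bytes): string
{
    $value = 0;
    foreach ($digits as $d) {
        $value = $value * 85 + $d;
    }
    return substr(pack('N', $value & 0xFFFFFFFF), 0, $bytes);
}

/** Undo ASCII85 and then zlib (Flate) compression on one raw stream body. */
function decodePdfStream(string $raw): string
{
    $inflated = @gzuncompress(ascii85Decode($raw));
    if ($inflated === false) {
        throw new RuntimeException('Not zlib data; the filter assumption may be wrong');
    }
    return $inflated;
}
```

Even when this succeeds, the output is a page content stream: drawing operators with strings (for example the operands of Tj and TJ) expressed in whatever encoding the selected font uses. Mapping those codes back to characters requires the font and encoding objects elsewhere in the file, which is the part a library handles for you.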

License: CC-BY-SA with attribution
Not affiliated with StackOverflow