Domanda

The case:

  • Server doesn't support exec/shell_exec (so pdftotext is excluded)
  • Other libraries don't accept the PDF. Pdftotext works (tested on the files locally)

Here are some excerpts from the (PDF)code:


5 0 obj
>
stream
Gat$ugPXc?%"6H'p]ofd'_qs00UX27?3p0*8m>KOQL4]:u"*$$^'f*q*SGMee*e$5&=alj\@GV7YPq9pg!Lr0>Y2n'&lmd4Br?V9N
P:_",WI.kJ\#'cs>77M9eTkA;,t#f)aaGuNS-6=Wp*uBg,Ft9Tcj#aI]nD[C6&m@9m?m!p6=IBt=o_LGHh!q>f$C.jdOXbSP/796HV`_Y]Y
l)M(]FZ9Ld-J_mMRe2q(D>`V@G`NM]crn@_V?sGC@W9^bnrY$.mqeVN^YEcqK)blO~>
endstream
endobj

About the creator:

%PDF-1.4
1 0 obj
>
endobj

I would like to get some suggestions about how to convert this to plain text in PHP, without using the exec/shell_exec functions.

Thank you.

(Other solutions like http://webcheatsheet.com/php/reading_clean_text_from_pdf.php didn't work, and I couldn't get them to at least convert this code to something looking like ASCII-code.)

È stato utile?

Soluzione

You cannot just parse this stream as you need to then decode the data using lots of other data in the file (like font encoding). You really want to use a library to do this...

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top