PHP: running the Linux “less” command via exec gives a binary file warning
Question
I have to convert some PDF files to TXT. I ended up using the "less" command because, for example, pdftotext has problems with tables in PDFs. The problem is that when I run the command from the exec function (or shell_exec/system), less just reports that the selected PDF is a binary file, and the resulting file is just a TXT file containing the raw PDF data. When I run the same command normally in a terminal, everything is OK. I also tried logging in as the www-data user and running the command as that user, and there was no problem either.
Command:
$ less /var/www/original.pdf > /var/www/new.txt
PHP code:
exec("less -f /var/www/original.pdf > /var/www/new.txt 2>&1");
Result from PHP exec:
"/var/www/original.pdf" may be a binary file. See it anyway?
The "-f" option in the exec command is there so that you don't need to press "y" for "yes, I want to see it anyway."
set | grep less
yields (the other matches, such as _apport_parameterless, come from unrelated shell function definitions and are omitted here):
LESSCLOSE='/usr/bin/lesspipe %s %s'
LESSOPEN='| /usr/bin/lesspipe %s'
Solution
From what I read, your console is able to display a PDF file with less because you have an input preprocessor installed, like lesspipe or lessfile. less finds such a preprocessor by reading an environment variable called LESSOPEN, which points to the lesspipe or lessfile script (in your case, LESSOPEN='| /usr/bin/lesspipe %s').
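To see the mechanism in isolation, here is a minimal sketch that uses a toy pipe preprocessor instead of lesspipe. The tr command, the sample text and the temporary directory are illustrative assumptions. The leading "|" in LESSOPEN tells less to run the command (with %s replaced by the file name) and read its standard output instead of the file; because standard output is redirected, less just copies that output, as in the question's less ... > new.txt.

```shell
#!/bin/sh
# Toy demonstration of a LESSOPEN "pipe preprocessor"; tr stands in for lesspipe.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT          # clean up the temporary directory on exit
printf 'hello world\n' > "$tmpdir/sample.txt"

# "| command %s": less replaces %s with the file name, runs the command, and
# reads its standard output instead of the file. Because stdout is redirected
# to a file, less does not page; it just copies that output.
LESSOPEN="| tr '[:lower:]' '[:upper:]' < %s" less "$tmpdir/sample.txt" > "$tmpdir/out.txt"

cat "$tmpdir/out.txt"
```

With GNU less, out.txt should contain the upper-cased text, which is the same path the PDF takes through lesspipe in your terminal.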
With the right environment variables and shell commands, your web server may be able to replicate this behavior so that your calls to less parse PDFs properly.
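One way to check this from a terminal is to compare LESSOPEN in your shell with a cleared environment. This sketch assumes that a web server's PHP process resembles env -i in that it does not read interactive startup files such as ~/.bashrc, where LESSOPEN is typically exported; the PATH value is also an assumption.

```shell
#!/bin/sh
# Diagnostic sketch: compare LESSOPEN in this shell with a cleared environment.
echo "current shell: LESSOPEN=${LESSOPEN-<unset>}"

# env -i starts the command with an empty environment; only PATH is passed back in.
# This roughly mimics a process that never read your interactive shell's startup files.
env -i PATH=/usr/bin:/bin sh -c 'echo "cleared env:   LESSOPEN=${LESSOPEN-<unset>}"'

# With the paths from the question, the PHP symptom could then be reproduced with:
#   env -i PATH=/usr/bin:/bin less -f /var/www/original.pdf > /var/www/new.txt
```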
What I would suggest is calling a bash script to do the conversion for you instead of calling less directly. That way, the script can set the appropriate environment variables and execute the appropriate commands to convert your PDF files to readable output.
Here's an example of how to do this:
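#!/bin/bash
# Export LESSOPEN/LESSCLOSE so that less uses the lesspipe input preprocessor
eval "$(lesspipe)"
# $1 = input PDF, $2 = output text file (quoted so paths with spaces work)
less "$1" > "$2" 2>&1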
Then, from PHP, call that script like this:
exec("/path/to/your/script/script.sh /var/www/original.pdf /var/www/new.txt");
If it doesn't work, try replacing lesspipe with lessfile in the eval line.
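Because less only relays whatever the LESSOPEN command prints, a variation on the wrapper script is to skip less and call the preprocessor directly. This is a sketch assuming the Debian/Ubuntu lesspipe at /usr/bin/lesspipe shown in the LESSOPEN value from the question; the function name pdf_to_txt and the LESSPIPE_BIN override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run the LESSOPEN preprocessor directly, without less.
# Assumes /usr/bin/lesspipe (from LESSOPEN='| /usr/bin/lesspipe %s'), which, as a
# "|" preprocessor, writes the converted text to its standard output.
pdf_to_txt() {
    src=$1
    dst=$2
    lesspipe_bin=${LESSPIPE_BIN:-/usr/bin/lesspipe}   # hypothetical override variable

    if [ ! -x "$lesspipe_bin" ]; then
        echo "pdf_to_txt: $lesspipe_bin is missing or not executable" >&2
        return 1
    fi
    if [ ! -r "$src" ]; then
        echo "pdf_to_txt: cannot read $src" >&2
        return 1
    fi
    "$lesspipe_bin" "$src" > "$dst"
}
```

Usage: pdf_to_txt /var/www/original.pdf /var/www/new.txt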
OTHER TIPS
First of all, less is an interactive program for reading text streams; in this context you should use cat instead. This of course won't work either, since PDF is a binary format rather than a text-based one.
Why don't you use a PDF-to-text converter like pdftotext?
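Since the question mentions problems with tables, pdftotext's -layout option, which tries to keep the original physical layout of the text, may be worth trying. This is a minimal sketch assuming poppler-utils' pdftotext is on PATH; the function name pdf_to_txt_layout is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around poppler-utils' pdftotext.
# -layout asks pdftotext to keep the original physical layout of the text,
# which can help keep table columns lined up.
pdf_to_txt_layout() {
    src=$1
    dst=$2
    pdftotext -layout "$src" "$dst"
}
```

Usage: pdf_to_txt_layout /var/www/original.pdf /var/www/new.txt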
How was the PHP code executed? On the command line, via php file.php, or by a web server when you hit it with a browser at http://servername/something/file.php?
One guess is that the less you execute on the command line is not the same less as the one run by the PHP code.
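To compare the two cases, a small diagnostic script can be run once from the terminal and once through PHP's exec. The script and its function name less_diag are hypothetical; it only prints the current user, the less found on PATH, PATH itself and LESSOPEN.

```shell
#!/bin/sh
# Hypothetical diagnostic: print which less would run and the relevant environment.
# Run it once from a terminal and once via PHP's exec(), then compare the outputs.
less_diag() {
    echo "user:     $(id -un)"
    echo "less:     $(command -v less || echo 'not found in PATH')"
    echo "PATH:     $PATH"
    echo "LESSOPEN: ${LESSOPEN-<unset>}"
}

less_diag
```

From PHP, exec("/path/to/less_diag.sh", $lines); print_r($lines); would show the web-server side (the script path is a placeholder).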