Question

I want to make PDFs from external URLs searchable. I'm using pdftotext from Xpdf. It works fine with PDFs already on my webspace, but I keep getting an error message when I try to use external PDFs instead. Specifically, I get:

"Error: Couldn't open file 'https://www.vericoa.com/sandbox/test2.pdf' "

Here is my code:

$path = 'https://www.vericoa.com/sandbox/test2.pdf'; 

echo shell_exec('pdftotext -enc UTF-8 '.$path.' pdf.txt 2>&1');  

$file = file_get_contents('pdf.txt');

echo $file;

Is it even possible to extract text from external PDF sources? Are there any alternatives? (I've spent the last few hours searching but found nothing.)

Thanks in advance,
Matthias


Solution

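You could try downloading the PDF from the external URL in PHP, saving it to a local file, and passing that file to pdftotext. The error message shows that pdftotext is trying to open the URL as a file name, so give it a path on disk instead of a URL.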
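A minimal sketch of that approach, assuming pdftotext is on the server's PATH and `allow_url_fopen` is enabled so `file_get_contents()` can read HTTPS URLs (if it is disabled, the download step could use cURL instead). The function names `buildPdftotextCommand` and `extractRemotePdfText` are hypothetical. Unlike the code in the question, every shell argument is passed through `escapeshellarg()`, since the path ends up inside a shell command.

```php
<?php
// Sketch only: assumes pdftotext (Xpdf) is installed and on the PATH, and
// that allow_url_fopen is enabled. Function names are hypothetical.

/**
 * Build the pdftotext command line with each file argument shell-escaped.
 * stderr is merged into the captured output so error messages are visible.
 */
function buildPdftotextCommand(string $pdfPath, string $txtPath): string
{
    return 'pdftotext -enc UTF-8 ' . escapeshellarg($pdfPath) . ' '
        . escapeshellarg($txtPath) . ' 2>&1';
}

/**
 * Download the PDF at $url into a temporary local file, run pdftotext on it,
 * and return the extracted text. Throws RuntimeException on any failure.
 */
function extractRemotePdfText(string $url): string
{
    // Step 1: fetch the remote PDF into memory.
    $pdfData = file_get_contents($url);
    if ($pdfData === false) {
        throw new RuntimeException("Could not download $url");
    }

    // Step 2: write it to a uniquely named temporary file on local disk.
    $tmpPdf = tempnam(sys_get_temp_dir(), 'pdf');
    if ($tmpPdf === false) {
        throw new RuntimeException('Could not create a temporary file');
    }
    $tmpTxt = $tmpPdf . '.txt';

    try {
        if (file_put_contents($tmpPdf, $pdfData) === false) {
            throw new RuntimeException("Could not write $tmpPdf");
        }

        // Step 3: pdftotext now receives a local path, not a URL.
        $messages = [];
        $status = 0;
        exec(buildPdftotextCommand($tmpPdf, $tmpTxt), $messages, $status);
        if ($status !== 0) {
            throw new RuntimeException('pdftotext failed: ' . implode("\n", $messages));
        }

        // Step 4: read the text file that pdftotext wrote.
        $text = file_get_contents($tmpTxt);
        if ($text === false) {
            throw new RuntimeException("Could not read $tmpTxt");
        }
        return $text;
    } finally {
        // Always clean up both temporary files.
        if (is_file($tmpPdf)) {
            unlink($tmpPdf);
        }
        if (is_file($tmpTxt)) {
            unlink($tmpTxt);
        }
    }
}
```

Usage would then be `echo extractRemotePdfText('https://www.vericoa.com/sandbox/test2.pdf');`. Using a temporary file per request, rather than a fixed `pdf.txt`, avoids two concurrent requests overwriting each other's output.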

Licensed under: CC-BY-SA with attribution