I want to make PDFs from external URLs searchable. I'm using pdftotext from Xpdf. It works fine with PDFs already on my webspace, but I keep getting an error message when I try to use external PDFs instead. Specifically, I get:

"Error: Couldn't open file 'https://www.vericoa.com/sandbox/test2.pdf' "

Here is my code:

$path = 'https://www.vericoa.com/sandbox/test2.pdf'; 

echo shell_exec('pdftotext -enc UTF-8 '.$path.' pdf.txt 2>&1');  

$file = file_get_contents('pdf.txt');

echo $file;

Is it even possible to extract text from external PDF sources? Are there any alternatives? I spent the last few hours searching but found nothing.

Thanks in advance, Matthias

Solution

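The error suggests that pdftotext treats its argument as a local file path and does not fetch URLs itself. You could try downloading the external PDF in PHP, saving it to a local file, and passing that file's path to pdftotext.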
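
A minimal sketch of that approach might look like the following. The helper names `fetchToTempFile` and `pdfToText` are made up for illustration and are not part of PHP or Xpdf. The sketch assumes `allow_url_fopen` is enabled so that `copy()` can read an `https://` URL, and that `pdftotext` is on the server's PATH.

```php
<?php
// Hypothetical helper: copy a remote (or local) PDF into a temporary local file.
// Reading an https:// URL with copy() assumes allow_url_fopen is enabled.
function fetchToTempFile(string $source): string
{
    $tmp = tempnam(sys_get_temp_dir(), 'pdf');
    if ($tmp === false || !@copy($source, $tmp)) {
        throw new RuntimeException("Could not download $source");
    }
    return $tmp;
}

// Hypothetical helper: run pdftotext on a local file and return its output.
function pdfToText(string $localPdf): string
{
    // Passing "-" as the text file makes pdftotext write the text to stdout.
    // escapeshellarg() keeps unusual characters in the path from breaking the command.
    $cmd = 'pdftotext -enc UTF-8 ' . escapeshellarg($localPdf) . ' - 2>&1';
    return (string) shell_exec($cmd);
}
```

With these in place, the script from the question could call `$local = fetchToTempFile($path);`, then `echo pdfToText($local);`, and finally `unlink($local);` to remove the temporary copy. If `allow_url_fopen` is disabled on the host, the download step would need cURL instead.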

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow