Question

I want to make PDFs from external URLs searchable. I'm using pdftotext from XPDF. It's working fine with PDFs already on my webspace, but I keep getting an error message when trying to use external PDFs instead. Specifically I get:

"Error: Couldn't open file 'https://www.vericoa.com/sandbox/test2.pdf' "

Here is my code:

$path = 'https://www.vericoa.com/sandbox/test2.pdf'; 

echo shell_exec('pdftotext -enc UTF-8 '.$path.' pdf.txt 2>&1');  

$file = file_get_contents('pdf.txt');

echo $file;

Is it even possible to extract text from external PDF sources? Are there any alternatives? (I spent the last few hours searching but found nothing.)

Thanks in advance, Matthias

Solution

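You could try downloading the external PDF in PHP, saving it to a local file, and passing that file to pdftotext. pdftotext expects a path on the local filesystem and does not fetch URLs itself, which would explain the "Couldn't open file" error.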
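
A minimal sketch of that approach, assuming allow_url_fopen is enabled (and the openssl extension is loaded for https URLs) and that pdftotext is on the PATH. The function names buildPdftotextCommand and extractTextFromRemotePdf are made up for this illustration; they are not part of PHP or Xpdf.

```php
<?php
// Hypothetical helper: build the pdftotext command line for a local PDF.
// escapeshellarg() protects against spaces or shell metacharacters in the path.
// Passing "-" as the output file makes pdftotext write the text to stdout.
function buildPdftotextCommand(string $localPdf): string
{
    return 'pdftotext -enc UTF-8 ' . escapeshellarg($localPdf) . ' -';
}

// Hypothetical helper: download a remote PDF to a temporary file,
// run pdftotext on that local copy, and return the extracted text.
function extractTextFromRemotePdf(string $url): string
{
    // Needs allow_url_fopen=On; returns false if the download fails.
    $pdfData = file_get_contents($url);
    if ($pdfData === false) {
        throw new RuntimeException("Could not download $url");
    }

    // Store the PDF locally so pdftotext has a real file to open.
    $localPdf = tempnam(sys_get_temp_dir(), 'pdf');
    if ($localPdf === false || file_put_contents($localPdf, $pdfData) === false) {
        throw new RuntimeException('Could not write temporary PDF file');
    }

    try {
        // shell_exec() returns null or false when there is no output or on failure.
        $text = shell_exec(buildPdftotextCommand($localPdf));
        return is_string($text) ? $text : '';
    } finally {
        // Always clean up the temporary copy.
        unlink($localPdf);
    }
}
```

With these in place, the original script would reduce to something like echo extractTextFromRemotePdf($path); and the intermediate pdf.txt file is no longer needed. If allow_url_fopen is disabled on the host, the download step would have to use another mechanism such as the cURL extension.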

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow