I'm at a loss as to how I could build a loop to run pdftotext on an entire directory through a shell_exec() statement.

Something like:

$pdfs = glob("*.pdf");

foreach($pdfs as $pdfs) {
    shell_exec('pdftotext '.$pdfs.' '.$pdfs'.txt');
}

But I'm unsure how to drop the .pdf extension the second time I use $pdfs in my shell_exec() statement and replace it with .txt.

I'm not really sure this loop is correct either.

Solution

Try

foreach(glob("*.pdf") as $src) {

  // Manually remove file extension because glob() may return a dir path component
  $parts = explode('.', $src);
  $parts[count($parts) - 1] = 'txt';
  $dest = implode('.', $parts);

  // Escape shell arguments, just in case
  shell_exec('pdftotext '.escapeshellarg($src).' '.escapeshellarg($dest));

}

Basically, loop over the PDF files in the directory and run the command for each one, using just the name component of the file name for the output file (so test.pdf becomes test.txt). The name component was originally extracted with pathinfo(); see the edit below for why that changed.

Using the result of glob() directly in foreach easily avoids the variable naming collision you had in the code above.
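
To check what the loop would run before executing anything, a dry-run sketch along these lines may help. The helper name buildCommands() is hypothetical and not part of the answer; it takes a list of file names instead of calling glob() so it can be tried without any PDFs on disk.

```php
<?php
// Hypothetical helper (an illustration, not part of the answer above):
// returns the commands the loop would run, without executing them.
function buildCommands(array $files): array
{
    $commands = [];
    foreach ($files as $src) {
        // Same extension swap as in the answer: replace the last
        // dot-separated part of the name with "txt".
        $parts = explode('.', $src);
        $parts[count($parts) - 1] = 'txt';
        $dest = implode('.', $parts);

        // Escape both arguments so spaces and shell metacharacters are safe.
        $commands[] = 'pdftotext ' . escapeshellarg($src) . ' ' . escapeshellarg($dest);
    }
    return $commands;
}

// In real use the list would come from glob("*.pdf").
foreach (buildCommands(['test.pdf', 'my report.pdf']) as $command) {
    echo $command, PHP_EOL;
}
```

On a Unix-like system escapeshellarg() wraps each argument in single quotes, so a file name containing a space still reaches pdftotext as one argument. Once the printed commands look right, each one can be passed to shell_exec().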

EDIT

I have changed the above code to remove the file extension manually when generating the output file name. This is because glob() may return path strings that include a directory component as well as the file name. Using pathinfo() or basename() would strip that directory off, so the output path would no longer point next to the source file. Since we know that a . will be present in the file name (the pattern passed to glob() dictates this), we can safely replace everything after the last one. I have also added escapeshellarg() for good measure: a file name containing spaces or shell metacharacters would otherwise break the command, so it is best to be safe.
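
The difference between the two approaches can be seen in a small sketch. The helper name replaceExtension() and the sample path docs/report.v2.pdf are made up for illustration; it uses strrpos() and substr() rather than the explode()/implode() pair above, which should give the same result for names that end in .pdf.

```php
<?php
// Hypothetical helper: replace everything after the last dot with $newExt.
// Assumes the last dot belongs to the file name, which holds for paths
// matched by a "*.pdf" pattern.
function replaceExtension(string $path, string $newExt): string
{
    $lastDot = strrpos($path, '.');
    if ($lastDot === false) {
        // No dot at all: just append the new extension.
        return $path . '.' . $newExt;
    }
    return substr($path, 0, $lastDot) . '.' . $newExt;
}

$src = 'docs/report.v2.pdf';

// Keeps the directory component: docs/report.v2.txt
echo replaceExtension($src, 'txt'), PHP_EOL;

// basename() drops the directory component: report.v2.txt
echo basename($src, '.pdf') . '.txt', PHP_EOL;
```

With the basename() form, the text file would be written relative to the current working directory instead of alongside the PDF.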

Other tips

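$pdfs = glob("*.pdf");

// Adjust /path/to/pdftotext to wherever the binary is installed
$fmt='/path/to/pdftotext "%s" "%s.txt"';

foreach($pdfs as $thispdf) {
    // basename() with a suffix strips any directory part and the ".pdf" extension
    shell_exec(sprintf($fmt, $thispdf, basename($thispdf, ".pdf")));
}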
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow