PHP: Extract fdf fields as an array from a PDF

https://stackoverflow.com/questions/8824076

15-04-2021

Question

I want to extract the available fields as an array from a fillable PDF.

An array like: array('firstname','secondname','address');

I do not need the values for those fields, if they are filled.

What is the easiest way to do that using PHP?

Solution

The online documentation for fdf_next_field_name gives the following example, which you can modify to store the field names in an array:

<?php
$fdf = fdf_open($HTTP_FDF_DATA);
for ($field = fdf_next_field_name($fdf); $field != ""; $field = fdf_next_field_name($fdf, $field)) {
    echo "field: $field\n";
}
?>
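
Modified to collect the names instead of echoing them, the loop might look like the sketch below. It assumes the PECL fdf extension is loaded, which is not the case on most current PHP installs. list_fdf_field_names() is a hypothetical helper name, and $fdfPath is assumed to point at an FDF file on disk.

```php
<?php
// Hypothetical helper: returns the field names found in an FDF file.
// Assumes the PECL fdf extension is available; it is not on most modern installs.
function list_fdf_field_names(string $fdfPath): array
{
    if (!function_exists('fdf_open')) {
        throw new RuntimeException('The fdf extension is not loaded.');
    }

    $fdf = fdf_open($fdfPath);
    if (!$fdf) {
        throw new RuntimeException("Could not open $fdfPath as FDF.");
    }

    $names = [];
    // Same loop as the documentation example, appending instead of echoing.
    for ($field = fdf_next_field_name($fdf); $field != ""; $field = fdf_next_field_name($fdf, $field)) {
        $names[] = $field;
    }

    fdf_close($fdf);
    return $names;
}
```

The explicit function_exists() check makes the failure mode obvious on hosts without the extension, rather than ending in a fatal "undefined function" error.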

OTHER TIPS

I upvoted Murray's answer because he was in earnest, and I am pretty sure he is right for PHP before 5.3.

Sadly, pecl fdf is no more.

Thankfully, one "noah" made a comment on the php documentation with a preg_match_all regex solution to the problem. Included here with slight modifications for clarity. Long live noah.

function parse($text_from_file) {
    $matches = array();
    // Find every field dictionary of the form << /V (value) /T (name) >>
    if (!preg_match_all("/<<\s*\/V([^>]*)>>/", $text_from_file, $out, PREG_SET_ORDER))
        return $matches;
    foreach ($out as $field) {
        // preg_match() replaces eregi(), which was removed in PHP 7
        if (preg_match("#<<.*/V\s*(.*)\s*/T\s*(.*)\s*>>#is", $field[0], $regs)) {
            // trim() drops any line breaks around each entry
            $key = trim($regs[2]);
            $val = trim($regs[1]);
            // Strip the enclosing parentheses and any escaping backslashes
            $key = preg_replace("/^\s*\(/","",$key);
            $key = preg_replace("/\)$/","",$key);
            $key = preg_replace("/\\\/","",$key);
            $val = preg_replace("/^\s*\(/","",$val);
            $val = preg_replace("/\)$/","",$val);
            $matches[$key] = $val;
        }
    }
    return $matches;
}
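
Since the question only asks for the names, a narrower regex over the /T entries may be enough. The following is a minimal sketch, not noah's code: fdf_field_names() is a hypothetical helper, the sample text is made up to resemble FDF as exported from a form, and the pattern assumes names written as plain (parenthesised) strings with no nested or escaped parentheses.

```php
<?php
// Hypothetical helper: collect the /T (name) entries from raw FDF text.
// Assumes literal-string names without nested or escaped parentheses.
function fdf_field_names(string $fdfText): array
{
    if (!preg_match_all('~/T\s*\(([^)]*)\)~', $fdfText, $m)) {
        return [];
    }
    // Keep each name once, in order of first appearance.
    return array_values(array_unique($m[1]));
}

// Made-up sample with two fields, one empty and one filled.
$sample = "%FDF-1.2\n1 0 obj\n<<\n/FDF\n<<\n/Fields [\n"
        . "<<\n/V ()\n/T (firstname)\n>>\n"
        . "<<\n/V (Main St)\n/T (address)\n>>]\n"
        . ">>\n>>\nendobj\n";

$names = fdf_field_names($sample); // array('firstname', 'address')
```

Forms with hierarchical fields (parent and child names) would need more than this, because each /T entry then holds only part of the full name.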

I expect that someone will get fed up with the lack of true FDF support in PHP and fix this.

If you are reading this question, you are probably after the same basic workflow that I follow:

1. Download a normal PDF form.
2. Use LibreOffice to make it a PDF form with named fields.
3. Use pdftk to turn it into an FDF file.
4. Use this function to figure out what values the form needs.
5. Populate a flat PHP array with the correct variables defined (from db/whatever).
6. Use pdf_forge to create a new FDF with the values pre-filled.
7. Use pdftk again to create a new PDF from FDF + original PDF with the variables (from db/whatever).
8. Profit.
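
The two pdftk steps in this workflow can be sketched from PHP as follows. The helper names and file names are hypothetical, and the sketch only builds the command lines; it assumes pdftk is installed and on the PATH when they are eventually run. generate_fdf and fill_form are pdftk's own operations.

```php
<?php
// Hypothetical helpers that only build the command lines; nothing is executed here.

// First pdftk step: write an FDF file describing the form's fields.
function pdftk_generate_fdf_command(string $pdf, string $fdf): string
{
    return 'pdftk ' . escapeshellarg($pdf) . ' generate_fdf output ' . escapeshellarg($fdf);
}

// Final pdftk step: merge a filled FDF back into the original PDF.
function pdftk_fill_form_command(string $pdf, string $fdf, string $out): string
{
    return 'pdftk ' . escapeshellarg($pdf) . ' fill_form ' . escapeshellarg($fdf)
         . ' output ' . escapeshellarg($out);
}

$cmd = pdftk_generate_fdf_command('form.pdf', 'form.fdf');
// To run it: exec($cmd, $outputLines, $exitCode); a non-zero $exitCode signals failure.
```

Passing every file name through escapeshellarg() matters as soon as the names come from a database or a user upload.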

HTH

-FT

If you control the PDF and just want the keys, the following will work. It uses plain PHP, no other libraries (good if your host doesn't have them).

Set the PDF's submit button to submit as HTML, and set its target to the address where your PHP code will run.

$q_string  = file_get_contents("php://input");
parse_str($q_string , $pdf_array);
$pdfkeys = array_keys($pdf_array);

The HTML query string sent by the PDF is read into $q_string and then parsed into an array called $pdf_array, which holds all of the keys and values. array_keys() then puts all the keys into $pdfkeys, as you wanted.

I came here looking for how to read PDF values to put into a db, and after some more poking around came up with the above. Hopefully it meets some people's needs. XFDF can also work, but then you will need to parse it as XML; this was simpler for me.
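
One caveat with parse_str(): like PHP's own request parsing, it converts dots and spaces in names to underscores, so a field called person.first comes back as person_first. If the exact field names matter, splitting the body by hand is one option. The sketch below is an illustration only: raw_field_names() is a hypothetical helper, the sample body is made up, and it assumes an ordinary URL-encoded body.

```php
<?php
// Hypothetical helper: field names from a URL-encoded body, without the
// dot/space-to-underscore rewriting that parse_str() applies.
function raw_field_names(string $body): array
{
    $names = [];
    foreach (explode('&', $body) as $pair) {
        if ($pair === '') {
            continue; // skip empty segments, e.g. from an empty body
        }
        // Everything before the first "=" is the still-encoded name.
        $name = urldecode(explode('=', $pair, 2)[0]);
        if (!in_array($name, $names, true)) {
            $names[] = $name;
        }
    }
    return $names;
}

// In the receiving script this would be file_get_contents("php://input").
$body = 'person.first=Ada&person.last=Lovelace&notes=';
$names = raw_field_names($body);

parse_str($body, $parsed); // for comparison: keys come back with underscores
```

For forms whose field names contain no dots or spaces, the shorter parse_str() version gives the same result.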

I get a normal post from PDFs submitting to my server, but not in the $_POST array. You just have to parse it from php://input:

$allVars = file_get_contents("php://input");

parse_str($allVars, $myPost);

$allKeys = array();
foreach($myPost as $key => $value) {
    $allKeys[] = $key;
}

Licensed under: CC-BY-SA with attribution
