python 3 fdfgen unicode TypeError

https://stackoverflow.com/questions/16740881

30-05-2022

Question

I am using Python 3.3 and am trying to use the wonderful fdfgen / forge_fdf script (thanks, guys).

When I run a simple test of fdfgen, I get the following error:

        safe = utf16.replace('\x00)', '\x00\\)').replace('\x00(', '\x00\\(')
TypeError: expected bytes, bytearray or buffer compatible object

After some searching, this seems to be related to how Python 3 handles Unicode encoding, but I am unsure. Below is the sample I ran, followed by the fdfgen source. Thanks in advance:

>>> from fdfgen import forge_fdf
>>> fields = [('last_name', u'Spencer')]
>>> fdf = forge_fdf('SMBRPython.pdf', fields, [], [], [])

# -*- coding: utf-8 -*-
"""
Port of the PHP forge_fdf library by Sid Steward
(http://www.pdfhacks.com/forge_fdf/)

Anders Pearson <anders@columbia.edu> at Columbia Center For New Media Teaching
and Learning <http://ccnmtl.columbia.edu/>
"""

__author__ = "Anders Pearson <anders@columbia.edu>"
__credits__ = ("Sébastien Fievet <zyegfryed@gmail.com>,"
               "Brandon Rhodes <brandon@rhodesmill.org>")

import codecs

def smart_encode_str(s):
    """Create a UTF-16 encoded PDF string literal for `s`."""
    utf16 = s.encode('utf_16_be')
    safe = utf16.replace('\x00)', '\x00\\)').replace('\x00(', '\x00\\(')
    return ('%s%s' % (codecs.BOM_UTF16_BE, safe))


def handle_hidden(key, fields_hidden):
    if key in fields_hidden:
        return "/SetF 2"
    else:
        return "/ClrF 2"


def handle_readonly(key, fields_readonly):
    if key in fields_readonly:
        return "/SetFf 1"
    else:
        return "/ClrFf 1"


def handle_data_strings(fdf_data_strings, fields_hidden, fields_readonly):
    for (key, value) in fdf_data_strings:
        if type(value) is bool:
            if value:
                yield "<<\n/V/Yes\n/T (%s)\n%s\n%s\n>>\n" % (
                    smart_encode_str(key),
                    handle_hidden(key, fields_hidden),
                    handle_readonly(key, fields_readonly),
                )
            else:
                yield "<<\n/V/Off\n/T (%s)\n%s\n%s\n>>\n" % (
                    smart_encode_str(key),
                    handle_hidden(key, fields_hidden),
                    handle_readonly(key, fields_readonly),
                )
        else:
            yield "<<\n/V (%s)\n/T (%s)\n%s\n%s\n>>\n" % (
                smart_encode_str(value),
                smart_encode_str(key),
                handle_hidden(key, fields_hidden),
                handle_readonly(key, fields_readonly),
            )


def handle_data_names(fdf_data_names, fields_hidden, fields_readonly):
    for (key, value) in fdf_data_names:
        yield "<<\n/V /%s\n/T (%s)\n%s\n%s\n>>\n" % (
            smart_encode_str(value),
            smart_encode_str(key),
            handle_hidden(key, fields_hidden),
            handle_readonly(key, fields_readonly),
        )


def forge_fdf(pdf_form_url="", fdf_data_strings=[], fdf_data_names=[], fields_hidden=[], fields_readonly=[]):

    """Generates fdf string from fields specified

    pdf_form_url is just the url for the form. fdf_data_strings and
    fdf_data_names are arrays of (key,value) tuples for the form fields. FDF
    just requires that string type fields be treated separately from boolean
    checkboxes, radio buttons etc. so strings go into fdf_data_strings, and
    all the other fields go in fdf_data_names. fields_hidden is a list of
    field names that should be hidden. fields_readonly is a list of field names
    that should be readonly.

    The result is a string suitable for writing to a .fdf file.

    """
    fdf = ['%FDF-1.2\n%\xe2\xe3\xcf\xd3\r\n']
    fdf.append("1 0 obj\n<<\n/FDF\n")

    fdf.append("<<\n/Fields [\n")
    fdf.append(''.join(handle_data_strings(fdf_data_strings, fields_hidden, fields_readonly)))
    fdf.append(''.join(handle_data_names(fdf_data_names, fields_hidden, fields_readonly)))
    fdf.append("]\n")

    if pdf_form_url:
        fdf.append("/F (" + smart_encode_str(pdf_form_url) + ")\n")

    fdf.append(">>\n")
    fdf.append(">>\nendobj\n")
    fdf.append("trailer\n\n<<\n/Root 1 0 R\n>>\n")
    fdf.append('%%EOF\n\x0a')

    return ''.join(fdf)
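For reference, the same TypeError can be reproduced without fdfgen at all. This minimal sketch (the value 'Spencer' is just the sample field value above) shows that on Python 3 str.encode() returns a bytes object, and bytes.replace() rejects str arguments, which is exactly what smart_encode_str() passes it. The wording of the error message differs between Python versions.

```python
# Reproduces the fdfgen failure without fdfgen itself.
utf16 = 'Spencer'.encode('utf_16_be')
print(type(utf16))  # <class 'bytes'>

caught = None
try:
    # str arguments, exactly as in smart_encode_str()
    utf16.replace('\x00)', '\x00\\)')
except TypeError as exc:
    caught = exc
    print('TypeError:', exc)
```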

Solution

Encoding produces a bytes object, but you are passing str values to its replace() method. Use bytes literals instead:

safe = utf16.replace(b'\x00)', b'\x00\\)').replace(b'\x00(', b'\x00\\(')
# %-formatting of bytes only exists from Python 3.5 on, so concatenate instead
return codecs.BOM_UTF16_BE + safe
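A self-contained version of the patched function, for checking the fix in isolation (the sample strings are arbitrary). Because it now returns bytes, the rest of the original forge_fdf, which still builds str, no longer fits: concatenating the result with a str raises TypeError, while %-formatting it into a str does not raise but silently embeds its repr (b'...') instead of the raw bytes.

```python
import codecs


def smart_encode_str(s):
    """Return the UTF-16BE PDF string body for `s` (BOM included) as bytes."""
    utf16 = s.encode('utf_16_be')
    # ASCII '(' and ')' encode to b'\x00(' and b'\x00)' in UTF-16BE; escape them.
    safe = utf16.replace(b'\x00)', b'\x00\\)').replace(b'\x00(', b'\x00\\(')
    # Plain concatenation: %-formatting of bytes only exists from Python 3.5.
    return codecs.BOM_UTF16_BE + safe


encoded = smart_encode_str('Smith (Jr)')
print(encoded)

# The rest of the original forge_fdf still builds str, so mixing the two fails.
mix_error = None
try:
    line = "/F (" + smart_encode_str('SMBRPython.pdf') + ")\n"
except TypeError as exc:
    mix_error = exc
    print('TypeError:', exc)
```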

OTHER TIPS

fdfgen has since been ported to Python 3, mostly by explicitly turning all of the strings into bytes literals, as Martijn Pieters suggested.
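A reduced sketch of what such a port involves, assuming text fields only (no hidden/readonly flags and no fdf_data_names). forge_fdf_bytes is a hypothetical name used for illustration; it is not fdfgen's actual code or API. The point is that every literal is a bytes literal and the pieces are joined with b''.join() rather than ''.join().

```python
import codecs


def smart_encode_str(s):
    """UTF-16BE PDF string body for `s` (BOM included), as bytes."""
    utf16 = s.encode('utf_16_be')
    # Escape ASCII parentheses, which encode to b'\x00(' / b'\x00)'.
    safe = utf16.replace(b'\x00)', b'\x00\\)').replace(b'\x00(', b'\x00\\(')
    return codecs.BOM_UTF16_BE + safe


def forge_fdf_bytes(pdf_form_url, fdf_data_strings):
    """Hypothetical, reduced bytes-only forge_fdf (text fields only)."""
    fdf = [b'%FDF-1.2\n%\xe2\xe3\xcf\xd3\r\n']
    fdf.append(b'1 0 obj\n<<\n/FDF\n')
    fdf.append(b'<<\n/Fields [\n')
    for key, value in fdf_data_strings:
        # One field dictionary per (key, value) pair; every piece is bytes.
        fdf.append(b'<<\n/V (' + smart_encode_str(value) + b')\n/T ('
                   + smart_encode_str(key) + b')\n>>\n')
    fdf.append(b']\n')
    if pdf_form_url:
        fdf.append(b'/F (' + smart_encode_str(pdf_form_url) + b')\n')
    fdf.append(b'>>\n>>\nendobj\n')
    fdf.append(b'trailer\n\n<<\n/Root 1 0 R\n>>\n')
    fdf.append(b'%%EOF\n\x0a')
    return b''.join(fdf)


fdf = forge_fdf_bytes('SMBRPython.pdf', [('last_name', 'Spencer')])
print(fdf[:9])  # b'%FDF-1.2\n'
```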

With fdfgen 0.11.0 and Python 3.4.2 on Windows Vista, I initially got this error when writing the FDF data to a file:

    fdf_file.write(fdf)
TypeError: must be str, not bytes

I also got this one initially, but can't reproduce it now:

can't convert 'bytes' object to str implicitly

It turns out the only change I had to make was to open the file in binary mode. Instead of:

fdf_file=open("testForm.fdf","w")

use this:

fdf_file=open("testForm.fdf","wb")

All the other lines were the same as in the example on the fdfgen website. Hope this helps someone.
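A minimal sketch of why the mode matters. The FDF bytes here are a short stand-in (an assumption) for what the Python 3 forge_fdf returns, and the file is written to a temporary directory rather than the working directory. A file opened with "w" accepts only str, so writing bytes raises TypeError; "wb" writes the bytes unchanged. The with statement also closes the file, which the one-line open() above leaves to the caller.

```python
import os
import tempfile

# Stand-in for the bytes returned by the Python 3 forge_fdf (assumed content).
fdf = b'%FDF-1.2\n%\xe2\xe3\xcf\xd3\r\n'
path = os.path.join(tempfile.mkdtemp(), 'testForm.fdf')

# Text mode ("w") only accepts str, so writing bytes raises TypeError.
text_mode_error = None
try:
    with open(path, 'w') as fdf_file:
        fdf_file.write(fdf)
except TypeError as exc:
    text_mode_error = exc
    print('TypeError:', exc)

# Binary mode ("wb") writes the bytes unchanged.
with open(path, 'wb') as fdf_file:
    fdf_file.write(fdf)

with open(path, 'rb') as fdf_file:
    written = fdf_file.read()
print(written == fdf)  # True
```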

Licensed under: CC-BY-SA with attribution
