Question

I have Unicode text in my HTML page, and it displays correctly in the browser. But when I convert the page to PDF using xhtml2pdf, the Unicode characters come out as solid black square boxes. Is there some setting for Unicode other than the UTF-8 encoding? I don't think it's a Unicode problem.

# convert HTML to PDF
pisaStatus = pisa.CreatePDF(
        StringIO(sourceHtml.encode('utf-8')),                 
        dest=resultFile)

Complete Python code:

# -*- coding: utf-8 -*-

from xhtml2pdf import pisa
from StringIO import StringIO

source = """<html>
            <style>
                @font-face {
                font-family: Preeti;
                src: url("preeti.ttf");
                }

                body {
                font-family: Preeti;
                }
            </style>
            <body>
                This is a test <br/>
                       सरल
            </body>
        </html>"""

# Utility function
def convertHtmlToPdf(source):
    # collect the generated PDF in an in-memory buffer
    pdf = StringIO()
    pisaStatus = pisa.CreatePDF(StringIO(source.encode('utf-8')), pdf)

    # pisaStatus.err is the number of errors (0 means success)
    print "Errors: ", pisaStatus.err
    return pdf

# Main program
if __name__=="__main__":
    print pisa.showLogging()
    pdf = convertHtmlToPdf(source)
    fd = open("test.pdf", "w+b")
    fd.write(pdf.getvalue())
    fd.close()

generated pdf file

Do I even need to include the @font-face rule?


Solution

It's partially solved by providing the absolute path to the font, i.e.:

    <style>
        @font-face {
        font-family: Preeti;
        src: url("c:/static/fonts/preeti.ttf");
        }

        body {
        font-family: Preeti;
        }
    </style>  

Now another problem has arisen. My text is mixed: partly in Unicode and partly in a normal font (I think I should call them normal fonts :D). Since the font has been overridden for the whole body, the normal text now comes out as rectangular boxes, in this case empty boxes.
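
A minimal sketch of building the same @font-face rule with an absolute path computed at run time instead of hard-coding it. The helper name font_face_css and the "fonts" directory are assumptions made for illustration and are not part of xhtml2pdf. Only the standard library is used, and the forward-slash form mirrors the c:/static/fonts/preeti.ttf path above.

```python
from pathlib import Path


def font_face_css(family, font_file, fonts_dir):
    """Return an @font-face rule whose src is an absolute path.

    font_face_css is a hypothetical helper, not an xhtml2pdf function.
    """
    # resolve() makes the path absolute; as_posix() writes it with
    # forward slashes, like the path in the snippet above.
    absolute = (Path(fonts_dir) / font_file).resolve().as_posix()
    return (
        "@font-face {\n"
        "    font-family: %s;\n"
        '    src: url("%s");\n'
        "}\n" % (family, absolute)
    )


# Assumed layout: a "fonts" folder inside the current working directory.
css = font_face_css("Preeti", "preeti.ttf", "fonts")
print(css)
```

The returned string can be placed inside the <style> element of the HTML before it is handed to the converter, so the font location no longer depends on where the script is started from.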

OTHER TIPS

A slightly late answer, but I think it is important to know why relative paths do not work in @font-face for xhtml2pdf:

The CreatePDF function (which is the same as the pisaDocument function, as can be seen in https://github.com/chrisglass/xhtml2pdf/blob/master/xhtml2pdf/pisa.py) has a named parameter called path. If you don't set this parameter and use a relative path, it will try to find your fonts under a folder named __dummy__, as can be seen in the file https://github.com/chrisglass/xhtml2pdf/blob/master/xhtml2pdf/context.py (search for dummy).

So, that's why your .ttf files only work when you use absolute paths.

To resolve this, you can either:

  • create a __dummy__ folder and put your .ttf files there, or
  • pass a value to the path parameter of CreatePDF

For example, in my case, I am creating PDFs through Django, so I passed path='.' and put my .ttf in the same folder as my manage.py, and everything is working fine. Of course, a better solution would be to define SETTINGS.PROJECT_PATH and use that.
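
The second option can be sketched roughly as follows. This is only an illustration under stated assumptions: render_pdf and its create_pdf hook are hypothetical names introduced here, while CreatePDF and its dest, path and encoding keyword arguments are the ones discussed in this thread. How a given xhtml2pdf version resolves relative URLs against path may differ, so treat the value passed in as something to verify against your installed version.

```python
import io


def render_pdf(html_text, base_path, create_pdf=None):
    """Convert HTML text to PDF bytes, forwarding base_path as `path`.

    render_pdf and the create_pdf hook are hypothetical names for this
    sketch; the hook exists only so the converter can be swapped out.
    """
    if create_pdf is None:
        # Imported lazily so this module loads without xhtml2pdf installed.
        from xhtml2pdf import pisa
        create_pdf = pisa.CreatePDF

    result = io.BytesIO()
    # `path` gives the converter a location to resolve relative URLs
    # such as url("preeti.ttf") against.
    status = create_pdf(html_text, dest=result, path=base_path,
                        encoding="utf-8")
    if status.err:
        raise RuntimeError("xhtml2pdf reported %d error(s)" % status.err)
    return result.getvalue()
```

With xhtml2pdf installed, a call such as render_pdf(html, base_path='.') corresponds to the path='.' setup described above, and the returned bytes can be written to a file opened in "wb" mode.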

From the documentation, it looks like you're supposed to give CreatePDF an encoding, otherwise "this is guessed by the HTML5 parser".

So, say the HTML file's headers specify whatever legacy charset was used for Devanagari. You decode that properly to Unicode somewhere before the code you've shown us, then re-encode it as UTF-8, but the headers are specifying a different charset. In that case, html5lib will guess the wrong charset, and interpret the characters incorrectly and give you mojibake.

Of course I can't be sure that's exactly the problem you're facing without a complete example, but it's likely something like that. And the most likely solution is the same for any of them: If you encode to UTF-8, tell the converter to use UTF-8 instead of guessing:

pisaStatus = pisa.CreatePDF(
    StringIO(sourceHtml.encode('utf-8')),                 
    dest=resultFile,
    encoding='utf-8')
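
The charset mismatch described above can be reproduced with the standard library alone. In this small sketch, latin-1 is only a stand-in for "whatever charset the parser guessed" (an assumption; the real guess depends on the document), and misread is a hypothetical helper name.

```python
def misread(text, guessed_charset):
    """Encode text as UTF-8, then decode the bytes with another charset."""
    return text.encode("utf-8").decode(guessed_charset)


original = "सरल"
garbled = misread(original, "latin-1")

# Each Devanagari character is three bytes in UTF-8, so the three
# characters are misread as nine unrelated ones.
print(len(original), len(garbled))

# Undoing the wrong decode recovers the text, which shows the bytes
# were fine and only the interpretation was wrong.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == original)
```

Passing encoding='utf-8' to the converter avoids this kind of misreading because the bytes are then decoded with the same charset they were encoded with.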

I had a black box character in my pdf when converting html to pdf with xhtml2pdf and pisa. Turns out I had a BOM (byte-order mark) character in the document.

The BOM can be removed by doing 'save as' in most text editors. In UltraEdit, I did Save As... and selected type UTF-8 (NO BOM).

See: How do I remove the BOM character from my xml file
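
If re-saving the file by hand is not convenient, the BOM can also be dropped in code before the HTML reaches the converter. A minimal sketch using only the standard library; the function names strip_utf8_bom and read_html_without_bom are hypothetical and chosen for this example.

```python
import codecs


def strip_utf8_bom(data):
    """Return the bytes without a leading UTF-8 byte-order mark."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_html_without_bom(path):
    """Read a UTF-8 HTML file as text, ignoring a BOM if one is present."""
    # The "utf-8-sig" codec decodes UTF-8 and skips a leading BOM.
    with open(path, encoding="utf-8-sig") as handle:
        return handle.read()
```

The first helper suits HTML that is already in memory as bytes; the second suits HTML read from disk.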

Licensed under: CC-BY-SA with attribution