Question

I have a PDF that contains a 'UniCNS-UCS2-H' font. I tried both PDFBox and PDFRenderer, and both throw an exception: Unknown encoding for 'UniCNS-UCS2-H'

The font is included in the font file mingliu.ttc (a TrueType collection file; I don't know whether this matters).

What can I do to make these two libraries support additional fonts?

Solution

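The encoding of a font in a PDF document is specified in its font dictionary object. The font you are encountering uses 'UniCNS-UCS2-H', which is one of the predefined CJK CMaps listed in the PDF Specification: a Unicode (UCS-2) encoding for the Adobe-CNS1 (Traditional Chinese) character collection, with the -H suffix indicating horizontal writing.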
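To make the "UCS2" part of the name concrete, here is a minimal sketch, assuming the text strings shown with this font are stored as two-byte big-endian Unicode codes. The class name Ucs2Demo, the method decodeUcs2 and the sample bytes are made up for illustration; this is plain Java and does not use PDFBox or PDFRenderer.

```java
import java.nio.charset.StandardCharsets;

public class Ucs2Demo {
    // Hypothetical helper: decode two-byte big-endian codes into a Java String.
    // For characters in the Basic Multilingual Plane, UCS-2 and UTF-16BE agree,
    // so the standard UTF-16BE charset is used here.
    static String decodeUcs2(byte[] codes) {
        return new String(codes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Made-up sample: the codes 0x4E2D and 0x6587 (two Chinese characters).
        byte[] codes = {0x4E, 0x2D, 0x65, (byte) 0x87};
        System.out.println(decodeUcs2(codes));
    }
}
```

A library that supports this encoding still has to map each code to a glyph in the font, which this sketch does not attempt.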

PDFBox supports only four named encodings:

  1. PDFDocEncoding
  2. MacRomanEncoding
  3. StandardEncoding
  4. WinAnsiEncoding

These are named in the font dictionary object inside the PDF stream (e.g. .../Encoding/WinAnsiEncoding/...).

When PDFBox encounters an encoding name it does not recognize, it throws the exception you reported.
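The failure can be pictured with a small sketch, assuming the library simply looks the encoding name up in a fixed table. This is not PDFBox's actual code: the class EncodingCheck, its methods, the sample font dictionary text and the exception type are all hypothetical, and only the four names come from the list above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingCheck {
    // The four encoding names listed in this answer.
    static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
            "PDFDocEncoding", "MacRomanEncoding", "StandardEncoding", "WinAnsiEncoding"));

    // Matches an /Encoding key followed by a name object, e.g. "/Encoding /WinAnsiEncoding".
    static final Pattern ENCODING = Pattern.compile("/Encoding\\s*/([^\\s/<>\\[\\]()]+)");

    // Returns the encoding name found in the dictionary text, or null if there is none.
    static String extractEncodingName(String fontDictionary) {
        Matcher m = ENCODING.matcher(fontDictionary);
        return m.find() ? m.group(1) : null;
    }

    static boolean isSupported(String name) {
        return SUPPORTED.contains(name);
    }

    // Imitates the reported failure for names missing from the table.
    static void requireSupported(String name) {
        if (!isSupported(name)) {
            throw new IllegalArgumentException("Unknown encoding for '" + name + "'");
        }
    }

    public static void main(String[] args) {
        // Made-up font dictionary text for illustration only.
        String dict = "<< /Type /Font /Subtype /Type0 /Encoding /UniCNS-UCS2-H >>";
        String name = extractEncodingName(dict);
        try {
            requireSupported(name);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this assumption, a name such as UniCNS-UCS2-H fails the lookup even though the PDF itself is valid, so the fix has to come from the library gaining support for that encoding rather than from changing the font file.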

For more information about fonts in PDF documents, see sections 9.5 through 9.8 of the PDF Specification.

Licensed under: CC-BY-SA with attribution