Question

I have tried iText, PDFBox and Oracle Forms. With iText I did succeed in generating a Gujarati PDF document, but unfortunately the Gujarati (UTF-8) text is not rendered correctly.

My project must use JDK 1.4, so I need an older version of an API that supports Gujarati fonts.

Please suggest if any option is available.

Sample Code:

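import java.io.FileOutputStream;

// Assumes iText 2.x (com.lowagie.* packages), as used on older JDKs;
// iText 5 renamed these packages to com.itextpdf.text.*.
import com.lowagie.text.Document;
import com.lowagie.text.Font;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.BaseFont;
import com.lowagie.text.pdf.PdfWriter;

// GujaratiPdf is a placeholder class name wrapping the original method.
public class GujaratiPdf
{
  public void GeneratePDFusingiText(String lStrGujaratidata)
  {
    try
    {
      // Load the Gujarati TrueType font (Shruti) with Unicode (Identity-H) encoding.
      BaseFont bf = BaseFont.createFont("C:\\Windows\\Fonts\\Shruti.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
      Font font = new Font(bf, 12);

      // Write the Gujarati text as a single paragraph.
      Document document = new Document();
      PdfWriter.getInstance(document, new FileOutputStream("D:/GeneratePDFusingiText.pdf"));
      document.open();
      document.add(new Paragraph(lStrGujaratidata, font));
      document.close();
    }
    catch(Exception e)
    {
      System.out.println("Exception while generating PDF");
      e.printStackTrace();
    }
  }
}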

EDIT:

[Image: font examples]

Step 1) I type a Gujarati string in Google Transliterate.

Step 2) I convert it into Unicode escape sequences using the BabelMap software, so that I can use it in a resource bundle.
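For reference, the escaping step that BabelMap performs here can be sketched in plain Java, similar in spirit to the JDK's native2ascii tool. This is a minimal sketch: the class name UnicodeEscaper is hypothetical, and it avoids generics and String.format so that it should also compile on JDK 1.4. It keeps the characters in the order they are stored in the string, so its output shows exactly which order the data is in.

```java
public class UnicodeEscaper {

    // Replaces every character outside printable ASCII with a backslash-u escape
    // of four upper-case hex digits, keeping characters in their stored order.
    // Avoids generics and String.format so it also compiles on JDK 1.4.
    public static String escape(String s) {
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                out.append(c); // printable ASCII is copied unchanged
            } else {
                String hex = Integer.toHexString(c).toUpperCase();
                out.append("\\u");
                for (int k = hex.length(); k < 4; k++) {
                    out.append('0'); // left-pad to four hex digits
                }
                out.append(hex);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Biladi, written with escapes so the source file encoding does not matter.
        String biladi = "\u0AAC\u0ABF\u0AB2\u0ABE\u0AA1\u0AC0";
        System.out.println(escape(biladi));
    }
}
```

Running main prints \u0AAC\u0ABF\u0AB2\u0ABE\u0AA1\u0AC0, the same sequence as in the question.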

Issue: Take the string બિલાડી (Biladi).

Its Unicode escape sequence is: \u0AAC\u0ABF\u0AB2\u0ABE\u0AA1\u0AC0

Note the order of the first two code points, \u0AAC\u0ABF; that is where I am getting the problem. If I swap them to \u0ABF\u0AAC\u0AB2\u0ABE\u0AA1\u0AC0, the PDF output is correct.

However, the same swapped string is then displayed incorrectly in HTML: િબલાડી

I need something that works for both.

I have tried "gu", "gu.UTF-8" and "UTF-8", but I get the same output every time.


Solution

Updated Answer

After your comment I realised that I was wrong, i.e. the diacritic character should appear second in the byte sequence, even though it should be rendered left of the main character.
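The logical order is also what any encoding writes out. Below is a minimal sketch (the class name EncodingOrderDemo is hypothetical, and the code sticks to JDK 1.4 APIs) that prints the bytes of બિ ("Bi") in two encodings: in both, the bytes of the consonant બ (U+0AAC) come before those of the vowel sign િ (U+0ABF). This is most likely also why switching between "gu", "gu.UTF-8" and "UTF-8", as tried in the question, makes no difference: an encoding or locale choice changes how characters are stored or which bundle is loaded, not the order of the characters.

```java
import java.io.UnsupportedEncodingException;

public class EncodingOrderDemo {

    // Encodes s with the named charset and returns the bytes as space-separated hex pairs.
    public static String hexBytes(String s, String charsetName) {
        byte[] bytes;
        try {
            bytes = s.getBytes(charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName);
        }
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            String hex = Integer.toHexString(bytes[i] & 0xFF).toUpperCase();
            if (hex.length() < 2) {
                out.append('0'); // pad single-digit bytes such as 0A
            }
            out.append(hex);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Bi": consonant BA (U+0AAC) followed by vowel sign I (U+0ABF), in logical order.
        String bi = "\u0AAC\u0ABF";
        System.out.println("UTF-8:    " + hexBytes(bi, "UTF-8"));
        System.out.println("UTF-16BE: " + hexBytes(bi, "UTF-16BE"));
    }
}
```

This prints E0 AA AC E0 AA BF for UTF-8 and 0A AC 0A BF for UTF-16BE: the consonant first in both cases.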

So, it turns out that iText doesn't support this type of rendering for Indic character sets. Roughly speaking, iText maps each Unicode character to a glyph of the font one by one, in the order the characters appear in the string. It does not apply the font's OpenType layout rules, which is where Indic reordering and conjunct formation happen, so this special ordering is not taken into account.

iText does support similar behaviour for Arabic, using a class contributed by another developer. See com.itextpdf.text.pdf.ArabicLigaturizer. Perhaps you could create a similar one yourself? (!)
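A rough sketch of such a helper, following the manual fix described in the question, could move the pre-base vowel sign I (U+0ABF) in front of the consonant or consonant cluster it belongs to, producing the visual order that iText then draws correctly. This is not an iText API and rests on several assumptions: the class GujaratiVisualOrder and its methods are hypothetical, the consonant range is approximate, and only this one vowel sign is handled. It also does nothing about conjunct glyphs, which iText still will not form, and text copied out of such a PDF will likely come out in visual order too. Keeping the original logical-order string for HTML and passing the reordered copy only to iText (for example new Paragraph(GujaratiVisualOrder.toVisualOrder(text), font)) would address the PDF/HTML conflict described in the question.

```java
public class GujaratiVisualOrder {

    private static final char VOWEL_SIGN_I = '\u0ABF'; // pre-base vowel sign, drawn left of its consonant(s)
    private static final char VIRAMA = '\u0ACD';       // joins consonants into a conjunct
    private static final char NUKTA = '\u0ABC';        // dot below, follows a consonant

    // Approximation: Gujarati consonants lie between KA (U+0A95) and HA (U+0AB9).
    private static boolean isConsonant(char c) {
        return c >= '\u0A95' && c <= '\u0AB9';
    }

    // Start index of the consonant (plus optional nukta) ending at 'end', or -1 if none.
    private static int consonantStart(StringBuffer sb, int end) {
        int j = end;
        if (j >= 0 && sb.charAt(j) == NUKTA) {
            j--;
        }
        return (j >= 0 && isConsonant(sb.charAt(j))) ? j : -1;
    }

    // Returns a copy of 'logical' in which every vowel sign I has been moved in front
    // of the consonant cluster it belongs to. Everything else is left untouched.
    public static String toVisualOrder(String logical) {
        StringBuffer out = new StringBuffer(logical);
        for (int i = 0; i < out.length(); i++) {
            if (out.charAt(i) != VOWEL_SIGN_I) {
                continue;
            }
            int start = consonantStart(out, i - 1);
            if (start < 0) {
                continue; // no consonant before it: leave the sign where it is
            }
            // Extend left over preceding "consonant + virama" pairs (conjuncts such as KRA).
            while (start >= 2 && out.charAt(start - 1) == VIRAMA) {
                int prev = consonantStart(out, start - 2);
                if (prev < 0) {
                    break;
                }
                start = prev;
            }
            out.deleteCharAt(i);
            out.insert(start, VOWEL_SIGN_I);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String biladi = "\u0AAC\u0ABF\u0AB2\u0ABE\u0AA1\u0AC0"; // logical order, fine for HTML
        String forPdf = toVisualOrder(biladi);                   // visual order, only for iText
        System.out.println(forPdf.equals("\u0ABF\u0AAC\u0AB2\u0ABE\u0AA1\u0AC0"));
    }
}
```

Running main prints true: the result matches the swapped sequence from the question.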


Original Answer

Kem chho (Gujarati for "How are you?"),

I believe that iText is displaying the correct characters, but the first 2 characters of your input were 'flipped' before you converted the string into Unicode code points. So, the problem occurs before the data even reaches iText.

The underlying issue is that the 'first' character is a 'pre-base' character, which is a type of Diacritic. It's a bit like an 'accent' in European texts, in that it can't exist on its own, and its purpose is to embellish another character. In this case it turns a 'Ba' (બ) into a 'Bi'.

You'll see in the Unicode code page that the first character (િ) is indeed code point \u0ABF, and the second (બ) is \u0AAC: http://en.wikipedia.org/wiki/Gujar%C4%81ti_script#Unicode

So, somewhere between Google Transliterate and your codepoint representation, these characters got flipped. So, you need to review how you did that translation.

How did you convert these characters into codepoints?

Seemingly, some interpreters place the 'pre-base' after the main consonant, instead of before it:

  • Note that when you paste those characters into a (Linux) terminal, the first 2 characters come out back-to-front. I believe something similar happened for you too.
  • You'll also notice that when you try editing this word in Google Transliterate, you can't place the cursor between the first 2 characters, and when you hit backspace, the left character is deleted before the right.

So, if you can work out where this 'flipping' occurred, then hopefully your solution will present itself.
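One way to narrow down where the flip happens is to check the string at each stage (after Google Transliterate, after BabelMap, after loading it from the resource bundle). Below is a hypothetical heuristic helper (LogicalOrderCheck is not from any library) based on the fact that, in Unicode's logical order, every dependent vowel sign follows a consonant. It can only flag a vowel sign with no consonant before it, such as one at the start of a word; a flipped sign in the middle of a word may still sit after some other consonant and go unnoticed. The code point ranges are approximate.

```java
public class LogicalOrderCheck {

    // Gujarati dependent vowel signs (matras) lie roughly in U+0ABE..U+0ACC.
    private static boolean isVowelSign(char c) {
        return c >= '\u0ABE' && c <= '\u0ACC';
    }

    // Consonants KA..HA (approximate range) and the nukta that may follow one.
    private static boolean isConsonantOrNukta(char c) {
        return (c >= '\u0A95' && c <= '\u0AB9') || c == '\u0ABC';
    }

    // Index of the first vowel sign with no consonant right before it, or -1.
    // In logical (Unicode) order every vowel sign follows its consonant, so a hit
    // suggests the text was stored in visual order somewhere along the way.
    public static int firstMisplacedVowelSign(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isVowelSign(c) && (i == 0 || !isConsonantOrNukta(s.charAt(i - 1)))) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String logical = "\u0AAC\u0ABF\u0AB2\u0ABE\u0AA1\u0AC0"; // sequence from the question
        String visual = "\u0ABF\u0AAC\u0AB2\u0ABE\u0AA1\u0AC0";  // the swapped version
        System.out.println(firstMisplacedVowelSign(logical)); // -1
        System.out.println(firstMisplacedVowelSign(visual));  // 0
    }
}
```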

Hope this helps

Licensed under: CC-BY-SA with attribution