Question

I have a source pdf which I am modifying by adding text objects. I am using "Incremental Updates" which is mentioned in the PDF specification. But while adding text objects using this method I am making some mistakes due to which the pdf doesn't render properly in Adobe Reader 11. When the pdf is opened and I double-click on it, the added text objects get deleted. I figured out that this is due to text annotation.

Now I want to know how a new text object can be added using an incremental update. How do the Contents and RC entries of a free text annotation have to be maintained?

Also is it possible to disable or delete the annotation so that my problem can be avoided easily? Because I want a simple pdf, I don't want annotation options.

The source pdf I am using is here.

The modified pdf after adding text object is here.

I am not sure whether the source pdf itself conforms to the pdf specification.

Solution

First off let me show you how easy things are if you can use a decent PDF library. I use iTextSharp as an example but the same can also be done with others like PDFBox or PDFNet (already mentioned by @Ika in his answer):

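using iTextSharp.text;
using iTextSharp.text.pdf;

// sourcePdf: the input PDF (e.g. a file path); targetPdfStream: a writable output Stream
PdfReader reader = new PdfReader(sourcePdf);
using (PdfStamper stamper = new PdfStamper(reader, targetPdfStream)) {
  // Helvetica bold, 12 pt, 75% gray
  Font font = new Font(Font.FontFamily.HELVETICA, 12, Font.BOLD, new GrayColor(0.75f));
  // Canvas drawn on top of the existing content of page 1
  PdfContentByte canvas = stamper.GetOverContent(1);
  ColumnText.ShowTextAligned(
    canvas,
    Element.ALIGN_LEFT,
    new Phrase("Hello people!", font),
    36, 540, 0  // x, y, rotation in degrees
  );
}
reader.Close();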

(Derived from the Webified iTextSharp Example StampText.cs explained in chapter 6 of iText in Action — 2nd Edition.)

(Which PDF library you choose depends on your general requirements and the available license models.)

If, in spite of the ease of use of such PDF libraries, you insist on doing it manually, here are some remarks:

First you have to find the Page dictionary of the page you want to add content to. Depending on the type of PDF this already might require decompression of object streams etc. but in your sample modified1.pdf that is not necessary:

7 0 obj
  <</Rotate 90
    /Type /Page
    /TrimBox [ 9.54 6.12 585.68 835.88 ]
    /Resources 8 0 R
    /CropBox [ 0 0 595.22 842 ]
    /ArtBox [ 9.54 18.36 585.68 842 ]
    /Contents [ 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R ]
    /Parent 6 0 R
    /MediaBox [ 0 0 595.22 842 ]
    /Annots 17 0 R
    /BleedBox [ 9.54 6.12 585.68 835.88 ]
  >>
endobj 

You see the array of references to content streams. This is where you have to add new page content: you can either manipulate an existing stream or create a new stream and add it to that array.

(Most PDFs have their content stream compressed. For the general case, therefore, you'd have to decompress a stream before you can work on it. Thus, in my eyes, the easier way would be to start a new stream.)

You chose to manipulate the last referenced stream 16 0 which in your PDF is uncompressed:

16 0 obj
<</Length 37 0 R>>
stream
  S 1 0 0 1 13.183 0 cm 0 0 m
  [...]
  0 10 -10 -0 506.238 342.629 Tm
  .13333 .11765 .12157 scn
  -.0002 Tc
  .0006 Tw
  (the Bank and branch on which cheque is drawn\).)Tj

  /F1 2 Tf
  -15.1279 10.9462 Td
  (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789~!@#$%^&*aaaaaaaaaaaaa)Tj

  /F2 1 Tf
  015.1279 01.9462 Td
  (ANAabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789)Tj

  ET
endstream
endobj 

Your additions, I gather, are the two 3-liners at the bottom which first select a font, then position the insertion point and finally print a selection of letters.

Now you say that you added the text abc...z and ABC...Z just for testing, but that letters such as b, j, k, q, and v do not appear in the PDF. The problem becomes even more visible for your second addition of letters; here only the capital 'A' and 'N' are displayed.

[Image: the added letter groups]

This is because the fonts in question are embedded in the PDF only as subsets. (Fonts are embedded so that PDF viewers on systems which don't have the font in question can still display the PDF.) They are not embedded completely; only the subset of characters the document actually requires from each font is included.

Let's look for the font F2 for which only 'N' and 'A' appear:

According to the page object, the page resources can be found in object 8 0:

8 0 obj
  <</Font <</F1 45 0 R /TT2 46 0 R /F2 47 0 R>>
    /ExtGState <</GS2 48 0 R>>
    /ProcSet [ /PDF /Text ]
    /ColorSpace <</Cs6 49 0 R>>
  >>
endobj 

So F2 is defined in 47 0:

47 0 obj
  <</Subtype /Type1
    /Type /Font
    /Widths [ 722 250 250 250 250 250 250 250 250 250 250 250 250 722 ]
    /Encoding 52 0 R
    /FirstChar 65
    /FontDescriptor 53 0 R
    /ToUnicode 54 0 R
    /BaseFont /ILBPOB+TimesNewRomanPSMT-Bold
    /LastChar 78
  >>
endobj 

In the referenced ToUnicode map 54 0 you see

54 0 obj
<</Length 55 0 R>>stream
  /CIDInit /ProcSet findresource begin 12 dict begin begincmap /CIDSystemInfo <<
  /Registry (AAAAAA+F2+0) /Ordering (T1UV) /Supplement 0 >> def
  /CMapName /AAAAAA+F2+0 def
  /CMapType 2 def
  1 begincodespacerange <41> <4e> endcodespacerange
  2 beginbfchar
  <41> <0041>
  <4e> <004E>
  endbfchar
  endcmap CMapName currentdict /CMap defineresource pop end end
endstream
endobj 

In this mapping you see that only the character codes 0x41 ('A') and 0x4e ('N') are mapped.

In your document the font is used only to print "NA" in the amount table cells and for nothing else. Thus, only those two letters 'N' and 'A' are embedded, which results in your addition with that font only outputting these letters.

Thus, to successfully add text to the page, you either have to check the font resources associated with the page for the glyphs they provide (and restrict your additions to those glyphs), or you have to add your own font resource.
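As a rough illustration of the first option, the following Python sketch lists which character codes a ToUnicode CMap like the one above maps, and which characters of a planned string fall outside that set. It is only a sketch under assumptions: the function names are made up for this example, only bfchar sections are read (bfrange sections are ignored), and a character's code is assumed to equal its ASCII value, which happens to hold for this font. Remember also that ToUnicode is optional and is only a hint about the glyphs actually embedded.

```python
import re

def mapped_codes(cmap_text):
    """Return the character codes listed in the bfchar sections of a ToUnicode CMap."""
    codes = set()
    # Each section looks like: N beginbfchar <src> <dst> ... endbfchar
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        # Pairs of hex strings: the source code, then its Unicode value
        for src, _dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section):
            codes.add(int(src, 16))
    return codes

def unmapped_characters(text, codes):
    """List the distinct characters of text whose code is not mapped, in order of first use."""
    missing = []
    for ch in text:
        # Assumption: the character code is the ASCII value of the character
        if ord(ch) not in codes and ch not in missing:
            missing.append(ch)
    return missing

# The relevant part of the ToUnicode stream 54 0 shown above
CMAP = """
1 begincodespacerange <41> <4e> endcodespacerange
2 beginbfchar
<41> <0041>
<4e> <004E>
endbfchar
"""

codes = mapped_codes(CMAP)
missing = unmapped_characters("ANAabc", codes)
print(sorted(hex(c) for c in codes), missing)
```

For the font F2 this reports only 0x41 and 0x4e as mapped, matching the observation that only 'A' and 'N' are displayed.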

Because the presence of characters in a font is often not as easy to see as it is here (the ToUnicode map is optional), I would suggest that you add your own font resources. The PDF specification ISO 32000-1 explains how to do that.

Furthermore, you state that the x and y positions of the text are not displayed properly in the PDF. While you don't say exactly what you mean, you should be aware that a content stream can apply affine transformations to the coordinate system of the page, i.e. stretch, skew, rotate, and move the axes.
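To make this concrete, here is a small Python sketch of how a PDF matrix [a b c d e f] maps a point, using the text matrix from your stream 16 0 (0 10 -10 -0 506.238 342.629 Tm) as the example. The helper names are invented for this illustration. The sketch only covers the text matrix; the current transformation matrix set by cm operators and the page's /Rotate entry apply on top of it.

```python
def apply_matrix(matrix, x, y):
    """Apply a PDF matrix [a b c d e f] to the point (x, y)."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)

def apply_to_offset(matrix, dx, dy):
    """Transform a displacement; the translation part (e, f) does not apply."""
    a, b, c, d, _e, _f = matrix
    return (a * dx + c * dy, b * dx + d * dy)

# Text matrix from the sample stream: 0 10 -10 -0 506.238 342.629 Tm
TM = (0, 10, -10, 0, 506.238, 342.629)

# Where the text space origin lands
origin = apply_matrix(TM, 0, 0)
# Effect of the added "-15.1279 10.9462 Td" outside text space
shift = apply_to_offset(TM, -15.1279, 10.9462)
print(origin, shift)
```

Because this matrix rotates by 90 degrees and scales by 10, the Td operands do not move the text by (-15.1279, 10.9462) page units but by roughly (-109.46, -151.28), which may be why the positions look wrong to you.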

If you want to use the original coordinate system and not depend on whatever transformations are in effect where your additions start, you should add an initial content stream to the page containing just a q operator (which saves the current graphics state on the graphics state stack) and start your additions in a new final content stream with a Q operator (which restores the graphics state by removing the most recently saved state from the stack and making it the current state).
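A minimal Python sketch of building these two streams by hand could look as follows. The helper names are hypothetical. The object numbers 69 and 70 and the font name Xi0 are borrowed from the iText output shown further down; any font name used here must be present in the page's /Resources. The sketch writes uncompressed streams and does not compensate for the page's /Rotate entry.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def stream_object(obj_num, content):
    """Serialise an uncompressed stream object with a direct /Length entry."""
    header = b"%d 0 obj\n<</Length %d>>stream\n" % (obj_num, len(content))
    return header + content + b"\nendstream\nendobj\n"

def text_stream(font_name, size, x, y, text):
    """Content for the final stream: restore the saved state, then show text."""
    ops = [
        "Q",  # restore the state saved by the leading "q" stream
        "q",  # protect anything that may follow from our changes
        "BT",
        "/%s %s Tf" % (font_name, size),
        "1 0 0 1 %s %s Tm" % (x, y),
        "(%s)Tj" % escape_pdf_string(text),
        "ET",
        "Q",
    ]
    return "\n".join(ops).encode("latin-1")

# Leading stream: goes first in the page's /Contents array
save_obj = stream_object(69, b"q")
# Trailing stream: goes last in the page's /Contents array
draw_obj = stream_object(70, text_stream("Xi0", 12, 36, 540, "Hello (people)!"))
print(save_obj.decode("latin-1"))
print(draw_obj.decode("latin-1"))
```

Note the escaping of parentheses inside the string, as in the \) of your existing stream, and that /Length counts only the bytes between the line ending after stream and the one before endstream.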

EDIT As a sample I applied the Java equivalent of the C# code at the top to your modified1.pdf with append mode activated. The following objects were changed or added as a result:

The page object 7 0 has been updated:

7 0 obj
  <</CropBox[0 0 595.22 842]
    /Parent 6 0 R
    /Contents[69 0 R 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R 70 0 R]
    /Type/Page
    /Resources<<
      /ExtGState<</GS2 48 0 R>>
      /ProcSet [/PDF /Text /ImageB /ImageC /ImageI]
      /ColorSpace<</Cs6 49 0 R>>
      /Font<</F1 45 0 R/F2 47 0 R/TT2 46 0 R/Xi0 68 0 R>>
    >>
    /MediaBox[0 0 595.22 842]
    /TrimBox[9.54 6.12 585.68 835.88]
    /BleedBox[9.54 6.12 585.68 835.88]
    /Annots 17 0 R
    /ArtBox[9.54 18.36 585.68 842]
    /Rotate 90
  >>
endobj 

If you compare with your former version, you see that

  • two new content streams have been added, 69 0 at the start and 70 0 at the end;
  • the resources are not an indirect object anymore but instead are directly included here;
  • the resources contain a new Font resource Xi0 at 68 0.

Now let's look at the added objects.

This is the font resource for Helvetica-Bold named Xi0 at 68 0:

68 0 obj
  <</BaseFont/Helvetica-Bold
    /Type/Font
    /Encoding/WinAnsiEncoding
    /Subtype/Type1
  >>
endobj 

Non-embedded, standard 14 font resources are not complicated at all...

Now there are the additional content streams. iText does compress them, but I'll show them in an uncompressed state here:

69 0 obj
<</Length 1>>stream
  q
endstream
endobj
70 0 obj
<</Length 106>>stream 
  Q
  q
  0 1 -1 0 595.22 0 cm
  q
  BT
  1 0 0 1 36 540 Tm
  /Xi0 12 Tf
  0.75 g
  (Hello people!)Tj
  0 g
  ET
  Q
  Q
endstream
endobj 

So the new content stream at the start saves the current graphics state, and the new one at the end restores that saved state, changes the coordinate system with cm (a 90° rotation matching the page's /Rotate 90 entry), sets the text position, selects the font, font size, and fill colour, and finally shows a string.
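If you write such an incremental update yourself, the changed and added objects are appended after the end of the existing file, followed by a new cross-reference section and a trailer whose /Prev entry points to the previous cross-reference section. The Python sketch below shows that layout only. It assumes the original file uses a classic cross-reference table (not a cross-reference stream), and the function name, the /Root reference, the previous xref offset, and the previous /Size in the usage example are placeholders which you would have to read from the original trailer. Other trailer entries of the original, such as /Info and /ID, would need to be carried over as well.

```python
def append_incremental_update(original, objects, root_ref, prev_xref, prev_size):
    """Append objects, an xref section, and a trailer to an existing PDF.

    objects maps object number -> complete object bytes ("N 0 obj ... endobj\n").
    """
    out = bytearray(original)
    if not out.endswith(b"\n"):
        out += b"\n"
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)  # byte offset of "N 0 obj" in the file
        out += objects[num]
    xref_pos = len(out)
    out += b"xref\n"
    # One subsection per object keeps the sketch simple
    for num in sorted(offsets):
        out += b"%d 1\n" % num
        out += b"%010d 00000 n \n" % offsets[num]  # entries are exactly 20 bytes
    # /Size is one more than the highest object number in the whole file
    size = max(prev_size, max(offsets) + 1)
    out += b"trailer\n<</Size %d/Root %s/Prev %d>>\n" % (size, root_ref, prev_xref)
    out += b"startxref\n%d\n" % xref_pos
    out += b"%%EOF\n"
    return bytes(out)

# Placeholder input: a real file and real trailer values are needed in practice
original = b"%PDF-1.4\n% (body of the existing file)\n%%EOF\n"
updated = append_incremental_update(
    original,
    {
        7: b"7 0 obj\n<</Type/Page>>\nendobj\n",
        70: b"70 0 obj\n<</Length 1>>stream\nQ\nendstream\nendobj\n",
    },
    root_ref=b"1 0 R", prev_xref=9, prev_size=69,
)
print(updated.decode("latin-1"))
```

The original bytes stay untouched; a viewer reads the last trailer first and follows /Prev for every object the update does not redefine.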

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow