Numbers not showing in generated PDF on some pages after text replace PDFSharp

https://stackoverflow.com/questions/23272586

09-07-2023

Question

I'm working on an extremely annoying bug where the number 8 is not showing up in my PDF. The digits 1, 2, 3, 4, 5, 6, 7 and 0 are fine; only the 8 is rendered as a square, and only on SOME of the pages (the portrait ones).

The application works as follows:

1. Generate PDFs on SSRS (some landscape, some portrait)
2. Merge the PDFs using PDFSharp
3. Update the page numbers using PDFSharp

Now, on the first part of the overall report (portrait pages), the 8 is not showing. In the second part of the report, the 8 IS showing.

I don't see any differences in the RDL (language, fonts, even the size of the box). The whole document is in ANSI encoding, and the code that writes the new page numbers is the same for the whole document.

[Screenshot: 8 not showing on the first couple of pages]

[Screenshot: 8 is showing on some of the other pages]

I saw that Aspose had a problem like this (http://www.aspose.com/community/forums/thread/528718/number-8-missing-in-pdf-file-with-some-viewers.aspx), but I'm not using Aspose.

I tried appending (char) 0x38, and it doesn't show up; 0x37 and 0x39 do. In both cases the string is encoded with Encoding.GetEncoding(1252).GetBytes() or Encoding.Default.

The code for generating the PDFs through SSRS is identical, except for the report name of course. I could not find any encoding information in the RDL itself.

Page numbers are replaced, using PDFSharp Stream.Value = 'newvalue'.

All ideas are VERY much appreciated.

UPDATE: I ran the number replacement through Aspose and the 8 showed up, as expected, on all pages, using a simple pdf.Pages.Accept(textFragmentAbsorber);.

Update II

So after some playing around, I'm pretty sure it has to do with the way I'm replacing the text in the document, and the encoding of the replaced string.

The extraction and replacement code is as follows:

    public byte[] UpdatePageNumbers(byte[] file, PageNumberingConfigurationBase config)
    {
        var doc = PdfReader.Open(new MemoryStream(file), PdfDocumentOpenMode.Modify);
        for (int i = 0; i < doc.PageCount; i++)
        {
            var pageNr = i + 1;
            var page = doc.Pages[i];

            for (int j = 0; j < page.Contents.Elements.Count; j++)
            {
                var element = page.Contents.Elements.GetDictionary(j);
                var content = element.AsString();

                if (content.Contains(config.SearchTemplate))
                {
                    var newContent = content.Replace(
                        config.SearchTemplate,
                        config.GetReplacementTextForPage(pageNr, doc.PageCount));

                    element.Stream.Value = newContent.AsByteArray();
                }
            }
        }

        return doc.AsByteArray();
    }

With helper class:

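    using System.Text;
    using PdfSharp.Pdf;

    public static class ElementExtensions
    {
        // Decodes the raw bytes of a content stream into a string.
        public static string AsString(this PdfDictionary dict)
        {
            return GetString(dict.Stream.Value);
        }

        // Encodes a (modified) content string back into bytes.
        public static byte[] AsByteArray(this string stream)
        {
            return GetBytes(stream);
        }

        // Code page 1252 is used in both directions so the bytes round-trip unchanged.
        static byte[] GetBytes(string str)
        {
            return Encoding.GetEncoding(1252).GetBytes(str);
        }

        static string GetString(byte[] bytes)
        {
            return Encoding.GetEncoding(1252).GetString(bytes);
        }
    }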

The encoding declared inside the PDF is:

    /Encoding /WinAnsiEncoding

Here's how the documents are merged:

    public byte[] MergePdf(params byte[][] pdfs)
    {
        var outputDocument = new PdfDocument();

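        for (int i = 0; i < pdfs.Length; i++)
        {
            // Import mode is required so the pages can be added to another document
            var document = PdfReader.Open(new MemoryStream(pdfs[i]), PdfDocumentOpenMode.Import);

            // Append every page of this source document to the output document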
            foreach (PdfPage pdfPage in document.Pages)
            {
                outputDocument.Pages.Add(pdfPage);
            }
        }

        return outputDocument.AsByteArray();
    }

Sample files

So here are the sample files:

This is one report, generated 3 times, then merged, then page numbers updated:

https://www.dropbox.com/s/yxzqw0y2tvu3v9a/before_update.pdf

https://www.dropbox.com/s/ui26l0qsunhcune/after_update.pdf

Please note that now ALL the numbers are shown as boxes/squares.

Solution

Thanks to @mkl, I found the solution: we're going to add a hidden textbox to the report, with 0123456789 inside. The cause is 'font subsetting' by SSRS.

To save space, SSRS does not embed font characters that are not used on the page. So if no '8' was present on the page, the '8' that the replacement wrote into the page could not be displayed. For the same reason, when I created a page with no text on it, I got only squares/boxes.
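
To make the subsetting effect concrete, here is a minimal sketch (not part of the original code) of a pre-check that lists the characters of a replacement string that never occur in the text already on a page. It rests on a simplifying assumption: that the embedded font subset holds glyphs only for characters that are already visible. `SubsetCheck` and `MissingGlyphs` are hypothetical names, and the page text would have to be extracted separately.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SubsetCheck
{
    // Returns the distinct characters of `replacement` that do not occur in
    // `pageText`. Assumption: the embedded subset contains glyphs only for
    // characters already on the page, so these would render as boxes.
    public static List<char> MissingGlyphs(string pageText, string replacement)
    {
        var available = new HashSet<char>(pageText);
        return replacement
            .Where(c => !available.Contains(c))
            .Distinct()
            .ToList();
    }
}
```

For example, `SubsetCheck.MissingGlyphs("Page 1 of 2345679 0", "Page 8 of 10")` returns a list containing only '8', which matches the symptom in the question. With an empty page text, every character of the replacement is reported, which corresponds to the all-boxes case.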

Thanks again @mkl.

See: http://technet.microsoft.com/en-us/library/ms159713(SQL.100).aspx

When possible, the PDF rendering extension embeds the subset of each font that is needed to display the report in the PDF file. Fonts that are used in the report must be installed on the report server.

Licensed under: CC-BY-SA with attribution
