I'm working on an extremely annoying bug where the digit 8 does not show up in my PDF. 1, 2, 3, 4, 5, 6, 7 and 0 are fine; only 8 is rendered as a square, and only on SOME of the pages (the portrait ones).
The application works as follows:
- Generate the PDFs with SSRS (some landscape, some portrait)
- Merge the PDFs using PDFSharp
- Update the page numbers using PDFSharp
Now, on the first part of the overall report (portrait pages), the 8 is not showing. In the second part of the report, the 8 IS showing.
I don't see any differences in the RDL (language, fonts, even size of the box). The whole document is in ANSI encoding, and so the code to write the new page numbers is the same for the whole document.
I saw that Aspose had a problem like this (http://www.aspose.com/community/forums/thread/528718/number-8-missing-in-pdf-file-with-some-viewers.aspx), but I'm not using Aspose.
I tried appending (char) 0x38, and it doesn't show up; 0x37 and 0x39 do. In both cases the string is converted to bytes with Encoding.GetEncoding(1252).GetBytes() or with Encoding.Default.
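To rule out the string-to-bytes step on its own, a check along these lines can be run outside the application. It is only a sketch: the class and method names (DigitEncodingCheck, ToWinAnsiByte) are made up for illustration, and the RegisterProvider call is an assumption that is only needed on .NET Core / .NET 5+ (on the full .NET Framework, code page 1252 is available without it).

```csharp
using System;
using System.Text;

public static class DigitEncodingCheck
{
    static DigitEncodingCheck()
    {
        // Assumption: only required on .NET Core / .NET 5+, where code page
        // encodings are not registered by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Returns the code page 1252 byte for a single character.
    public static byte ToWinAnsiByte(char c)
    {
        return Encoding.GetEncoding(1252).GetBytes(new[] { c })[0];
    }

    public static void Main()
    {
        // Print the byte that each digit is encoded to.
        foreach (char digit in "0123456789")
        {
            Console.WriteLine("'{0}' -> 0x{1:X2}", digit, ToWinAnsiByte(digit));
        }
    }
}
```

If '8' comes out as 0x38 here, like its neighbours 0x37 and 0x39, the encoding call itself is not what turns the 8 into a square.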
The code for generating the PDFs through SSRS is identical, except for the report name of course. I could not find any encoding information in the RDL itself.
Page numbers are replaced using PDFSharp: Stream.Value = 'newvalue'.
All ideas are VERY much appreciated.
UPDATE: I ran the number replacement through Aspose and the 8 showed up, as expected, on all pages, using a simple pdf.Pages.Accept(textFragmentAbsorber);
Update II
So after some playing around, I'm pretty sure it has to do with the way I'm replacing the text in the document, and the encoding of the replaced string.
Extraction and replacement are done as follows:
public byte[] UpdatePageNumbers(byte[] file, PageNumberingConfigurationBase config)
{
    var doc = PdfReader.Open(new MemoryStream(file), PdfDocumentOpenMode.Modify);
    for (int i = 0; i < doc.PageCount; i++)
    {
        var pageNr = i + 1;
        var page = doc.Pages[i];
        for (int j = 0; j < page.Contents.Elements.Count; j++)
        {
            var element = page.Contents.Elements.GetDictionary(j);
            var content = element.AsString();
            if (content.Contains(config.SearchTemplate))
            {
                var newContent = content.Replace(
                    config.SearchTemplate,
                    config.GetReplacementTextForPage(pageNr, doc.PageCount));
                element.Stream.Value = newContent.AsByteArray();
            }
        }
    }
    return doc.AsByteArray();
}
With helper class:
public static class ElementExtensions
{
    public static string AsString(this PdfDictionary dict)
    {
        return GetString(dict.Stream.Value);
    }

    public static byte[] AsByteArray(this string stream)
    {
        return GetBytes(stream);
    }

    static byte[] GetBytes(string str)
    {
        return Encoding.GetEncoding(1252).GetBytes(str);
    }

    static string GetString(byte[] bytes)
    {
        return Encoding.GetEncoding(1252).GetString(bytes);
    }
}
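Since the helper decodes the whole content stream to a string and encodes it back, one thing worth checking is whether that round trip changes any bytes. The following is a standalone sketch of such a check, not part of the application: RoundTripCheck and CountChangedBytes are hypothetical names, and the RegisterProvider call is an assumption that only applies to .NET Core / .NET 5+.

```csharp
using System;
using System.Text;

public static class RoundTripCheck
{
    static RoundTripCheck()
    {
        // Assumption: only required on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Counts the positions at which the bytes differ after being decoded
    // to a string and re-encoded with code page 1252.
    public static int CountChangedBytes(byte[] original)
    {
        var encoding = Encoding.GetEncoding(1252);
        byte[] roundTripped = encoding.GetBytes(encoding.GetString(original));

        int common = Math.Min(original.Length, roundTripped.Length);
        int changed = Math.Abs(original.Length - roundTripped.Length);
        for (int i = 0; i < common; i++)
        {
            if (original[i] != roundTripped[i]) changed++;
        }
        return changed;
    }

    public static void Main()
    {
        // Feed every possible byte value through the round trip once.
        var allBytes = new byte[256];
        for (int i = 0; i < allBytes.Length; i++) allBytes[i] = (byte)i;

        Console.WriteLine("Changed bytes: {0}", CountChangedBytes(allBytes));
    }
}
```

A non-zero count would mean that the decode/encode step can alter bytes in the stream that have nothing to do with the page number.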
The document encoding inside the PDF is:
/Encoding /WinAnsiEncoding
Here's how the documents are merged:
public byte[] MergePdf(params byte[][] pdfs)
{
    // Create the output document
    var outputDocument = new PdfDocument();
    for (int i = 0; i < pdfs.Length; i++)
    {
        var document = PdfReader.Open(new MemoryStream(pdfs[i]), PdfDocumentOpenMode.Import);
        // Copy every page of the imported document into the output document
        foreach (PdfPage pdfPage in document.Pages)
        {
            outputDocument.Pages.Add(pdfPage);
        }
    }
    return outputDocument.AsByteArray();
}
Sample files
So here are the sample files:
This is one report, generated 3 times, then merged, then page numbers updated.
https://www.dropbox.com/s/yxzqw0y2tvu3v9a/before_update.pdf
https://www.dropbox.com/s/ui26l0qsunhcune/after_update.pdf
Please note that in this sample ALL the replaced numbers are shown as boxes/squares, not just the 8.