While these hints were initially posted as comments on the original question, they now merit being formulated as an answer:
Code issues
While there is too much code to review and fix without spending a considerable amount of time, and while the original absence of a sample PDF was a hindrance, a quick scan of the code revealed some issues:
The appendRawCommands(XXXFormStream.createOutputStream(), YYY) calls quite likely cause problems with PDFBox: creating output streams for the same form more than once may be an issue, as may switching back and forth between the forms.
Furthermore, there does not seem to be any whitespace between the multiple strings written to the same stream, giving rise to unknown Qq operators.
Furthermore, the appendRawCommands method uses UTF-8, which is foreign to PDF.
The generateSignedDocument method most likely does quite a lot of damage as it assumes it can work with PDFs as if they were text files. In general, that is not the case.
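To illustrate the whitespace and encoding points above, here is a minimal sketch of a helper that terminates every chunk of operators with a newline and encodes it with a single-byte charset. The names RawCommandWriter, appendOperators and buildContent are hypothetical and not part of the OP's code or of PDFBox, and ISO-8859-1 is an assumption (the operators themselves are plain ASCII). The sketch writes to one stream that is opened once, rather than calling createOutputStream() repeatedly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class RawCommandWriter {
    // Hypothetical helper: writes one chunk of content stream operators and
    // terminates it with a newline so the next chunk cannot fuse with it.
    static void appendOperators(OutputStream out, String operators) throws IOException {
        // Single-byte encoding (assumption); UTF-8 could emit multi-byte sequences.
        out.write(operators.getBytes(StandardCharsets.ISO_8859_1));
        out.write('\n');
    }

    // Collects all chunks in ONE stream that is opened exactly once.
    static byte[] buildContent(String... chunks) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            for (String chunk : chunks) {
                appendOperators(buffer, chunk);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] content = buildContent("q", "1 0 0 1 10 10 cm", "Q", "q", "Q");
        System.out.print(new String(content, StandardCharsets.ISO_8859_1));
    }
}
```

With this separator a Q followed by a q stays two operators instead of collapsing into the unknown token Qq.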
Result PDF issues
The sample result PDF the OP eventually provided makes it possible to pinpoint some issues that actually materialized:
Comparing the bytes of both documents (Report_08_05_23.pdf and Signed_Report_08_05_23.pdf), one finds many unwanted changes, at first glance especially the replacement of certain bytes by question marks. This is due to using ByteArrayOutputStream.toString() to conveniently operate on the document and eventually changing the result back into a byte[].
E.g. cf. the JavaDocs of ByteArrayOutputStream.toString():
"This method always replaces malformed-input and unmappable-character sequences with the default replacement string for the platform's default character set. The java.nio.charset.CharsetDecoder class should be used when more control over the decoding process is required."
Certain byte values do not represent characters in the platform's default character set and are therefore transformed to the Unicode Replacement Character; in the final transformation into a byte[] they become 0x3f (the ASCII code of the question mark). This change kills compressed stream contents, both of content streams and of image streams.
To fix this, one has to work with byte and byte[] operations instead of String operations here.
The stream 8 0 references itself in its XObject resources, which might make any PDF viewer choke. Please refrain from such circularity.
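The byte corruption described above can be reproduced with a few lines. The following is only a sketch: the class BinarySafeEditing and its methods are hypothetical names, and US-ASCII is used instead of the platform default charset purely to make the effect deterministic on every machine. The splice method shows the kind of byte[] operation that can replace a String-based edit.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinarySafeEditing {
    // Round trip through a String, similar to what the question's code does.
    // Bytes that the charset cannot decode do not survive this.
    static byte[] viaString(byte[] pdfBytes) {
        String text = new String(pdfBytes, StandardCharsets.US_ASCII);
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Replaces the bytes in [from, to) with replacement and copies
    // everything else unchanged, without ever decoding to characters.
    static byte[] splice(byte[] pdfBytes, int from, int to, byte[] replacement) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(pdfBytes, 0, from);
        out.write(replacement, 0, replacement.length);
        out.write(pdfBytes, to, pdfBytes.length - to);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for a few bytes of a compressed stream.
        byte[] compressed = {0x78, (byte) 0x9c, (byte) 0x81, (byte) 0xff, 0x41};
        System.out.println(Arrays.toString(viaString(compressed)));
        System.out.println(Arrays.toString(splice(compressed, 4, 5, new byte[] {0x42})));
    }
}
```

In the String round trip the three bytes above 0x7f come back as 0x3f, while the spliced array keeps them intact.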
Signature Container issues
The signature does not verify, so the signature container is reviewed as well.
Inspecting the signature container one can see that it is wrong: In spite of the signature being adbe.pkcs7.detached, the signature container embeds data. Looking at the code the reason becomes clear:
CMSSignedData sigData = generator.generate(msg, true);
The true parameter asks BC to embed the msg data.
Having started to look at the signing code, another issue becomes visible: The msg data above are not merely a digest, they already are a signature:
Signature signature = Signature.getInstance(algorithm, BC);
signature.initSign(privateKey);
signature.update(docForSign.getBytes());
CMSTypedData msg = new CMSProcessableByteArray(signature.sign());
which is wrong as the later created SignerInfoGenerator is used to create the actual signature.
Edit: After the issues mentioned before have been fixed or at least worked around, the signature is still not accepted by Adobe Reader. Thus, another look at the code reveals:
Hash value calculation issue
The OP constructs this ByteRange value
String finalByteRange = "/ByteRange [0 " + offsetContentStart + " " + offsetContentEnd + " " + secondPartLength + "]";
and later sets
String docFirstPart = docString.substring(0, offsetContentStart + 1);
String docSecondPart = docString.substring(offsetContentEnd - 1);
The + 1 and - 1 are intended to make these document parts also include the < and > enveloping the signature bytes. But the OP also uses these strings to construct the signed data:
String docForSign = docFirstPart.concat(docSecondPart);
This is wrong: the signed bytes do not contain the < and >. Thus, the hash value calculated later on is also wrong, and Adobe Reader has good reason to assume the document has been manipulated.
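As a sketch of the point above, the digest can be computed over exactly the two byte ranges, so that everything from the < up to and including the > is skipped. The class name ByteRangeDigest, the toy document and the choice of SHA-256 are assumptions for illustration; the OP's actual digest algorithm is not known from the question.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ByteRangeDigest {
    // byteRange holds {offset1, length1, offset2, length2}, as in /ByteRange.
    static byte[] digest(byte[] pdfBytes, int[] byteRange) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            for (int i = 0; i < byteRange.length; i += 2) {
                // Feed only the bytes inside the current (offset, length) pair.
                sha256.update(pdfBytes, byteRange[i], byteRange[i + 1]);
            }
            return sha256.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Toy stand-in for a PDF: "<0000>" plays the role of the Contents value.
        byte[] pdf = "AAAA<0000>BBBB".getBytes(StandardCharsets.US_ASCII);
        int contentsStart = 4; // index of '<', first byte NOT signed
        int contentsEnd = 10;  // index right after '>', first byte signed again
        int[] byteRange = {0, contentsStart, contentsEnd, pdf.length - contentsEnd};
        System.out.println(digest(pdf, byteRange).length + " digest bytes");
    }
}
```

For the toy document this yields the same digest as hashing AAAABBBB alone, which is the property the signed bytes need to have.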
That being said, there are also other issues bound to come up every once in a while:
Offset and length updating issues
The OP inserts the byte range like this:
String interimByteRange = "/ByteRange [0 " + offsetContentStart + " " + offsetContentEnd + " " + secondPartLength + "]";
int byteRangeLengthDifference = interimByteRange.length() - initByteRange.length();
offsetContentStart = offsetContentStart + byteRangeLengthDifference;
offsetContentEnd = offsetContentEnd + byteRangeLengthDifference;
String finalByteRange = "/ByteRange [0 " + offsetContentStart + " " + offsetContentEnd + " " + secondPartLength + "]";
byteRangeLengthDifference += interimByteRange.length() - finalByteRange.length();
//Replace the ByteRange
docString = docString.replace(initByteRange, finalByteRange);
Every once in a while, offsetContentStart or offsetContentEnd will be slightly below some 10^n before the adjustment and slightly above it afterwards, so its decimal representation gains a digit. The line
byteRangeLengthDifference += interimByteRange.length() - finalByteRange.length();
tries to make up for this, but finalByteRange (which eventually is inserted into the document) still contains uncorrected values.
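One common way to sidestep this kind of length drift is to reserve a fixed-width placeholder up front and pad the final entry with spaces to the same width, so that inserting the real values never shifts any offset. The following is a sketch under assumptions: the class FixedWidthByteRange, the PLACEHOLDER text and the sample numbers are hypothetical, and the approach relies on additional whitespace inside a PDF array being insignificant.

```java
final class FixedWidthByteRange {
    // Hypothetical placeholder written into the PDF before the offsets are known.
    static final String PLACEHOLDER = "/ByteRange [0 1000000000 1000000000 1000000000]";

    // Formats the real values and pads with spaces before the closing bracket
    // so that the result has exactly the length of the placeholder.
    static String format(long contentsStart, long contentsEnd, long tailLength) {
        String values = "/ByteRange [0 " + contentsStart + " " + contentsEnd + " " + tailLength;
        if (values.length() + 1 > PLACEHOLDER.length()) {
            throw new IllegalArgumentException("values do not fit into the placeholder");
        }
        StringBuilder padded = new StringBuilder(values);
        while (padded.length() < PLACEHOLDER.length() - 1) {
            padded.append(' ');
        }
        return padded.append(']').toString();
    }

    public static void main(String[] args) {
        // Values just below and just above a power of ten give the same width.
        System.out.println("[" + format(99998, 116000, 1200) + "]");
        System.out.println("[" + format(100002, 116004, 1200) + "]");
    }
}
```

Because the replacement has the same length as the placeholder, the offsets computed before the replacement remain valid afterwards.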
In a similar fashion the representation of the xref start inserted like this
docString = docString.substring(0, startxrefOffset).concat("startxref\n".concat(Integer.toString(xrefOffset))).concat("\n%%EOF\n");
may also be longer than before which makes the byte range (calculated beforehand) not cover the whole document.
Furthermore finding offsets of the relevant PDF objects using text searches of the whole document
offsetContentStart = (documentOutputStream.toString().indexOf("Contents <") + 10 - 1);
offsetContentEnd = (documentOutputStream.toString().indexOf("000000>") + 7);
...
int xrefOffset = docString.indexOf("xref");
...
int startxrefOffset = docString.indexOf("startxref");
will fail for generic documents. For example, if the document already contains earlier signatures, these searches will quite likely identify the wrong indices.
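To make the failure mode above concrete, the sketch below searches the raw bytes from the end of the file, which at least finds the startxref of the most recent revision instead of the first one. ByteSearch and the toy document are hypothetical, and a byte search like this remains a heuristic; tracking the offsets while the objects are written, or using a PDF library to locate them, is the more robust route.

```java
import java.nio.charset.StandardCharsets;

final class ByteSearch {
    // Returns the index of the LAST occurrence of needle in haystack, or -1.
    static int lastIndexOf(byte[] haystack, byte[] needle) {
        for (int start = haystack.length - needle.length; start >= 0; start--) {
            int matched = 0;
            while (matched < needle.length && haystack[start + matched] == needle[matched]) {
                matched++;
            }
            if (matched == needle.length) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Toy stand-in for a PDF with two revisions, each ending in its own trailer.
        String twoRevisions = "startxref\n9\n%%EOF\n" + "startxref\n120\n%%EOF\n";
        byte[] pdf = twoRevisions.getBytes(StandardCharsets.US_ASCII);
        byte[] keyword = "startxref".getBytes(StandardCharsets.US_ASCII);
        System.out.println("first occurrence: " + twoRevisions.indexOf("startxref"));
        System.out.println("last occurrence:  " + lastIndexOf(pdf, keyword));
    }
}
```

Note also that a plain search for xref matches inside startxref as well, which is one more reason not to rely on the first hit.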