I don't currently have the newest iTextSharp version; I have iTextSharp 5.1.1.0. It does not contain a PdfSchemaAdvanced class, but it has PdfSchema and its base class XmpSchema. I bet the PdfSchemaAdvanced in your lib also derives from XmpSchema.
PdfSchema.AddKeywords only does one thing:
base["pdf:Keywords"] = keywords;
and the XmpSchema indexer setter (XmpSchema.[].set) in turn does:
base[key] = XmpSchema.Escape(value);
so it's clear that the value is being, well, 'Escaped', to ensure that special characters do not interfere with the storage format.
Now, the Escape function, from what I see, scans the string character by character and performs these substitutions:
" -> &quot;
& -> &amp;
' -> &apos;
< -> &lt;
> -> &gt;
and that's all. It looks like typical HTML-entity processing, at least in my version of the library. So it would not duplicate the quotes, just change their encoding.
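To make the substitution rule concrete, here is a minimal standalone sketch of that kind of character-by-character escaping. It is not copied from iTextSharp; the class name XmpEscapeSketch is made up for illustration, and the behaviour is only what I described above for my version of the library.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the behaviour described above; not iTextSharp code.
public static class XmpEscapeSketch
{
    public static string Escape(string content)
    {
        var buf = new StringBuilder(content.Length);
        foreach (char c in content)
        {
            switch (c)
            {
                case '"':  buf.Append("&quot;"); break;
                case '&':  buf.Append("&amp;");  break;
                case '\'': buf.Append("&apos;"); break;
                case '<':  buf.Append("&lt;");   break;
                case '>':  buf.Append("&gt;");   break;
                default:   buf.Append(c);        break; // everything else is copied unchanged
            }
        }
        return buf.ToString();
    }

    public static void Main()
    {
        string escaped = Escape("Escaped\"Contents&OfThis\"Key");
        // prints <pdf:Keywords>Escaped&quot;Contents&amp;OfThis&quot;Key</pdf:Keywords>
        Console.WriteLine("<pdf:Keywords>" + escaped + "</pdf:Keywords>");
    }
}
```

Note that each quote becomes exactly one entity, so escaping of this kind cannot by itself turn one quote into two.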
Then, AddRdfDescription seems to simply iterate over the stored keys and wrap them in tags with no further processing. So, it'd emit something like this:
Escaped"Contents&OfThis"Key
as:
<pdf:Keywords>Escaped&quot;Contents&amp;OfThis&quot;Key</pdf:Keywords>
Aside from the AddKeywords method, you should also see the AddProperty method. It acts similarly to AddKeywords, except that it receives the key explicitly and does not Escape() its input value.
So, if you are perfectly sure that your _keywords are formatted properly, you might try:
AddProperty("pdf:Keywords", _keywords)
but I discourage you from doing that. At least in my version of itextsharp, the library seems to properly process the 'keywords' and format it safely as RDF.
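To illustrate why skipping the escaping is risky, here is a small sketch that does not use iTextSharp at all. It wraps a value in a pdf:Keywords element by plain string concatenation and asks the .NET XML parser whether the result is well-formed. The class RawVersusEscaped and its helpers are hypothetical, the sample keywords are made up, and SecurityElement.Escape is used only as a stand-in that replaces the same five characters.

```csharp
using System;
using System.Security;
using System.Xml;
using System.Xml.Linq;

// Hypothetical demo; it only mimics "wrap the stored value in tags".
public static class RawVersusEscaped
{
    // Adobe PDF schema namespace, declared so the pdf: prefix resolves.
    const string PdfNs = "http://ns.adobe.com/pdf/1.3/";

    public static string Wrap(string value)
    {
        return "<pdf:Keywords xmlns:pdf=\"" + PdfNs + "\">" + value + "</pdf:Keywords>";
    }

    public static bool IsWellFormed(string xml)
    {
        try { XElement.Parse(xml); return true; }
        catch (XmlException) { return false; }
    }

    public static void Main()
    {
        string keywords = "Tom & Jerry <draft>";

        // Raw value, as an unescaped AddProperty-style call would store it.
        Console.WriteLine(IsWellFormed(Wrap(keywords)));                         // False

        // Escaped value, as an AddKeywords-style call would store it.
        Console.WriteLine(IsWellFormed(Wrap(SecurityElement.Escape(keywords)))); // True
    }
}
```

So the unescaped route is only safe when the input is guaranteed to contain none of those special characters, or is already escaped.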
Heh, you may also try using the PdfSchema class that I just checked instead of the Advanced one. I bet it is still present in the library.
But, in general, I think the problem lies elsewhere.
Double- or triple-check the contents of the _keywords variable, and then also check the binary contents of the generated PDF. Open it in a hex editor or a plain-text editor like Notepad and look for the <pdf:Keywords> tag. Check what it actually contains. It might be all OK, and it might be your PDF metadata reader that adds those quotes.
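If you'd rather not eyeball the file, the same check can be sketched in a few lines of C#. This assumes the XMP packet is stored uncompressed and as UTF-8 (usually the case, but not guaranteed), and the class name XmpKeywordsPeek is made up for illustration. It searches the raw bytes and does not parse the PDF.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: a raw byte search, not a PDF parser.
public static class XmpKeywordsPeek
{
    // Returns the raw text between <pdf:Keywords> and </pdf:Keywords>,
    // or null if the tag pair is not found in the bytes.
    public static string FindRawKeywords(byte[] pdfBytes)
    {
        // Latin-1 maps each byte to exactly one char, so string indexes equal byte offsets.
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        const string open = "<pdf:Keywords>";
        const string close = "</pdf:Keywords>";

        int start = text.IndexOf(open, StringComparison.Ordinal);
        if (start < 0) return null;
        start += open.Length;

        int end = text.IndexOf(close, start, StringComparison.Ordinal);
        if (end < 0) return null;

        // Assumption: the XMP packet is UTF-8, so decode only that slice as UTF-8.
        return Encoding.UTF8.GetString(pdfBytes, start, end - start);
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: XmpKeywordsPeek <file.pdf>");
            return;
        }
        string raw = FindRawKeywords(File.ReadAllBytes(args[0]));
        Console.WriteLine(raw ?? "(no <pdf:Keywords> tag found)");
    }
}
```

If the printed value contains &quot; entities but no doubled quotes, the file itself is fine and the extra quotes come from whatever reads the metadata.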