PDFBox - encountering strange text in COSString

Question

read/replace PDF text using standard documented way i.e. through COSString (Tj and TJ operators)

This "documented way" unfortunately is very misleading for two reasons:

It assumes that the string operands of Tj and TJ are encoded in some standard encoding. In fact, the encoding is governed by the font and may be completely custom-made. Depending on the font type, it may even be a multibyte encoding.
It assumes that letters and whole words appear in the content stream in reading order and unbroken. This need not be the case either.
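Both points can be illustrated without any PDF library. The following is only a sketch under invented assumptions: the custom encoding table, the byte values, and the fragment positions are hypothetical and stand in for what a real font dictionary and content stream would provide.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class TextOperatorPitfalls {
    // Hypothetical custom single-byte font encoding: code -> character.
    // A real font defines this in its own encoding data, not in your code.
    static final Map<Integer, Character> CUSTOM_ENCODING =
            Map.of(1, 'H', 2, 'e', 3, 'l', 4, 'o');

    // Point 1: the bytes of a Tj operand are character codes that only
    // make sense when looked up in the encoding of the current font.
    static String decode(byte[] codes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : codes) {
            Character c = CUSTOM_ENCODING.get(b & 0xFF);
            sb.append(c != null ? c : '\uFFFD'); // unknown code
        }
        return sb.toString();
    }

    // Point 2: fragments may be drawn in any order. Here each fragment has
    // an x position on a single line, and sorting by x restores reading order.
    static String inReadingOrder(double[] xs, String[] texts) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < xs.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble(i -> xs[i]));
        StringBuilder sb = new StringBuilder();
        for (int i : order) {
            sb.append(texts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {1, 2, 3, 3, 4};
        // Interpreting the bytes as Latin-1 yields unprintable control characters.
        System.out.println(new String(raw, StandardCharsets.ISO_8859_1));
        // Interpreting them through the font's encoding yields the text.
        System.out.println(decode(raw)); // Hello

        // "World" is drawn first but positioned to the right of "Hello ".
        System.out.println(inReadingOrder(
                new double[] {60, 10}, new String[] {"World", "Hello "})); // Hello World
    }
}
```

A search-and-replace that works directly on the raw string operands sees neither the decoded characters nor the reading order, which is why it fails on documents like this.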

PDF simply is not a format designed for editing content. In simply built documents it can be done fairly easily; in general, though, it is really difficult.

PS: The strange output from your sample document is due to the use of a composite font with Identity-H encoding, which embeds only a subset of TimesNewRoman.

That font does contain a ToUnicode mapping; thus, translating what you read to character data is possible.
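As a rough sketch of what that translation involves: with Identity-H every character code is two bytes long, and the ToUnicode mapping assigns Unicode text to each code. The table below is hypothetical (the codes are invented, not taken from your document), and in practice it would be parsed from the font's ToUnicode CMap rather than hard-coded.

```java
import java.util.HashMap;
import java.util.Map;

class ToUnicodeSketch {
    // Hypothetical excerpt of a ToUnicode mapping: 2-byte code -> Unicode text.
    static final Map<Integer, String> TO_UNICODE = new HashMap<>();
    static {
        TO_UNICODE.put(0x002B, "H");
        TO_UNICODE.put(0x0048, "e");
        TO_UNICODE.put(0x004F, "l");
        TO_UNICODE.put(0x0052, "o");
    }

    // Reads the raw string bytes two at a time (big-endian), as Identity-H
    // requires, and looks each code up in the ToUnicode table.
    static String decode(byte[] raw) {
        if (raw.length % 2 != 0) {
            throw new IllegalArgumentException("Identity-H strings use 2 bytes per code");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length; i += 2) {
            int code = ((raw[i] & 0xFF) << 8) | (raw[i + 1] & 0xFF);
            sb.append(TO_UNICODE.getOrDefault(code, "\uFFFD")); // unmapped code
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = {0x00, 0x2B, 0x00, 0x48, 0x00, 0x4F, 0x00, 0x4F, 0x00, 0x52};
        System.out.println(decode(raw)); // Hello
    }
}
```

Read byte by byte as if it were a single-byte encoding, the same data consists mostly of zero bytes and unrelated characters, which matches the kind of strange output you are seeing.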

Replacing that text could be a problem, though, because only a subset is embedded; e.g. the capital letters 'I' and 'J' are not embedded and cannot be used in a replacement unless you either use a different font or possibly even extend the embedded font subset. Neither of these operations is as simple as your original code.
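Before attempting a replacement, one could at least check whether every character of the new text is available in the embedded subset. A minimal sketch, assuming the set of available characters has already been collected (for example from the values of the ToUnicode mapping); the method name and the sample character set are made up for illustration.

```java
class SubsetCheckSketch {
    // Returns the characters of the replacement text that the embedded
    // subset cannot draw, each listed once, in order of first appearance.
    // 'embedded' is assumed to hold every character the subset provides.
    static String missingCharacters(String replacement, String embedded) {
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < replacement.length(); i++) {
            char c = replacement.charAt(i);
            boolean available = embedded.indexOf(c) >= 0;
            boolean alreadyListed = missing.indexOf(String.valueOf(c)) >= 0;
            if (!available && !alreadyListed) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        String embedded = "Helo "; // hypothetical subset contents
        System.out.println(missingCharacters("Hello", embedded));     // prints an empty line
        System.out.println(missingCharacters("Hello Jim", embedded)); // Jim
    }
}
```

If the result is non-empty, the replacement cannot be written with that font as embedded, and you are back to choosing a different font or extending the subset.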

And this is not even the worst imaginable scenario: sometimes there is no information at all on how to interpret the raw data in the string as text, and the PDF only knows how to draw the glyphs.