Question

I'm storing strings in TIFF headers using JAI. Some strings contain characters whose code is greater than 127 (e.g. 'é' is 233 decimal).

When I open the resulting TIFF file with a hex editor, I can see the byte 233, but when I read it back through JAI with TIFFField.getAsString(), I get '?' (U+FFFD, the Unicode "replacement character"). I have checked the TIFF 6.0 specification, but it only mentions "7-bit ASCII".

I would like to tell JAI to use the ISO-8859-1 charset to decode strings. Is that possible? I haven't found anything in the (old) javadoc. As a last resort, I could URL-encode the strings, but I would rather avoid that.
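For reference, the URL-encoding fallback mentioned above could look roughly like this. It is a minimal sketch that assumes Java 10+ (for the `Charset` overloads of `URLEncoder.encode` and `URLDecoder.decode`) and UTF-8 as the underlying encoding. The class and method names are hypothetical, and nothing here is JAI API. The idea is that the stored value becomes pure 7-bit ASCII, so it is legal in an ASCII tag.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncodedTag {
    // Hypothetical helper: percent-encode the UTF-8 bytes of the value so every
    // character written to the ASCII tag is plain 7-bit ASCII.
    static String toAsciiSafe(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Hypothetical helper: reverse the transformation after reading the tag back.
    static String fromAsciiSafe(String stored) {
        return URLDecoder.decode(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"
        String stored = toAsciiSafe(original);
        System.out.println(stored);                                // caf%C3%A9
        System.out.println(fromAsciiSafe(stored).equals(original)); // true
    }
}
```

Note that `URLEncoder` turns spaces into `+`, and `URLDecoder` turns them back, so the round trip is lossless. The cost is that other tools will show the encoded form.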


Solution

By the specification, a TIFF tag defined as ASCII may only contain plain 7-bit ASCII.
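A cheap safeguard is to check a value before writing it to an ASCII tag. This is a minimal sketch using only the standard library. The helper names are hypothetical. The NUL terminator follows the TIFF 6.0 rule that the last byte of an ASCII field is binary zero. That detail only matters if you assemble field bytes yourself, and the sketch does not assume anything about how JAI's writer handles it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiTagValue {
    // True if every character fits in 7-bit ASCII, i.e. is legal in a TIFF ASCII field.
    static boolean isSevenBitAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    // Hypothetical helper: the characters followed by the NUL terminator the spec requires.
    static byte[] toAsciiFieldBytes(String s) {
        if (!isSevenBitAscii(s)) {
            throw new IllegalArgumentException("Not 7-bit ASCII: " + s);
        }
        byte[] chars = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(chars, chars.length + 1); // extra last byte is 0 (NUL)
    }

    public static void main(String[] args) {
        System.out.println(isSevenBitAscii("cafe"));                   // true
        System.out.println(isSevenBitAscii("caf\u00e9"));              // false
        System.out.println(Arrays.toString(toAsciiFieldBytes("ab")));  // [97, 98, 0]
    }
}
```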

Unfortunately, this isn't very useful in the real world (where not all of us speak English), so a lot of software writes UTF-8 or even ISO-8859-x encoded strings into these fields, even though that violates the spec. Strictly speaking, no encoding other than ASCII is allowed in an ASCII tag.
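The encodings also produce different bytes for the same character. As a result, a reader cannot reliably tell from the bytes alone which one was used. This small illustration uses only the standard library; the class and `hex` helper are hypothetical. It shows that 'é' (U+00E9) is the single byte `E9` (233, the byte seen in the question) in ISO-8859-1, but two bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Render bytes as unsigned, space-separated hex, e.g. "C3 A9".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // 'é'
        System.out.println(hex(e.getBytes(StandardCharsets.ISO_8859_1))); // E9
        System.out.println(hex(e.getBytes(StandardCharsets.UTF_8)));      // C3 A9
    }
}
```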

JAI, being very strict when reading, probably decodes the string as plain ASCII, and since 'é' isn't part of that character set, it replaces it with the Unicode replacement character (U+FFFD).
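The symptom can be reproduced with the standard library alone. This sketch does not use JAI's internals. It assumes, as the explanation above suspects, that JAI decodes the field as US-ASCII. The helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class AsciiDecodeDemo {
    // Decoding as US-ASCII: bytes outside 0..127 become U+FFFD.
    static String decodeAscii(byte[] raw) {
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // Decoding as ISO-8859-1: every byte maps directly to U+0000..U+00FF.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "caf" followed by 0xE9 (233), the byte seen in the hex editor.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(decodeAscii(raw).equals("caf\uFFFD"));  // true
        System.out.println(decodeLatin1(raw).equals("caf\u00e9")); // true
    }
}
```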

Your best bet is to do one of the following:

  • If the tag allows it, use the BYTE or UNDEFINED type instead of ASCII, and record which encoding you used
  • If possible, write the value to a different tag, one that allows BYTE or UNDEFINED values, and record the encoding
  • If neither of the above is possible, get at the raw bytes and decode them yourself, or use a different library to parse the TIFF structure
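The "decode them yourself" option could look roughly like the sketch below, once you have the field's raw bytes. How to obtain those bytes depends on the library and is not shown here. The `decode` helper is hypothetical. It strips trailing NULs, tries strict UTF-8 first, and falls back to ISO-8859-1, which accepts any byte sequence. This is a heuristic, not a guarantee: some ISO-8859-1 text happens to be valid UTF-8. If you wrote the value yourself as BYTE/UNDEFINED with a known encoding, decode with that encoding directly instead of guessing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LenientTagDecoder {
    // Hypothetical helper: decode raw field bytes, trying UTF-8 before ISO-8859-1.
    static String decode(byte[] raw) {
        int len = raw.length;
        while (len > 0 && raw[len - 1] == 0) {
            len--; // drop trailing NUL terminator(s)
        }
        byte[] data = Arrays.copyOf(raw, len);

        // Strict decoder: malformed input throws instead of producing U+FFFD.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: ISO-8859-1 maps every byte to a character.
            return new String(data, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9, 0};
        byte[] utf8 = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, 0};
        System.out.println(decode(latin1).equals("caf\u00e9")); // true
        System.out.println(decode(utf8).equals("caf\u00e9"));   // true
    }
}
```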
Licensed under: CC-BY-SA with attribution