Use a CharsetDecoder:
final Charset charset = ...
final CharsetDecoder decoder = charset.newDecoder()
    .onUnmappableCharacter(CodingErrorAction.REPORT)
    .onMalformedInput(CodingErrorAction.REPORT);
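As a rough illustration of what REPORT buys you, here is a minimal sketch that assumes UTF-8 as the charset (the snippet above leaves it open). The class name StrictDecode and the method decodeStrict are made up for this example. It uses the one-shot decode(ByteBuffer) convenience method, which throws a CharacterCodingException instead of silently substituting a replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecode {
    // Hypothetical helper: decodes the whole array as UTF-8 (an assumption)
    // and throws on malformed or unmappable input.
    public static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .onMalformedInput(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // First two bytes of the three-byte UTF-8 encoding of U+20AC.
        byte[] truncated = { (byte) 0xE2, (byte) 0x82 };
        try {
            System.out.println(decodeStrict(truncated));
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```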
I do have this problem in one of my projects, and here is how I deal with it.
Note line 258: if the result is a malformed input sequence then it may be an incomplete read; in that case, I set the last good offset to the last decoded byte, and start again from that offset.
If, on the next read, I fail to read again and the byte offset is the same, then this is a permanent failure (line 215).
Your case is a little different, however, since you cannot "backtrack"; you'd need to fill a new ByteBuffer with the rest of the previous buffer and the new one and start from there (allocate for oldBuf.remaining() + bufsize and .put() from oldBuf into the new buffer). In my case, my backend is a file, so I can .map() from wherever I want.
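The "fill a new ByteBuffer" step could look something like the sketch below. The names CarryOver, refill and freshBytes are invented for the example, and UTF-8 is assumed. The demo in main uses the three-argument decode() with endOfInput set to false, which is a slightly different approach from the retry-on-malformed one described above: with that flag, an incomplete trailing sequence is left unread in the input buffer so it can be carried over.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CarryOver {
    // Hypothetical helper: returns a buffer holding the unread tail of oldBuf
    // followed by freshBytes, flipped and ready for reading.
    public static ByteBuffer refill(ByteBuffer oldBuf, byte[] freshBytes) {
        ByteBuffer merged = ByteBuffer.allocate(oldBuf.remaining() + freshBytes.length);
        merged.put(oldBuf);       // copies the oldBuf.remaining() leftover bytes
        merged.put(freshBytes);   // appends the newly read chunk
        merged.flip();
        return merged;
    }

    public static void main(String[] args) {
        byte[] euro = "\u20ac".getBytes(StandardCharsets.UTF_8); // three bytes
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .onMalformedInput(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(8);

        // First chunk ends in the middle of the character.
        ByteBuffer first = ByteBuffer.wrap(euro, 0, 2);
        decoder.decode(first, out, false);
        System.out.println("leftover bytes: " + first.remaining());

        // Second chunk: leftover bytes plus the rest of the input.
        ByteBuffer merged = refill(first, new byte[] { euro[2] });
        decoder.decode(merged, out, true);
        decoder.flush(out);
        System.out.println("chars decoded: " + out.position());
    }
}
```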
So, basically:
- if you have an unmappable character, this is a permanent failure (your encoding just cannot handle your byte sequence);
- if you have read the full byte sequence successfully, your CharBuffer will have buf.position() characters in it;
- if you have a malformed input, it may mean that you have an incomplete byte sequence (for instance, using UTF-8, you have one byte out of a three byte sequence), but you need to confirm that with the next iteration.
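The three cases above map onto the CoderResult returned by CharsetDecoder.decode(). The sketch below is only an illustration of that mapping: DecodeOutcome, Outcome and classify are hypothetical names, and it decodes a single chunk with endOfInput set to true, so it cannot by itself tell an incomplete sequence from a truly malformed one (that still needs the next read, as described above).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeOutcome {
    enum Outcome { COMPLETE, UNMAPPABLE, MALFORMED_OR_INCOMPLETE }

    // Hypothetical helper: decodes one chunk and reports which case applies.
    public static Outcome classify(Charset charset, byte[] bytes) {
        CharsetDecoder decoder = charset.newDecoder()
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .onMalformedInput(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // Sized so that the output buffer cannot overflow for this input.
        int capacity = (int) Math.ceil(bytes.length * (double) decoder.maxCharsPerByte()) + 1;
        CharBuffer out = CharBuffer.allocate(capacity);

        CoderResult result = decoder.decode(in, out, true);
        if (result.isUnmappable()) {
            return Outcome.UNMAPPABLE;               // permanent failure
        }
        if (result.isMalformed()) {
            return Outcome.MALFORMED_OR_INCOMPLETE;  // confirm on the next read
        }
        return Outcome.COMPLETE;                     // out.position() chars were decoded
    }

    public static void main(String[] args) {
        byte[] ok = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] cut = { (byte) 0xE2, (byte) 0x82 };   // two bytes of a three-byte sequence
        System.out.println(classify(StandardCharsets.UTF_8, ok));
        System.out.println(classify(StandardCharsets.UTF_8, cut));
    }
}
```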
Feel free to salvage any code you deem necessary! It's free ;)
FINAL NOTE, since I believe this is important: String's .getBytes(*) methods and constructors from byte arrays have a default CodingErrorAction of REPLACE!
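A quick way to see that REPLACE behaviour for yourself, as a small sketch (ReplaceDefault, lenientDecode and lenientEncode are names made up here; the Charset-taking overloads are used because their replacement behaviour is documented):

```java
import java.nio.charset.StandardCharsets;

final class ReplaceDefault {
    // new String(byte[], Charset) never throws on bad input; it substitutes
    // the charset's replacement string instead.
    public static String lenientDecode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // String.getBytes(Charset) likewise replaces characters it cannot encode.
    public static byte[] lenientEncode(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Truncated UTF-8 sequence: no exception, a U+FFFD shows up instead.
        String decoded = lenientDecode(new byte[] { 'a', (byte) 0xE2, (byte) 0x82 });
        System.out.println(decoded.indexOf('\uFFFD') >= 0);

        // U+20AC has no US-ASCII encoding: no exception, a '?' byte instead.
        byte[] encoded = lenientEncode("\u20ac");
        System.out.println(encoded[0] == '?');
    }
}
```

Neither call gives any sign that data was lost, which is why going through a CharsetDecoder with REPORT is the safer route when you need to detect bad input.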