Reading chars from a stream of ByteArrays where boundary alignment may be imperfect

Question

Use a CharsetDecoder:

final Charset charset = ...
final CharsetDecoder decoder = charset.newDecoder()
    .onUnmappableCharacter(CodingErrorAction.REPORT)
    .onMalformedInput(CodingErrorAction.REPORT);

I do have this problem in one of my projects, and here is how I deal with it.

Note line 258: if the result is a malformed input sequence then it may be an incomplete read; in that case, I set the last good offset to the last decoded byte, and start again from that offset.

If, on the next read, I fail to read again and the byte offset is the same, then this is a permanent failure (line 215).

Your case is a little different, however, since you cannot "backtrack": you'd need to fill a new ByteBuffer with the rest of the previous buffer followed by the new data, and start from there (allocate oldBuf.remaining() + bufsize bytes, then .put() oldBuf and the new bytes into the new buffer). In my case, my backend is a file, so I can .map() from wherever I want.
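To make the "carry the leftover bytes into a new buffer" idea concrete, here is a rough sketch. ChunkDecoder, feed() and finish() are made-up names for illustration, not code from my project. It also takes a slightly different route from the one described above: it calls the three-argument decode() with endOfInput set to false, in which case a truncated trailing sequence is simply left unread in the input buffer instead of being reported as malformed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: decodes a stream of byte[] chunks whose boundaries
// may fall in the middle of a multi-byte character.
class ChunkDecoder {
    private final CharsetDecoder decoder;
    // Bytes from the previous chunk that did not yet form a full character.
    private ByteBuffer leftover = ByteBuffer.allocate(0);

    ChunkDecoder(Charset charset) {
        decoder = charset.newDecoder()
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .onMalformedInput(CodingErrorAction.REPORT);
    }

    // Decodes as much as possible of (leftover + chunk) and keeps the rest.
    String feed(byte[] chunk) throws CharacterCodingException {
        // New buffer sized for the old remainder plus the new chunk.
        ByteBuffer in = ByteBuffer.allocate(leftover.remaining() + chunk.length);
        in.put(leftover).put(chunk);
        in.flip();

        CharBuffer out = CharBuffer.allocate(
            (int) Math.ceil(in.remaining() * decoder.maxCharsPerByte()));

        // endOfInput = false: more bytes may follow, so an incomplete
        // trailing sequence stays in 'in' and is not an error.
        CoderResult result = decoder.decode(in, out, false);
        if (result.isError()) {
            result.throwException(); // genuinely malformed or unmappable
        }
        leftover = in; // its position is at the first undecoded byte
        out.flip();
        return out.toString();
    }

    // Call once after the last chunk; fails if bytes are still dangling.
    String finish() throws CharacterCodingException {
        CharBuffer out = CharBuffer.allocate(
            (int) Math.ceil(leftover.remaining() * decoder.maxCharsPerByte()));
        CoderResult result = decoder.decode(leftover, out, true);
        if (result.isError()) {
            result.throwException();
        }
        result = decoder.flush(out);
        if (result.isError()) {
            result.throwException();
        }
        out.flip();
        return out.toString();
    }
}
```

With UTF-8, feeding the bytes 0x61 0xE2 and then 0x82 0xAC 0x62 should give "a" followed by "€b": the lone 0xE2 is held back until the rest of the character arrives. Allocating a fresh buffer per chunk is the simplest thing that works; reusing one buffer with compact() would avoid the allocations.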

So, basically:

if you have an unmappable character, this is a permanent failure (your encoding just cannot handle your byte sequence);
if you have read the full byte sequence successfully, your CharBuffer will have buf.position() characters in it;
if you have malformed input, it may mean that you have an incomplete byte sequence (for instance, with UTF-8, you have one byte out of a three-byte sequence), but you need to confirm that with the next iteration.
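The three cases above can be sketched as a small classifier. DecodeCheck, Outcome and classify() are names invented for this example. The separate MALFORMED outcome is an extra refinement of mine, on the assumption that a malformed sequence which stops short of the end of the buffer cannot be fixed by reading more bytes; only one that runs up to the end might be a truncated character.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper illustrating the three outcomes listed above.
class DecodeCheck {
    enum Outcome { COMPLETE, UNMAPPABLE, MALFORMED, MAYBE_INCOMPLETE }

    static Outcome classify(Charset charset, byte[] bytes) {
        CharsetDecoder decoder = charset.newDecoder()
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .onMalformedInput(CodingErrorAction.REPORT);

        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(
            (int) Math.ceil(bytes.length * decoder.maxCharsPerByte()));

        // endOfInput = true: treat this buffer as all the input there is,
        // so a truncated trailing sequence is reported as malformed.
        CoderResult result = decoder.decode(in, out, true);

        if (result.isUnmappable()) {
            return Outcome.UNMAPPABLE; // permanent failure
        }
        if (result.isMalformed()) {
            // in.position() is where the bad sequence starts and
            // result.length() is how many bytes it covers.
            boolean reachesEnd = in.position() + result.length() == in.limit();
            return reachesEnd ? Outcome.MAYBE_INCOMPLETE : Outcome.MALFORMED;
        }
        // Everything decoded; out.position() characters are in 'out'.
        return Outcome.COMPLETE;
    }
}
```

A MAYBE_INCOMPLETE result still has to be confirmed on the next read: a lone invalid byte at the very end of a chunk looks exactly the same as the first byte of a split character.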

Feel free to salvage any code you deem necessary! It's free ;)

FINAL NOTE, since I believe this is important: String's .getBytes(*) methods and constructors from byte arrays have a default CodingErrorAction of REPLACE!
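A quick way to see the difference for yourself; ReplaceVsReport and its method names are just for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class contrasting String's REPLACE behaviour
// with a decoder configured to REPORT.
class ReplaceVsReport {
    // String constructor: bad bytes silently become U+FFFD.
    static String lenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Explicit decoder: bad bytes raise a CharacterCodingException.
    static String strict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .onMalformedInput(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(bytes))
            .toString();
    }

    // getBytes: characters the charset cannot encode become '?'.
    static byte[] lenientEncode(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```

For the bytes 0x61 0xFF, lenient() returns "a" followed by U+FFFD with no complaint, while strict() throws a MalformedInputException. That silent substitution is exactly what you do not want when trying to detect a split character.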