Ideally, the source of the data should tell you the encoding. If it can't, you have to either reject the input or guess the encoding when it isn't UTF-8.
For Western languages, falling back to ISO-8859-1 when the data is not valid UTF-8 will probably work most of the time:
ByteBuffer bytes = ByteBuffer.wrap(IOUtils.toByteArray(inputstream));
CharBuffer chars;
try {
    chars = StandardCharsets.UTF_8.newDecoder().decode(bytes);
} catch (CharacterCodingException e) {
    // MalformedInputException and UnmappableCharacterException are both
    // subclasses of CharacterCodingException, so one catch clause covers
    // all decoding failures. The failed decode may have consumed part of
    // the buffer, so rewind before decoding again.
    bytes.rewind();
    chars = StandardCharsets.ISO_8859_1.decode(bytes);
}
System.out.println(chars.toString());
The boilerplate with a byte array and a ByteBuffer is there so that decoding failures surface as exceptions and so that the same bytes can be decoded more than once.
You can also use Mozilla chardet, which applies more sophisticated heuristics to guess the encoding when the data is not UTF-8. It's not perfect, though: for instance, I recall it detecting Finnish text in Windows-1252 as Hebrew (Windows-1255).
Also note that any byte sequence whatsoever is valid ISO-8859-1. That is why you try UTF-8 first (if data decodes without exceptions as UTF-8, it is extremely likely to actually be UTF-8), and why you cannot try to detect anything else after falling back to ISO-8859-1: it accepts everything.
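To make that asymmetry concrete, here is a small self-contained sketch (class and sample bytes are my own, not from the snippet above): a lone 0xE4 byte is illegal UTF-8, so the strict decoder throws, while ISO-8859-1 decodes the same bytes without complaint.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class DecodeOrder {

    /** Returns true if the bytes form a valid UTF-8 sequence. */
    static boolean isValidUtf8(byte[] data) {
        try {
            // The decoder from newDecoder() reports errors instead of
            // silently replacing bad bytes, so invalid UTF-8 throws.
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0xE4 is "ä" in ISO-8859-1, but on its own it is an illegal
        // UTF-8 sequence (a lead byte with no continuation byte after it).
        byte[] data = { 'n', (byte) 0xE4, 'y', 't', 'e' };

        System.out.println("valid UTF-8? " + isValidUtf8(data));

        // ISO-8859-1 maps every one of the 256 possible byte values,
        // so this decode can never fail, whatever the input:
        System.out.println(new String(data, StandardCharsets.ISO_8859_1));
    }
}
```

Because the ISO-8859-1 decode always succeeds, a detector that tried it before UTF-8 would never get to the UTF-8 check at all.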