Question

Is there a way to check in Java whether a text file (.txt) is encoded as Unicode or UTF-8?

Solution

You cannot know with absolute certainty which charset is used in the general case. I found this to be a good read.

http://illegalargumentexception.blogspot.co.uk/2009/05/java-rough-guide-to-character-encoding.html

See especially the section "Automatic detection of encoding".
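One cheap heuristic is to look for a byte order mark (BOM) at the start of the file. The sketch below is only an illustration: the class name BomSniffer and its methods are made up for this example, and it assumes Java 9 or later for InputStream.readNBytes. Many UTF-8 files carry no BOM at all, so a null result says nothing about the encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not part of any library): guesses an encoding from a BOM.
class BomSniffer {

    /** Returns the charset name suggested by a leading BOM, or null if none is recognised. */
    static String detectBom(byte[] head) {
        // Check the 4-byte UTF-32 marks first: the UTF-32LE mark begins with FF FE,
        // which is also the UTF-16LE mark. (FF FE 00 00 could in principle be
        // UTF-16LE text starting with a NUL character; this is a guess, not proof.)
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** Reads at most the first four bytes of the file and inspects them for a BOM. */
    static String detectBom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[4];
            int n = in.readNBytes(buf, 0, buf.length); // Java 9+
            return detectBom(Arrays.copyOf(buf, n));
        }
    }

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Calling BomSniffer.detectBom(Paths.get("some.txt")) on a file of your own would return a name such as "UTF-8" or "UTF-16LE" when a BOM is present, and null otherwise.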

OTHER TIPS

Uhm, theoretically, how would you know if it is Unicode?

This is the real question. Truthfully, you cannot know, but you can make a decent guess.

See "Java: How to determine the correct charset encoding of a stream" for more details. :)
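One way to make such a guess, sketched here under stated assumptions rather than as a definitive method, is to try decoding the bytes strictly and see whether the decoder complains. The class name CharsetGuess and the method decodesCleanly are hypothetical names chosen for this example; the decoder calls are from java.nio.charset.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper (not part of any library): tests whether bytes are valid in a charset.
class CharsetGuess {

    /**
     * Returns true if the bytes decode in the given charset without any
     * malformed or unmappable sequences. This shows the data is consistent
     * with the charset, not that the charset is the one the author used.
     */
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            // The decoder hit a byte sequence that is invalid in this charset.
            return false;
        }
    }
}
```

For a file, the bytes could come from Files.readAllBytes(path), and the check would be decodesCleanly(bytes, StandardCharsets.UTF_8). A false result rules UTF-8 out; a true result is only a hint, because pure ASCII text passes for many charsets and every byte sequence decodes cleanly as ISO-8859-1.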

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow