Question

We've got our servers running on CentOS, and our Java backend sometimes has to process a file that was originally generated on a Windows machine (by one of our clients) using CP-1252; however, in 95%+ of use cases we are processing UTF-8 files.

My question: if we know that certain files will always be UTF-8, and other files will always be CP-1252, is it possible to specify in Java the character set to use for reading in each file? If so:

  • Do we need to do anything at the systems-level for adding CP-1252 to CentOS? If so, what does this involve?
  • What Java objects would we use to apply the correct encoding on a per file basis?

Thanks in advance!


Solution

My question: if we know that certain files will always be UTF-8, and other files will always be CP-1252, is it possible to specify in Java the character set to use for reading in each file?

Assuming you're in charge of the code reading the file, it should be fine. Create a FileInputStream, then wrap it in an InputStreamReader specifying the relevant character encoding.
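A minimal sketch of that approach is below. The file names, the sample content, and the helper method name are illustrative; the point is that the charset is chosen per file when the `InputStreamReader` is constructed:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EncodingAwareReader {

    // Reads the whole file at the given path, decoding with the given charset.
    static String readFile(String path, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), cs))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Create a sample CP-1252 file the way a Windows client might
        // (the file name and content are made up for this example).
        Files.write(Paths.get("client-file.txt"),
                "café €9.99".getBytes(Charset.forName("windows-1252")));

        // Decoding with the matching charset round-trips correctly;
        // decoding the same bytes as UTF-8 would mangle the non-ASCII characters.
        System.out.println(readFile("client-file.txt", Charset.forName("windows-1252")));
    }
}
```

For files you know are UTF-8, you would pass `StandardCharsets.UTF_8` instead of `Charset.forName("windows-1252")`.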

Do we need to do anything at the systems-level for adding CP-1252 to CentOS? If so, what does this involve?

That depends on what the JRE supports. I've never used CentOS, so I don't know whether it's likely to come with the relevant encoding as part of the JRE. You can use Charset.isSupported to check though, and Charset.availableCharsets to list what's available.
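A quick way to check both, assuming you just want a one-off diagnostic on the target machine:

```java
import java.nio.charset.Charset;

public class CharsetCheck {
    public static void main(String[] args) {
        // "windows-1252" is the canonical name for CP-1252 in Java;
        // "Cp1252" is accepted as an alias in typical JREs.
        System.out.println("windows-1252 supported: "
                + Charset.isSupported("windows-1252"));

        // List every charset the running JRE knows about.
        Charset.availableCharsets().keySet().forEach(System.out::println);
    }
}
```

In practice, charset support is a property of the JRE rather than of the Linux distribution, so no OS-level configuration should be needed on CentOS.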

Other tips

All you need to do is specify the charset/encoding the original file was written in when constructing the reader, e.g. via the InputStreamReader(InputStream in, Charset cs) constructor. See the InputStreamReader documentation for details.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow