When changing the encoding in a Maven project (CP-1252 to UTF-8), how should I handle XML files with encoding ISO-8859-1?

StackOverflow https://stackoverflow.com/questions/18758588

I'm working with legacy code at the moment. The project is a big Maven-based project, and one of the tasks is to change the encoding from cp1252 to UTF-8, i.e. from

<project.build.sourceEncoding>cp1252</project.build.sourceEncoding>

to

 <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>

Most of the code is already UTF-8, but some XML files in there are not (their header is <?xml version="1.0" encoding="ISO-8859-1"?>).

Do I have to manually change the headers of all these files to UTF-8, or will it work anyway if I just change the Maven setting to UTF-8? I suspect that doing so could corrupt a lot of characters.


Solution

The project.build.sourceEncoding setting doesn't apply to XML files. The Maven proposal that describes it says:

Currently, the character encoding for source files needs to be configured individually for each and every plugin that processes source files. In this context, source file refers to some plain text file that - unlike an XML file - lacks intrinsic means to specify the employed file encoding. The Java source files are the most prominent example of such text files. Velocity templates, BeanShell scripts and APT documents are further examples. This proposal does not apply to XML files as their encoding can be determined from the file itself, see XML encoding for further information.
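To illustrate the point that an XML file carries its own encoding, here is a minimal sketch using the standard JAXP parser. The class name XmlDeclaredEncodingDemo and the sample element are made up for this example. The parser is handed raw bytes and picks the charset from the XML declaration, so the Maven source-encoding property plays no part in it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of the original question or answer.
public class XmlDeclaredEncodingDemo {

    // Parses raw bytes; the parser reads the encoding from the XML declaration.
    static String parseText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // \u00FC is "u with diaeresis"; written as an escape so this source
        // file itself compiles the same under any source encoding.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<name>M\u00FCller</name>";

        // Encode the document the way the legacy files are stored on disk.
        byte[] latin1Bytes = xml.getBytes(StandardCharsets.ISO_8859_1);

        String text = parseText(latin1Bytes);
        System.out.println(text.equals("M\u00FCller")); // prints "true"
    }
}
```

As long as each file's bytes really match its declared encoding, such files should keep parsing correctly after the Maven property changes.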

Other tips

It depends entirely on the data inside the XML elements. Blindly changing the encoding in the header may corrupt some of the data when the XML is parsed, because the parser uses the header to decide how to interpret the bytes inside the elements, including any special characters. For example, a code that represents a certain letter in an encoding used for German may represent a different character in an encoding used for Japanese.

You may want to process your XML files with an external tool such as http://okapi.sourceforge.net/Release/Utilities/Help/encodingconversion.htm, which would help you change the encoding of the files.

As a second approach, if your XML files are short and you know the UTF-8 code for each specific ISO-8859-1 character in the XML data, you could use a simple Java replace function to process the input file and generate an output file.
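A small sketch of that corruption, using ISO-8859-1 and UTF-8 rather than a Japanese encoding because both charsets are guaranteed to exist in every JVM. The class and method names are hypothetical. The same bytes produce different characters depending on which charset the reader assumes.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of the original answer.
public class DecodeMismatchDemo {

    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Renders a string as code points so the output is console-independent.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // 0xE4 is "a with diaeresis" in ISO-8859-1.
        byte[] latin1Bytes = {(byte) 0xE4};
        System.out.println(codePoints(decodeLatin1(latin1Bytes))); // U+00E4
        // A lone 0xE4 is malformed UTF-8, so it becomes the replacement character.
        System.out.println(codePoints(decodeUtf8(latin1Bytes)));   // U+FFFD

        // The UTF-8 form of the same letter is two bytes: 0xC3 0xA4.
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA4};
        System.out.println(codePoints(decodeUtf8(utf8Bytes)));     // U+00E4
        // Read as ISO-8859-1, those two bytes become two unrelated characters.
        System.out.println(codePoints(decodeLatin1(utf8Bytes)));   // U+00C3 U+00A4
    }
}
```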
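One way the Java approach could look, as a rough sketch rather than a definitive tool: instead of replacing individual characters, it decodes the whole file as ISO-8859-1, rewrites the encoding in the declaration, and writes the result as UTF-8. The class name Latin1XmlToUtf8 and its methods are assumptions made for this example. It also assumes the file really is ISO-8859-1 and small enough to hold in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the original answer.
public class Latin1XmlToUtf8 {

    // Rewrites the first encoding="ISO-8859-1" pseudo-attribute to UTF-8.
    // Assumption: the first match is the one in the XML declaration.
    static String retargetDeclaration(String xml) {
        return xml.replaceFirst(
                "(?i)encoding\\s*=\\s*([\"'])ISO-8859-1\\1",
                "encoding=\"UTF-8\"");
    }

    // Decodes the input as ISO-8859-1 and writes the output as UTF-8.
    static void convert(Path in, Path out) throws IOException {
        String xml = new String(Files.readAllBytes(in), StandardCharsets.ISO_8859_1);
        Files.write(out, retargetDeclaration(xml).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: Latin1XmlToUtf8 <input.xml> <output.xml>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Changing the header and the bytes together is the important part. Editing only the header would leave ISO-8859-1 bytes labelled as UTF-8, which is the corruption described above.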

Licensed under: CC-BY-SA with attribution