How to add a UTF-8 BOM in Java
10-10-2019
Question
I have a Java stored procedure which fetches records from a table using a ResultSet object and creates a CSV file.
BLOB retBLOB = BLOB.createTemporary(conn, true, BLOB.DURATION_SESSION);
retBLOB.open(BLOB.MODE_READWRITE);
OutputStream bOut = retBLOB.setBinaryStream(0L);
ZipOutputStream zipOut = new ZipOutputStream(bOut);
PrintStream out = new PrintStream(zipOut,false,"UTF-8");
out.write('\ufeff');
out.flush();
zipOut.putNextEntry(new ZipEntry("filename.csv"));
while (rs.next()) {
    out.print("\"" + rs.getString(i) + "\"");
    out.print(",");
}
out.flush();
zipOut.closeEntry();
zipOut.close();
retBLOB.close();
return retBLOB;
But the generated CSV file doesn't display German characters correctly. The Oracle database also has an NLS_CHARACTERSET value of UTF8.
Please suggest a fix.
Solution
To write a BOM in UTF-8 you need PrintStream.print(), not PrintStream.write().
Also, if you want the BOM in your CSV file, I guess you need to print it after putNextEntry().
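A minimal sketch of that ordering, assuming an in-memory ByteArrayOutputStream in place of the Oracle BLOB stream and a hypothetical String[][] of non-null values in place of the ResultSet. The class and method names are made up for illustration. The BOM is printed with print(char) only after putNextEntry(), so it becomes the first three bytes of the CSV entry:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of the original stored procedure.
final class ZipCsvWithBom {

    /** Builds a ZIP archive holding one UTF-8 CSV entry that starts with a BOM. */
    static byte[] buildZip(String[][] rows) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream(); // stands in for the BLOB stream
            ZipOutputStream zipOut = new ZipOutputStream(buf);
            PrintStream out = new PrintStream(zipOut, false, "UTF-8");

            // Open the entry first, then print the BOM so it lands inside the entry.
            zipOut.putNextEntry(new ZipEntry("filename.csv"));
            out.print('\ufeff'); // print(char) encodes U+FEFF as EF BB BF

            for (String[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    if (i > 0) {
                        out.print(',');
                    }
                    // Quote each field and double any embedded quotes.
                    out.print("\"" + row[i].replace("\"", "\"\"") + "\"");
                }
                out.print("\r\n");
            }
            out.flush();
            zipOut.closeEntry();
            zipOut.close();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back the raw bytes of the first entry, to inspect what was written. */
    static byte[] readFirstEntry(byte[] zip) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            in.getNextEntry();
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                content.write(chunk, 0, n);
            }
            return content.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the sample independent of the source file encoding.
        byte[] csv = readFirstEntry(buildZip(new String[][] {{"Gr\u00fc\u00dfe", "Stra\u00dfe"}}));
        System.out.printf("%02X %02X %02X%n", csv[0] & 0xFF, csv[1] & 0xFF, csv[2] & 0xFF);
    }
}
```

Writing to a ZipOutputStream before any entry is open fails, and PrintStream swallows that IOException, so a BOM written before putNextEntry() is most likely lost without any visible error. Calling out.checkError() after writing is one way to notice this.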
Other tips
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(...), StandardCharsets.UTF_8));
out.write('\ufeff');
out.write(...);
This correctly writes out 0xEF 0xBB 0xBF to the file, which is the UTF-8 representation of the BOM.
I think that out.write('\ufeff'); should actually be out.print('\ufeff');.
According to the Javadoc, the write(int) method actually writes a single byte, without any character encoding. So out.write('\ufeff'); writes the byte 0xff. By contrast, the print(char) method encodes the character as one or more bytes using the stream's encoding, and then writes those bytes.
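The difference can be checked with a small sketch that captures what each call emits. The class and method names here are hypothetical, and an in-memory ByteArrayOutputStream is assumed as the target:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class comparing PrintStream.write(int) with PrintStream.print(char).
final class BomWriteVsPrint {

    private static PrintStream utf8Stream(ByteArrayOutputStream target) {
        try {
            return new PrintStream(target, false, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** write(int) keeps only the low-order 8 bits of its argument. */
    static byte[] viaWrite() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream out = utf8Stream(buf);
        out.write('\ufeff');
        out.flush();
        return buf.toByteArray();
    }

    /** print(char) encodes the character using the stream's charset. */
    static byte[] viaPrint() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream out = utf8Stream(buf);
        out.print('\ufeff');
        out.flush();
        return buf.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("write(int):  " + hex(viaWrite()));  // FF
        System.out.println("print(char): " + hex(viaPrint()));  // EF BB BF
    }
}
```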
In case people are using a PrintStream, you need to do it a little differently. While a Writer will do some magic to convert a single character into 3 bytes, PrintStream.write(int) requires all 3 bytes of the UTF-8 BOM to be written individually:
// Print utf-8 BOM
PrintStream out = System.out;
out.write('\ufeef'); // emits 0xef
out.write('\ufebb'); // emits 0xbb
out.write('\ufebf'); // emits 0xbf
Alternatively, you can use the hex values for those directly:
PrintStream out = System.out;
out.write(0xef); // emits 0xef
out.write(0xbb); // emits 0xbb
out.write(0xbf); // emits 0xbf
In my case it works with the code:
PrintWriter out = new PrintWriter(new File(filePath), "UTF-8");
out.write(csvContent);
out.flush();
out.close();