I have an external Hive table which is filled with a folder of txt files. There are some special characters in there like ô, é, à, €,... When I query this table with Hive, these characters get shown as a black square with a white question mark in it.

However, I also have a Hive table imported from mysql using sqoop containing these characters. When I query this table the characters do get shown in a normal fashion.

When I cat the files out of hdfs the characters also show in the way they are supposed to be in the terminal. Do I have to set character encoding in a specific way for the not-working Hive table? If so, how?

有帮助吗?

解决方案

I solved the problem by converting them from windows-1252 encoding to utf-8 using iconv before putting the files into HDFS.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top