Question

I have an external Hive table backed by a folder of .txt files. The files contain some special characters such as ô, é, à, and €. When I query this table with Hive, these characters are shown as a black square with a white question mark in it.

However, I also have a Hive table that was imported from MySQL using Sqoop and contains these characters. When I query that table, the characters are displayed correctly.

When I cat the files out of HDFS, the characters also display correctly in the terminal. Do I have to set the character encoding in a specific way for the Hive table that is not working? If so, how?


Solution

I solved the problem by converting the files from windows-1252 to UTF-8 with iconv before putting them into HDFS. Hive reads text files as UTF-8 by default, so bytes in another encoding show up as replacement characters.
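
The conversion described above could look roughly like the following. This is only a sketch: the function name convert_to_utf8, the local directories, and the HDFS target path are hypothetical placeholders, and it assumes every input file really is windows-1252 (iconv will produce wrong output if a file is already UTF-8).

```shell
#!/usr/bin/env bash
# Convert every .txt file in a source directory from windows-1252 to UTF-8.
# convert_to_utf8 is a hypothetical helper name, not part of any tool.

convert_to_utf8() {
  local src_dir="$1" out_dir="$2"
  mkdir -p "$out_dir"
  for f in "$src_dir"/*.txt; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    # -f is the source encoding, -t is the target encoding
    iconv -f WINDOWS-1252 -t UTF-8 "$f" > "$out_dir/$(basename "$f")"
  done
}

# Example usage (all paths are assumptions, adjust to your table location):
# convert_to_utf8 ./raw ./utf8
# hdfs dfs -put -f ./utf8/*.txt /user/hive/external/my_table/
```

Writing the converted copies to a separate directory keeps the original files untouched, so the conversion can be rerun if the source encoding turns out to be something other than windows-1252.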

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow