How can I convert a downloaded csv file (ANSI) to UTF-8 in Android

https://stackoverflow.com/questions/22595556

19-06-2023

Question

My program downloads a CSV file, splits it, and uses it to build a ListView, but some characters are displayed incorrectly. I checked the CSV file in Notepad++ and saw that its character encoding is ANSI. How can I convert it to UTF-8?

@Override
protected List<Teendo> doInBackground(String... params) {
    try {
        URL url = new URL("http://www.programozas-oktatas.hu/androidvizsga/todo.csv");
        InputStream is = url.openStream();
        // No charset is given here, so the platform default is used
        InputStreamReader isr = new InputStreamReader(is);
        BufferedReader br = new BufferedReader(isr);

        // Read and discard the first (header) line
        String sor = br.readLine();

        while ((sor = br.readLine()) != null) {
            String[] darabok = sor.split(";");
            if (darabok.length > 1) {
                String megnevezes = darabok[0];
                String[] datumdarabok = darabok[1].split("-");
                int ev = Integer.parseInt(datumdarabok[0]);
                int ho = Integer.parseInt(datumdarabok[1]);
                int nap = Integer.parseInt(datumdarabok[2]);
                int fontossag = Integer.parseInt(darabok[2]);
                Teendo teendo = new Teendo(megnevezes, ev, ho, nap, fontossag);
                teendoList.add(teendo);
            }
        }
    } catch (MalformedURLException e) {
        Log.w("DOWNLOAD", e.getMessage());
    } catch (IOException e) {
        Log.w("DOWNLOAD", e.getMessage());
    }
    return teendoList;
}

Solution

"ANSI" is a vague, misleading term that should be avoided.

In this case, if the file is Hungarian, use an encoding that supports those characters: ISO-8859-2 or Windows-1250 -- not ISO-8859-1. For example, the very first line contains either:

"Határidõ"   // lowercase-O with tilde, ISO-8859-1
"Határidő"   // lowercase-O with double-acute, ISO-8859-2

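The difference between the two readings can be reproduced with a small sketch. It assumes the word is stored as single bytes, with 0xE1 for the accented "a" and 0xF5 for the final letter; the class name CharsetCompare and the helper decode are made up for illustration.

```java
import java.nio.charset.Charset;

public class CharsetCompare {
    // Decodes the same raw bytes using the named charset.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Assumed byte sequence for the header word: "Hat", 0xE1, "rid", 0xF5
        byte[] raw = {'H', 'a', 't', (byte) 0xE1, 'r', 'i', 'd', (byte) 0xF5};

        String latin1 = decode(raw, "ISO-8859-1"); // last char is U+00F5 (o with tilde)
        String latin2 = decode(raw, "ISO-8859-2"); // last char is U+0151 (o with double acute)

        // Print the code point of the last character to avoid console encoding issues
        System.out.println(Integer.toHexString(latin1.charAt(latin1.length() - 1)));
        System.out.println(Integer.toHexString(latin2.charAt(latin2.length() - 1)));
    }
}
```

The bytes are identical in both calls; only the charset used to interpret them changes.
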
The Windows charsets have additional printable characters in place of the control characters in the "equivalent" ISO charsets. But unlike the situation with ISO-8859-1, where Windows-1252 has all of 8859-1's printable characters in the same place, Windows-1250 puts some printable characters in different places than ISO-8859-2 does. So ideally you should work out which encoding the file actually uses. For example, if the data contains the euro sign, which is in Windows-1250 but not in ISO-8859-2, you can specify Windows-1250 when instantiating the InputStreamReader:

InputStreamReader isr = new InputStreamReader(is, "Windows-1250");
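
As a rough sketch of how this fits into the reading loop, the following plain-Java example decodes a stream line by line with an explicitly named charset. The class name CsvDecode, the helper readLines, and the sample bytes are assumptions for illustration; in the question's code the stream would come from url.openStream() instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CsvDecode {
    // Reads all lines from the stream, decoding bytes with the named charset.
    static List<String> readLines(InputStream in, String charsetName) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the wrapped stream) on exit
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, Charset.forName(charsetName)))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data: 0x80 is the euro sign in Windows-1250
        byte[] raw = {(byte) 0x80, ';', '2', '0', '1', '4', '\n'};
        List<String> lines = readLines(new ByteArrayInputStream(raw), "windows-1250");

        // Once decoded, the text is an ordinary Unicode String. Encoding it
        // as UTF-8 is only needed when writing the bytes back out.
        byte[] utf8 = lines.get(0).getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);
    }
}
```

Note that no separate "conversion to UTF-8" step is needed for display: a Java String is already Unicode once the bytes have been decoded with the right charset.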

Other tips

Tell your InputStreamReader to use the correct encoding:

InputStreamReader isr = new InputStreamReader(is, "ISO-8859-1");

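This will cause the file to be read as ISO-8859-1 instead of the platform default encoding. Note that "ANSI" does not necessarily mean ISO-8859-1, and ISO-8859-1 has no ő or ű, so for Hungarian text ISO-8859-2 or Windows-1250 is usually the better fit.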

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow