Question

I am using Aptana Studio 3.0. Non-ASCII characters such as ü are present in de.yml and other files. Do they require extra encoding settings? I have # encoding: utf-8 in specific controllers, and it works for every page apart from index.html.erb, which raises this error:

Encoding::CompatibilityError in Home#index
    incompatible character encodings: Windows-1252 and UTF-8 

The translation string in de.yml is:

de:
    display_eula: EULA für Applikation

NOTE: The string above renders correctly on other pages; it fails only in index.html.erb.


Solution

There's a difference between the character encoding of the source script and the encoding of the I/O performed on files read from disk and other sources.

The magic lines like # encoding: utf-8 tell Ruby the source file itself has UTF-8 encoded characters outside the normal ASCII range. That lets Ruby interpret fixed strings and multibyte characters in the source file correctly.
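As a minimal sketch of what the magic comment affects, the snippet below declares the source encoding and then inspects a literal. The variable name label is only illustrative, and the string is the one from the question's de.yml. As far as I know, Ruby 2.0 and later already assume UTF-8 for source files, so the comment mainly matters on Ruby 1.9.

```ruby
# encoding: utf-8
# The magic comment has to sit on the first line of the file
# (or the second, directly after a shebang line).

# Illustrative literal, copied from the translation string in the question.
label = "EULA für Applikation"

puts label.encoding       # encoding Ruby assigned to the literal
puts label.length         # characters: "ü" counts once
puts label.bytes.length   # bytes: "ü" occupies two bytes in UTF-8
```

Note that this only covers literals typed into the .rb file. Strings loaded at runtime, such as the YAML translations or database values, get their encoding from wherever they were read.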

For I/O streams you'll need to tell Ruby how to interpret the incoming/outgoing data. IO.new and its related methods take an optional parameter saying what the incoming/outgoing data stream encoding is.
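Here is a rough sketch of passing an encoding to an I/O call. It assumes the input happens to be Windows-1252, as in the error message from the question; the temporary file is just a stand-in for whatever file or stream you are really reading.

```ruby
require "tempfile"

# Stand-in input: write the raw Windows-1252 bytes for "EULA für Applikation".
file = Tempfile.new("legacy")
file.binmode
file.write("EULA f\xFCr Applikation".b)   # 0xFC is "ü" in Windows-1252
file.close

# Mode string layout: "r:<external encoding>:<internal encoding>".
# Ruby reads Windows-1252 from disk and hands back a UTF-8 string.
text = File.open(file.path, "r:Windows-1252:UTF-8") { |f| f.read }

puts text.encoding          # UTF-8
puts text.valid_encoding?   # true

file.unlink
```

The same information can also be given as keyword options (external_encoding: and internal_encoding:) instead of packing it into the mode string.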

YAML, JSON, HTML, XML and other file types that are read from disk, socket or pipe are susceptible to encoding problems if they are not pure ASCII. Ruby has a pretty good suite of tools for converting on the fly, or doing it once the string is in memory. If you don't tell Ruby what to expect, or don't convert to what it expects, it'll complain, like you saw.
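For a string that is already in memory, a sketch like the following shows the two usual tools: force_encoding relabels the bytes, and encode transcodes them. The byte sequence and variable names are made up for illustration, assuming the data arrived as Windows-1252.

```ruby
# Bytes as they might arrive from an external source: "für" in Windows-1252.
raw = "f\xFCr".b                                    # binary, no encoding assumed

# force_encoding only changes the label; the bytes stay the same.
labeled = raw.dup.force_encoding("Windows-1252")

# encode actually rewrites the bytes as UTF-8.
utf8 = labeled.encode("UTF-8")
puts utf8.encoding
puts utf8.bytes.length      # one more byte than raw, since "ü" grows to two

# Mixing the unconverted string with non-ASCII UTF-8 text raises the
# same error class the question reports.
error = nil
begin
  labeled + "ü"
rescue Encoding::CompatibilityError => e
  error = e
  puts e.message
end
```

Converting once at the boundary, right where the data enters the application, tends to be easier to reason about than converting in each view.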

James Gray has a series of articles talking about dealing with Unicode and multibyte character sets in Ruby. It gets into some deep water because it's not an easy topic but he does a nice job explaining things.

OTHER TIPS

The problem was with the data coming from AWS SimpleDB, which needed to be converted with string.encode("utf-8"). Nonetheless, thank you for your effort.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow