Question

The bytes 0x91, 0x92, 0x93, and 0x94 are supposed to represent what in Unicode are U+2018, U+2019, U+201C, and U+201D: the "opening single quote", "closing single quote", "opening double quote", and "closing double quote". I thought the encoding was ISO-8859-1, but when I read the file with IO.read('file', :encoding=>'ISO-8859-1'), Ruby still does not recognize these characters.

If it isn't ISO-8859-1, then what is it? And if it is, why doesn't Ruby recognize these characters?

UPDATE: Apparently this encoding is supposed to be Windows-1252. But Ruby still does not recognize these characters when I do IO.read('file', :encoding=>'Windows-1252').

UPDATE 2: Never mind, Windows-1252 works.

Solution

0x91 is the Windows-1252 representation of Unicode's \u2018 (AKA ‘):

>> "\x91".force_encoding('windows-1252').encode('utf-8')
=> "‘"

Windows-1252 and Latin-1 (AKA ISO 8859-1) are not the same: Latin-1 reserves 0x80 through 0x9F for control characters, while Windows-1252 puts printable characters such as the curly quotes there. Try using windows-1252 as the encoding:

IO.read('file', :encoding => 'windows-1252')

That will give you a string that knows it is Windows-1252. If you want UTF-8, then you probably want to specify both :external_encoding and :internal_encoding:

IO.read('file', :external_encoding => 'windows-1252', :internal_encoding => 'utf-8')
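To see why the choice of encoding matters, here is a minimal sketch that decodes the same four bytes both ways and then reads them back from a file with transcoding. The temporary file stands in for the asker's 'file', and all variable names are made up for illustration. The 'Windows-1252:UTF-8' string is the external:internal shorthand accepted by the :encoding option.

```ruby
require 'tempfile'

# The four raw bytes from the question, as a binary string.
raw = "\x91\x92\x93\x94".b

# Same bytes, two interpretations, each transcoded to UTF-8.
as_latin1 = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
as_cp1252 = raw.dup.force_encoding('Windows-1252').encode('UTF-8')

latin1_points = as_latin1.codepoints.map { |cp| format('U+%04X', cp) }
cp1252_points = as_cp1252.codepoints.map { |cp| format('U+%04X', cp) }

p latin1_points  # => ["U+0091", "U+0092", "U+0093", "U+0094"] (control characters)
p cp1252_points  # => ["U+2018", "U+2019", "U+201C", "U+201D"] (curly quotes)

# Write the bytes to a temporary file (a stand-in for the real file) and
# read them back, decoding as Windows-1252 and transcoding to UTF-8.
from_options, from_shorthand = Tempfile.create('quotes') do |f|
  f.binmode
  f.write(raw)
  f.flush
  [
    File.read(f.path, external_encoding: 'Windows-1252', internal_encoding: 'UTF-8'),
    File.read(f.path, encoding: 'Windows-1252:UTF-8') # external:internal shorthand
  ]
end

p from_options.encoding       # => #<Encoding:UTF-8>
p from_options == as_cp1252   # => true
p from_shorthand == as_cp1252 # => true
```

Reading as ISO-8859-1 does not raise an error; it silently yields control characters, which is probably why the quotes appeared not to be recognized.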
Licensed under: CC-BY-SA with attribution