Ruby 1.9 iso-8859-8-i encoding

https://stackoverflow.com/questions/18427093

26-06-2022

Question

I'm trying to write a piece of code that downloads a page from the internet and does some manipulation on it. The page is encoded in iso-8859-8-i.

I can't find a way to handle this file. I need to search through the file in Hebrew and return the changed file to the user.

I tried to use string.encode, but I still get the wrong encoding.

When I print the response encoding, I get "encoding":{}, as if it were undefined. This is an example of what it returns:

\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd \ufffd\ufffd-\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd \ufffd\ufffd\ufffd\ufffd       

It should be Hebrew letters.

When I try final.body.encode('iso-8859-8-i'), I get the error "code converter not found (ASCII-8BIT to iso-8859-8-i)".

Solution

When Ruby or the OS has assigned the wrong encoding to your input, conversions will not work. That's because Ruby starts from the wrong assumption about which characters the bytes represent, and tries to preserve those wrong characters when converting.
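To illustrate the failure, here is a minimal sketch. It assumes the downloaded body arrives tagged as ASCII-8BIT (binary), as the error message in the question suggests; the four sample bytes are made up for the demonstration and stand in for final.body.

```ruby
# Four bytes above 0x7F, built with pack so the String is tagged as binary
# (ASCII-8BIT). This stands in for a downloaded body (illustrative data).
raw = [0xF9, 0xEC, 0xE5, 0xED].pack('C*')

failed = false
begin
  # Ruby has no idea which characters these bytes represent,
  # so it cannot map them to UTF-8.
  raw.encode('UTF-8')
rescue EncodingError => e
  failed = true
  puts "#{e.class}: #{e.message}"
end
```

The rescue catches EncodingError, the common parent of Ruby's conversion errors, so the same pattern also covers the "code converter not found" case from the question.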

However, if you know from some other source what the correct encoding is, you can use the force_encoding method to tell Ruby how to interpret the bytes it has loaded into a String. Note that this alters the object in place; it relabels the bytes without changing them.

E.g.

contents = final.body
contents.force_encoding( 'ISO-8859-8' )
puts contents

At this point (provided it works), you can convert to other encodings (e.g. UTF-8), because Ruby now knows which characters it is dealing with.
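A minimal sketch of the whole round trip follows. The sample bytes are an assumption for illustration: they spell the Hebrew word "shalom" in ISO-8859-8 and stand in for final.body.

```ruby
# encoding: utf-8

# "shalom" as raw ISO-8859-8 bytes, tagged as binary the way a
# downloaded body might be (illustrative data, not from the question).
raw = [0xF9, 0xEC, 0xE5, 0xED].pack('C*')

# Relabel a copy of the bytes; dup keeps the original untouched,
# since force_encoding modifies its receiver in place.
contents = raw.dup.force_encoding('ISO-8859-8')

# Now a real conversion is possible because Ruby knows the source charset.
utf8 = contents.encode('UTF-8')
puts utf8
```

Once the text is UTF-8, searching it with Hebrew string literals or regular expressions should work, provided the source file itself is UTF-8 (hence the magic comment, which Ruby 1.9 needs).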

I could not find 'ISO-8859-8-I' on my version of Ruby. I am not sure yet how close 'ISO-8859-8' is to what you need (some Googling suggests that it may be OK for you, if the ...-I encoding is not available).
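To check which of the two names your own Ruby build recognizes, something like the sketch below may help. The helper name known_encoding? is made up for this example; Encoding.find raises ArgumentError for names it does not know. As far as I understand, the two labels share the same byte-to-character table and differ only in how text direction is handled, so treat the fallback as an assumption to verify against your page.

```ruby
# Hypothetical helper: true if this Ruby build knows the encoding name.
def known_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end

# Prefer the exact label, fall back to plain ISO-8859-8 if it is missing.
candidates = ['ISO-8859-8-I', 'ISO-8859-8']
usable = candidates.find { |name| known_encoding?(name) }
puts "Using #{usable.inspect}"
```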

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow