RoR: converting ASCII-8BIT to UTF-8 with non-Latin (Cyrillic) characters in Net::HTTP.get_response.body

StackOverflow https://stackoverflow.com//questions/11701062

  •  13-12-2019

Question

I need to fetch some data via Net::HTTP. It works, but I receive the response body in ASCII-8BIT. How can I convert it to UTF-8 and keep all non-Latin characters?

With @content.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace, :replace => '') I lose all Cyrillic characters.

With @content.encode('utf-8', 'binary') I get a "\xCB" from ASCII-8BIT to UTF-8 error.

With @content.force_encoding("UTF-8") I get ������ instead of Cyrillic characters.

I couldn't find an answer with a Google search.
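
The three results above can be reproduced without a network call. The following is only a sketch: it assumes the server sends Windows-1251 bytes (the question does not say which charset is used) and builds a sample body from the Russian word "Привет", written with \u escapes so the snippet does not depend on the file's own encoding. The variable names are illustrative.

```ruby
# Assumption: the body is Windows-1251 text that Net::HTTP hands back
# tagged as ASCII-8BIT (binary).
word = "\u041F\u0440\u0438\u0432\u0435\u0442"
body = word.encode('Windows-1251').b

# 1. Transcoding from 'binary' with an empty replacement drops every
#    byte above 0x7F, so the Cyrillic text disappears.
stripped = body.encode('utf-8', 'binary',
                       invalid: :replace, undef: :replace, replace: '')

# 2. Without replacement options, the first high byte raises.
error = begin
  body.encode('utf-8', 'binary')
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# 3. force_encoding only relabels the bytes; they are still not UTF-8.
relabelled = body.dup.force_encoding('UTF-8')

p stripped                      # empty string
p error.class                   # Encoding::UndefinedConversionError
p relabelled.valid_encoding?    # false
```

In all three cases the bytes are never interpreted in the charset they were actually written in, which is why the text cannot be recovered.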

Solution

The problem is solved with:

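begin
    # Relabel the raw bytes as UTF-8; dup leaves response.body untouched
    cleaned = response.body.dup.force_encoding('UTF-8')
    unless cleaned.valid_encoding?
        # Not valid UTF-8: treat the bytes as Windows-1251 and transcode
        cleaned = response.body.encode('UTF-8', 'Windows-1251')
    end
    content = cleaned
rescue EncodingError
    # content is still nil here, so start again from response.body and
    # replace any byte that has no Windows-1251 mapping
    content = response.body.encode('UTF-8', 'Windows-1251', invalid: :replace, undef: :replace)
end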
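
The same idea can be wrapped in a small helper and tried on sample bytes. This is a minimal sketch, not the answer's exact code: the method name body_to_utf8 and its fallback parameter are hypothetical, and Windows-1251 is assumed as the legacy charset, as in the answer. The sample word is "Привет", written with \u escapes.

```ruby
# Hypothetical helper (the name is not from the original answer).
# Assumes any body that is not valid UTF-8 is in the fallback charset.
def body_to_utf8(raw, fallback = 'Windows-1251')
  # First try: the bytes may already be UTF-8 and only mislabelled.
  candidate = raw.dup.force_encoding('UTF-8')
  return candidate if candidate.valid_encoding?

  # Otherwise transcode from the assumed charset, replacing any byte
  # that has no mapping instead of raising.
  raw.encode('UTF-8', fallback, invalid: :replace, undef: :replace)
end

word = "\u041F\u0440\u0438\u0432\u0435\u0442"

# Simulate what Net::HTTP returns: raw bytes tagged ASCII-8BIT.
cp1251_body = word.encode('Windows-1251').b
utf8_body   = word.b

puts body_to_utf8(cp1251_body) == word   # true
puts body_to_utf8(utf8_body) == word     # true
```

If the response carries a charset in its Content-Type header, passing that value as the fallback is likely more reliable than hard-coding Windows-1251.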

here is more complete data

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow