RoR: ASCII-8BIT to UTF-8 with non-Latin (Cyrillic) symbols in Net::HTTP.get_response.body

https://stackoverflow.com//questions/11701062

13-12-2019

Question

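I need to fetch some data via Net::HTTP, and the response arrives fine, but as ASCII-8BIT. The question is how to encode it to UTF-8 while preserving all non-Latin symbols.

With @content.encode('utf-8', 'binary', :invalid => :replace, :undef => :replace, :replace => '') I lose all Cyrillic symbols.

With @content.encode('utf-8', 'binary') I get a "\xCB" from ASCII-8BIT to UTF-8 error.

With @content.force_encoding("UTF-8") I get �� instead of Cyrillic symbols.

I could not find an answer by googling.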
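The three attempts can be reproduced without a network call. The sketch below is only an illustration: it assumes the body holds Cyrillic text as Windows-1251 bytes (the sample string and the variable names are made up for the example), tagged ASCII-8BIT as described above.

```ruby
# Simulated response body (assumption): Cyrillic text as Windows-1251 bytes,
# tagged ASCII-8BIT like the body described in the question.
raw = "Привет".encode('Windows-1251').force_encoding('ASCII-8BIT')

# Attempt 1: every byte above 0x7F is undefined when converting from binary,
# so each one is replaced with '' and the Cyrillic text disappears.
stripped = raw.encode('utf-8', 'binary', invalid: :replace, undef: :replace, replace: '')

# Attempt 2: the same conversion without replacement options raises.
error = begin
  raw.encode('utf-8', 'binary')
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# Attempt 3: relabeling converts nothing; the bytes are still Windows-1251,
# and for this sample they do not form valid UTF-8.
relabeled = raw.dup.force_encoding('UTF-8')

puts stripped.inspect
puts error.class
puts relabeled.valid_encoding?
```

In all three cases the bytes are never interpreted in the encoding they were actually written in, which is why the Cyrillic characters are lost or garbled.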

Solution

Problem solved:

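begin
    # First assume the body is already UTF-8 and merely labeled ASCII-8BIT.
    cleaned = response.body.dup.force_encoding('UTF-8')
    unless cleaned.valid_encoding?
        # Otherwise treat the bytes as Windows-1251 and transcode to UTF-8.
        cleaned = response.body.encode('UTF-8', 'Windows-1251')
    end
    content = cleaned
rescue EncodingError
    # content is still unassigned here, so start again from the raw body
    # and replace any bytes that cannot be converted.
    content = response.body.encode('UTF-8', 'Windows-1251', invalid: :replace, undef: :replace)
end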
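The same idea can be wrapped in a small method. This is a minimal sketch, not part of the original answer: the name `to_utf8` and the `fallback` parameter are hypothetical, and Windows-1251 is only an assumed default taken from the code above.

```ruby
# Hypothetical helper: returns a UTF-8 copy of a raw response body.
# `fallback` is the legacy encoding to assume when the bytes are not valid
# UTF-8 (Windows-1251 here, as in the answer above).
def to_utf8(body, fallback = 'Windows-1251')
  # Relabel a copy; nothing is converted yet and the input is left untouched.
  utf8 = body.dup.force_encoding('UTF-8')
  return utf8 if utf8.valid_encoding?

  # Interpret the bytes as the fallback encoding and transcode them,
  # replacing anything that has no UTF-8 equivalent.
  body.encode('UTF-8', fallback, invalid: :replace, undef: :replace)
end

# Usage with simulated bodies (with Net::HTTP this would be response.body).
legacy_body = "Привет".encode('Windows-1251').b  # Windows-1251 bytes, binary tag
utf8_body   = "Привет".b                         # UTF-8 bytes, binary tag

puts to_utf8(legacy_body)
puts to_utf8(utf8_body)
```

Note that a single-byte encoding such as Windows-1251 accepts almost any byte sequence, so the fallback cannot tell whether the guess was right. If the server may send other encodings, the guess has to come from somewhere else, for example the charset declared in the Content-Type header.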

Here is more complete information.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow