Peculiar encoding issues in Ruby: ASCII != UTF-8 but UTF-8 == ASCII

Question 1

Keep in mind that encodings are for human-computer interaction, whereas ciphers are for computer-computer interaction. When you build a cipher, you actually create a stream of bytes, which has no inherent character encoding.

To compensate for Ruby's tendency to interpret strings through an encoding, you could transform the values to Base64, like so:

require 'base64'

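module MyApp::XVP
  # Base64 output contains only ASCII characters, so it compares
  # reliably with ordinary (UTF-8) string literals in your specs.
  def xvp_password_encrypt_vnc64(hex)
    Base64.strict_encode64 xvp_password_encrypt_vnc(hex)
  end

  # Takes the Base64 text produced above, decodes it back to raw bytes
  # and hands those to the existing decrypt method.
  def xvp_password_decrypt_vnc64(base64)
    xvp_password_decrypt_vnc Base64.strict_decode64(base64)
  end
end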

and perform your tests on the output of these methods.
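A minimal sketch of why the Base64 route helps, assuming the raw cipher output is the 8-byte string used in the spec below (no real xvp_password_encrypt_vnc is called here; `raw` is a stand-in): once encoded, the value is plain ASCII, so it compares equal to an ordinary UTF-8 literal, and decoding gives back the original bytes.

```ruby
require 'base64'

# Stand-in for raw cipher output (assumption: these are the bytes the
# cipher produced). String#b returns a BINARY (ASCII-8BIT) copy.
raw = "\x88\x90r\"\x9EN\xFFR".b

# Base64 text contains only ASCII characters.
encoded = Base64.strict_encode64(raw)
puts encoded                    # iJByIp5O/1I=
puts encoded.ascii_only?        # true

# An ASCII-only string compares equal to a UTF-8 literal with the same text.
puts encoded == "iJByIp5O/1I="  # true

# Decoding restores the original bytes (as a BINARY string).
decoded = Base64.strict_decode64(encoded)
puts decoded == raw             # true
```

The expected value in a spec can then be written as a normal string literal, without any encoding juggling.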

Another possibility would be to mark your spec data as Encoding::BINARY (an alias for Encoding::ASCII_8BIT) with force_encoding, which relabels the bytes without converting them:

context 'decoding password' do
  # force_encoding only changes the label; the bytes stay the same.
  let(:encoded) { "\x88\x90r\"\x9EN\xFFR".force_encoding('BINARY') }
  let(:decoded) { "L1UkDr]c" }

  subject { MyApp::XVP.xvp_password_decrypt_vnc(encoded) }
  it { should eq decoded }
end
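A small plain-Ruby check of what that spec relies on (no RSpec involved; it only assumes Ruby 2.0 or later for String#b): force_encoding changes the label, not the bytes, and the relabelled string is valid BINARY data that compares equal to other BINARY data with the same bytes.

```ruby
# Tag the literal as UTF-8 explicitly (what a UTF-8 source file gives you),
# so the result does not depend on how this snippet is run.
literal = "\x88\x90r\"\x9EN\xFFR".dup.force_encoding(Encoding::UTF_8)

# force_encoding relabels the receiver in place, so work on a copy.
binary = literal.dup.force_encoding('BINARY')

puts literal.valid_encoding?                  # false: not valid UTF-8
puts binary.encoding == Encoding::ASCII_8BIT  # true: BINARY is an alias
puts binary.valid_encoding?                   # true: any bytes are valid BINARY
puts literal.bytes == binary.bytes            # true: the bytes are unchanged
puts binary == "\x88\x90r\"\x9EN\xFFR".b      # true: BINARY vs BINARY compares bytes
```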

Question 2

The difference between the two cases is not which “way” you are doing the comparison, but the nature of the strings being compared. The docs aren’t clear on this, but when two strings with different encodings are compared, Ruby first checks whether they are comparable at all.

In particular, if a string has ASCII-8BIT encoding and consists only of bytes less than 0x80 (i.e. only in the ASCII range), then it can be compared to strings in an ASCII-compatible encoding such as UTF-8. If it contains bytes outside the ASCII range (greater than 0x7F), it will never compare equal to a string in another encoding.
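The rule above can be approximated in a few lines. `comparable_encodings?` is a hypothetical helper written for illustration (it is not a Ruby API) that only mirrors the behaviour described here; the last part checks it against what == actually returns for the two strings from the question.

```ruby
# Hypothetical helper approximating the check Ruby performs before
# comparing the bytes of two strings with different encodings.
def comparable_encodings?(a, b)
  return true if a.empty? || b.empty?      # empty strings are always comparable
  return true if a.encoding == b.encoding  # same label: just compare bytes
  # Otherwise one side must be pure ASCII and the other side's encoding
  # must be ASCII-compatible (as UTF-8 and ASCII-8BIT both are).
  (a.ascii_only? && b.encoding.ascii_compatible?) ||
    (b.ascii_only? && a.encoding.ascii_compatible?)
end

# Tag both strings as UTF-8 explicitly, so the result does not depend
# on the encoding of the file this snippet lives in.
samples = ["\x88\x90r\"\x9EN\xFFR", "L1UkDr]c"].map do |s|
  s.dup.force_encoding(Encoding::UTF_8)
end

results = samples.map do |u|
  b = u.b # same bytes, labelled ASCII-8BIT
  [comparable_encodings?(u, b), u == b]
end

p results # [[false, false], [true, true]]
```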

In your first case, the string is "\x88\x90r\"\x9EN\xFFR", which contains non-ASCII bytes, so it compares as not equal to a string marked as UTF-8, even if the UTF-8 string actually contains the same bytes (note that in this case it is not even a valid UTF-8 string). In other words, both of the following comparisons return false:

u = "\x88\x90r\"\x9EN\xFFR" # default utf-8 encoding
b = "\x88\x90r\"\x9EN\xFFR".force_encoding('ASCII-8BIT') 

# utf-8 == ascii 8bit
puts u == b

# ascii 8bit == utf-8
puts b == u

The second string is "L1UkDr]c", which consists only of bytes in the ASCII range (less than 0x80) and so can be compared to a UTF-8 string. This bit of code produces true for both cases.

u = "L1UkDr]c" # default utf-8 encoding
b = "L1UkDr]c".force_encoding('ASCII-8BIT') 

# utf-8 == ascii 8bit
puts u == b

# ascii 8bit == utf-8
puts b == u

The same (or at least similar) rules apply when combining strings of different encodings. For example, in the first case (with non-ASCII bytes in the string), u + b raises an Encoding::CompatibilityError, whereas in the second case you simply get the string "L1UkDr]cL1UkDr]c".
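A short sketch of that concatenation behaviour, using the two pairs of strings from the question. Encoding.compatible? is Ruby's own check for whether two strings can be combined: it returns the encoding the result would have, or nil when they cannot be combined.

```ruby
# Same bytes as in the question, tagged UTF-8 explicitly and then copied
# as ASCII-8BIT with String#b.
u1 = "\x88\x90r\"\x9EN\xFFR".dup.force_encoding(Encoding::UTF_8)
b1 = u1.b
u2 = "L1UkDr]c".dup.force_encoding(Encoding::UTF_8)
b2 = u2.b

# Non-ASCII bytes on both sides: no common encoding, so + raises.
p Encoding.compatible?(u1, b1) # nil
error = begin
  u1 + b1
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.class               # Encoding::CompatibilityError

# ASCII-only on both sides: concatenation just works.
joined = u2 + b2
puts joined                    # L1UkDr]cL1UkDr]c
```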