Question

The value the code below returns is "\x88\x90r\"\x9EN\xFFR":

MyApp::XVP::xvp_password_encrypt_vnc("L1UkDr]c")
# => "\x88\x90r\"\x9EN\xFFR"

When we use this in a test:

should "correctly encrypt a vnc password" do
  assert MyApp::XVP::xvp_password_encrypt_vnc("L1UkDr]c") == "\x88\x90r\"\x9EN\xFFR"
end
# => false

This is an encoding issue, and we can see that by doing the following:

MyApp::XVP::xvp_password_encrypt_vnc("L1UkDr]c").encoding
# => #<Encoding:ASCII-8BIT>

"\x88\x90r\"\x9EN\xFFR".encoding
# => #<Encoding:UTF-8>

So it makes sense that the comparison fails. One way to fix it is to force the encoding to UTF-8 at the end of the xvp_password_encrypt_vnc method, like so:

def xvp_password_encrypt_vnc(hex)
  des = OpenSSL::Cipher::Cipher.new("des-ecb")
  ... etc 
  des.update(hex).force_encoding('UTF-8')
end

Now, our failing test passes:

should "correctly encrypt a vnc password" do
  assert MyApp::XVP::xvp_password_encrypt_vnc("L1UkDr]c").force_encoding("UTF-8") == "\x88\x90r\"\x9EN\xFFR"
end
# => true

But things don't seem to work the same way in reverse:

# This should fail
should "correctly decrypt a vnc password" do
  assert MyApp::XVP::xvp_password_decrypt_vnc("\x88\x90r\"\x9EN\xFFR") == "L1UkDr]c"
end
# => true

The reason the above test should fail is that we are again comparing an ASCII-8BIT string with a UTF-8 string (which failed earlier):

MyApp::XVP::xvp_password_decrypt_vnc("\x88\x90r\"\x9EN\xFFR").encoding
# => #<Encoding:ASCII-8BIT>

"L1UkDr]c".encoding
# => #<Encoding:UTF-8>

How come it fails going one way:

something encoded in ASCII-8BIT != same thing encoded in UTF-8

but it does not fail when we are going the other way:

something encoded in UTF-8 == same thing encoded in ASCII-8BIT

Solution

Keep in mind that encodings are for human-computer interaction, while ciphers are for computer-computer interaction. When you run a cipher, you produce a stream of bits, which has no inherent encoding.

To compensate for Ruby's tendency to interpret strings according to their encoding, you could convert the values to Base64, like so:

require 'base64'

module MyApp::XVP
  def xvp_password_encrypt_vnc64(hex)
    Base64.strict_encode64 xvp_password_encrypt_vnc(hex)
  end

  def xvp_password_decrypt_vnc64(hex)
    xvp_password_decrypt_vnc Base64.strict_decode64(hex)
  end
end

and perform your tests on the output of these methods.
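To see why the Base64 detour sidesteps the problem, here is a minimal sketch. The helpers to_base64 and from_base64 are hypothetical names, not part of MyApp::XVP; they use pack("m0") and unpack1("m0"), which is the strict Base64 format that Base64.strict_encode64 produces, and unpack1 assumes Ruby 2.4 or later.

```ruby
# Hypothetical helpers for illustration; not part of MyApp::XVP.
def to_base64(bytes)
  [bytes].pack("m0")      # strict Base64, no line breaks
end

def from_base64(text)
  text.unpack1("m0")      # returns a binary (ASCII-8BIT) string
end

raw  = "\x88\x90r\"\x9EN\xFFR".b   # stand-in for cipher output
text = to_base64(raw)

puts text.ascii_only?                               # prints true
puts text == text.dup.force_encoding("UTF-8")       # prints true
puts from_base64(text) == raw                       # prints true
```

Because the Base64 text contains only ASCII bytes, it compares equal to a UTF-8 literal in a test no matter which encoding tag it carries.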

Another possibility would be to tag your spec data as Encoding::BINARY (which is an alias for Encoding::ASCII_8BIT):

context 'decoding password' do
  let(:encoded) { "\x88\x90r\"\x9EN\xFFR".force_encoding('BINARY') }
  let(:decoded) { "L1UkDr]c" }

  subject { MyApp::XVP::xvp_password_decrypt_vnc(encoded) }
  it { should eq decoded }
end
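As a variation on the same idea, the sketch below shows two ways to compare raw bytes without caring about the encoding tag. It assumes Ruby 2.0 or later for String#b, and same_bytes? is a hypothetical helper name introduced only for this illustration.

```ruby
# Hypothetical helper: compares two strings byte by byte, ignoring encoding tags.
def same_bytes?(a, b)
  a.bytes == b.bytes
end

expected = "\x88\x90r\"\x9EN\xFFR"   # literal, tagged UTF-8 by default
actual   = expected.b                # copy tagged ASCII-8BIT, same bytes

puts expected == actual              # prints false
puts expected.b == actual            # prints true
puts same_bytes?(expected, actual)   # prints true
```

String#b returns a copy, so the original literal keeps its UTF-8 tag.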

OTHER TIPS

The difference between the two cases is not which “way” you are doing the comparison, but the nature of the strings being compared. The docs aren’t clear on this, but when two strings with different encodings are compared, Ruby first checks whether they are comparable.

In particular, if a string has ASCII-8BIT encoding and consists only of bytes less than 0x80 (i.e. only in the ASCII range), then it can be compared to strings in an ASCII-compatible encoding such as UTF-8. If it contains bytes outside the ASCII range (greater than 0x7F), it can’t be compared to a string in another encoding.
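The rule can be restated as a small sketch. comparable_strings? is a hypothetical helper that simplifies the rule for the two ASCII-compatible encodings in this thread; it is not Ruby's actual implementation. Encoding.compatible? is a real method that applies a similar check and returns nil when two strings cannot be combined.

```ruby
# Hypothetical helper: a simplified restatement of the rule, assuming both
# strings use ASCII-compatible encodings (such as UTF-8 and ASCII-8BIT).
def comparable_strings?(a, b)
  a.encoding == b.encoding || a.ascii_only? || b.ascii_only?
end

ascii = "L1UkDr]c"
high  = "\x88\x90r\"\x9EN\xFFR"

puts comparable_strings?(ascii, ascii.b)          # prints true
puts comparable_strings?(high, high.b)            # prints false

puts Encoding.compatible?(ascii, ascii.b).nil?    # prints false
puts Encoding.compatible?(high, high.b).nil?      # prints true
```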

In your first case, the string is "\x88\x90r\"\x9EN\xFFR", which contains non-ASCII bytes, so it compares as not equal to a string marked as UTF-8, even if the UTF-8 string actually contains the same bytes (note that this is not a valid UTF-8 string in this case). In other words, both of the following comparisons return false:

u = "\x88\x90r\"\x9EN\xFFR" # default UTF-8 encoding
b = "\x88\x90r\"\x9EN\xFFR".force_encoding('ASCII-8BIT')

# UTF-8 == ASCII-8BIT
puts u == b # prints false

# ASCII-8BIT == UTF-8
puts b == u # prints false

The second string is "L1UkDr]c", which consists only of bytes in the ASCII range (less than 0x80) and so can be compared to a UTF-8 string. This bit of code produces true in both cases:

u = "L1UkDr]c" # default UTF-8 encoding
b = "L1UkDr]c".force_encoding('ASCII-8BIT')

# UTF-8 == ASCII-8BIT
puts u == b # prints true

# ASCII-8BIT == UTF-8
puts b == u # prints true

The same (or at least similar) rules are used when combining strings of different encodings. For example, in the first case (with non-ASCII bytes in the string), trying to do u + b would result in an Encoding::CompatibilityError; in the second case you would just get the string "L1UkDr]cL1UkDr]c".
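A short sketch of the concatenation behaviour follows. try_concat is a hypothetical helper that returns nil instead of letting the error propagate, purely so both cases can be shown side by side.

```ruby
# Hypothetical helper: returns the joined string, or nil if Ruby refuses
# to combine the two encodings.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError
  nil
end

high  = "\x88\x90r\"\x9EN\xFFR"
ascii = "L1UkDr]c"

p try_concat(high, high.b)     # prints nil
p try_concat(ascii, ascii.b)   # prints "L1UkDr]cL1UkDr]c"
```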

Licensed under: CC-BY-SA with attribution