Ruby-FFI (ruby 1.8): Reading UTF-16LE encoded strings
Question
I am working with Ruby-FFI on Ruby 1.8 to wrap a library that uses UTF-16LE strings. The library has a C function that returns such a string.
Whether I wrap the function with
attach_function :getVersion, [], :pointer
and call read_string
on the returned pointer, or whether I wrap it with
attach_function :getVersion, [], :string
what I get back is only the first character, because the second byte is null (\000), and as a result FFI stops reading the string there, presumably because it assumes it is dealing with an ordinary, single-null-terminated string.
Is there something I need to do, perhaps in initialization of my Ruby program or FFI or otherwise, to make it know that I expect strings to be UTF-16LE encoded? How else can I get around this?
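For illustration (using Ruby 1.9+ encoding support, which 1.8 lacks), the UTF-16LE byte layout of a short ASCII string shows the interleaved null bytes that trip up a single-null reader:

```ruby
# "AB" in UTF-16LE: every ASCII character is followed by a null high byte,
# so a reader that stops at the first null sees only "A".
bytes = "AB".encode('UTF-16LE').bytes.to_a
# bytes == [65, 0, 66, 0]
```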
Solution
OK, this is the (inelegant) workaround I have so far. It involves adding a method to FFI::Pointer. It should be safe to call in the context of my library, because all strings are supposed to be UTF-16LE encoded, but it may not be safe in general: if the memory never contains a double null, it would just carry on reading past the bounds of the string.
module FFI
  class Pointer
    # Read bytes until we encounter a double-null terminator
    def read_string_dn
      cont_nullcount = 0
      offset = 0
      # Determine the offset in memory of the expected double null
      until cont_nullcount == 2
        byte = get_bytes(offset, 1)
        if byte == "\000"
          cont_nullcount += 1
        else
          cont_nullcount = 0
        end
        offset += 1
      end
      # Return the string with the calculated length, including the terminator
      get_bytes(0, offset + 1)
    end
  end
end
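The scanning logic can be exercised without FFI by running the same loop over a plain byte String. This is only a sketch for testing the idea; scan_double_null is a hypothetical stand-alone helper, not part of the FFI API:

```ruby
# Sketch of the same double-null scan over a plain byte String,
# so the logic can be checked without an FFI::Pointer.
def scan_double_null(bytes)
  cont_nullcount = 0
  offset = 0
  until cont_nullcount == 2
    byte = bytes[offset, 1]
    cont_nullcount = (byte == "\000") ? cont_nullcount + 1 : 0
    offset += 1
  end
  bytes[0, offset + 1] # include the terminator, as read_string_dn does
end

utf16le = "H\x00i\x00\x00\x00" # "Hi" in UTF-16LE plus double-null terminator
scan_double_null(utf16le)      # => "H\x00i\x00\x00\x00"
```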
Other tips
A more elegant solution based on the same idea. It handles the encoding too (note that force_encoding and encode require Ruby 1.9+).
module FFI
  class Pointer
    # Read a double-null-terminated UTF-16LE string and return it as UTF-8
    def read_wstring
      offset = 0
      # Scan two bytes at a time, so the terminator check stays aligned
      # on UTF-16 code-unit boundaries
      while get_bytes(offset, 2) != "\x00\x00"
        offset += 2
      end
      get_bytes(0, offset).force_encoding('utf-16le').encode('utf-8')
    end
  end
end
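The conversion step at the end can be checked in isolation on raw bytes (Ruby 1.9+); here a literal byte string stands in for what get_bytes would return:

```ruby
# Raw little-endian bytes for "Hi", as get_bytes would return them
raw = "H\x00i\x00".dup.force_encoding('ASCII-8BIT')
text = raw.force_encoding('utf-16le').encode('utf-8')
text # => "Hi"
```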