Question

I ran into some trouble while creating a C extension for Ruby that got me thinking: how does Ruby (1.9.1) handle strings (and all the encoding stuff) internally?

If I have a string like "o" and I pass it to a C function (as a VALUE), I can deal with it pretty easily using the RSTRING_PTR() and RSTRING_LEN() macros. However, if I make the string "ö" (a German umlaut character), RSTRING_LEN() gives me 2.

I'm a bit stumped by the contents of RSTRING_PTR() in that case: the two bytes are 0xA4 and 0xC3. What encoding is this? I tried using "ö".force_encoding( ... ) with different encodings before passing the string to the C function, but that does not affect the contents of RSTRING_PTR() at all.

What I need is a way to have the string represented as a WCHAR* encoded in UTF-16 (in the case of "ö", that would be 0x00F6) in my C-function, but that's kinda hard to do if you do not know what encoding you're coming from...

Thanks in advance for any help.


Solution

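In Ruby 1.9 every string carries its own encoding. The bytes of a string literal depend on the source file's encoding (the __ENCODING__ constant), and strings read from IO are affected by the Encoding.default_internal setting. Note that force_encoding only relabels the string and leaves its bytes untouched, which is why RSTRING_PTR() did not change; String#encode is the method that actually transcodes the bytes.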

In your case it looks like UTF-8 (default), but ö is actually c3 b6 in UTF-8; c3 a4 is ä.
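
To get from those UTF-8 bytes to UTF-16 code units on the C side, something like the following sketch might help. It is only an illustration, not Ruby's own API: the function name utf8_to_utf16 and its signature are made up for this example, it assumes the input really is UTF-8, and it does not reject overlong sequences or encoded surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Ruby): decode `len` UTF-8 bytes from
 * `src` into UTF-16 code units in `dst`, which has room for `cap` units.
 * Returns the number of code units written, or (size_t)-1 if the input
 * is malformed or `dst` is too small. */
size_t utf8_to_utf16(const unsigned char *src, size_t len,
                     uint16_t *dst, size_t cap)
{
    size_t i = 0, n = 0;

    assert(src != NULL && dst != NULL);

    while (i < len) {
        uint32_t cp;      /* code point being assembled */
        size_t extra;     /* continuation bytes still expected */
        unsigned char b = src[i++];

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;      /* stray continuation or invalid lead byte */

        if (extra > len - i) return (size_t)-1;   /* truncated sequence */

        while (extra--) {
            unsigned char c = src[i++];
            if ((c & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp >= 0x10000) {
            /* Outside the BMP: emit a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 > cap) return (size_t)-1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 > cap) return (size_t)-1;
            dst[n++] = (uint16_t)cp;
        }
    }
    return n;
}
```

Inside the extension you would call it roughly as utf8_to_utf16((const unsigned char *)RSTRING_PTR(str), RSTRING_LEN(str), buf, cap). With the bytes c3 b6 it yields the single code unit 0x00F6, and with c3 a4 it yields 0x00E4. Alternatively, you could call "ö".encode("UTF-16LE") on the Ruby side before passing the string in, so that RSTRING_PTR() already points at UTF-16 data.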

Licensed under: CC-BY-SA with attribution