RPython ord() with non-ASCII character

https://stackoverflow.com/questions/23271542

09-07-2023

Question

I'm writing a virtual machine in RPython using PyPy. As part of it, I convert each character to its numeric representation. For example, the letter "a" gives 97, which I then convert to hex to get 0x61.

When I convert the letter "á", I expect the hexadecimal representation 0xe1, but instead I get 0xc3 0xa1.

Is there a specific encoding I need to use? Currently I'm using UTF-8.

--UPDATE--

Here instr is "á" (including the quotes):

for char in instr:
    char = str(int(ord(char)))
    char = hex(int(char))
    char = char[2:]
    print char # Prints 22 C3 A1 22, 22 is each of the quotes
    # The desired output is 22 E1 22

Solution

#!/usr/bin/env python
# -*- coding: latin-1 -*-

char = 'á'

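# The file must actually be saved as Latin-1, so 'á' is the single byte 0xE1.
print str(int(ord(char)))
print hex(ord(char))
# Latin-1 maps each byte to the code point with the same value.
print hex(ord(char.decode('latin-1')))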

Gives me:

225
0xe1
0xe1

OTHER TIPS

You are using Python 2, so your string "á" is a byte string, and its contents depend on the encoding of your source file. If that encoding is UTF-8, the string contains two bytes: C3 A1.
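
To make the dependence on encoding concrete, here is a minimal sketch that encodes the same character both ways and prints the resulting bytes. The variable names are illustrative, and the code is ordinary Python written to behave the same on Python 2 and 3; it has not been checked against RPython's restrictions.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # same print() behaviour on Python 2 and 3

# U+00E1 (á) written as an escape, so this file's own encoding does not matter.
text = u"\u00e1"

# Iterating a bytearray yields integers on both Python 2 and Python 3.
utf8_bytes = [hex(b) for b in bytearray(text.encode("utf-8"))]
latin1_bytes = [hex(b) for b in bytearray(text.encode("latin-1"))]

print(utf8_bytes)    # ['0xc3', '0xa1']
print(latin1_bytes)  # ['0xe1']
```

The single Latin-1 byte happens to equal the Unicode code point, which is why saving the source file as Latin-1 appears to give the expected 0xe1.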

If you want Unicode code points (or UTF-16 code units, on a narrow Python 2 build), convert the byte string to unicode first, for example with .decode('utf-8').

# -*- encoding: utf-8 -*-

def stuff(instr):
  for char in instr:
    char = str(int(ord(char)))
    char = hex(int(char))
    # I'd replace those two lines above with char = hex(ord(char))
    char = char[2:]
    print char 

stuff("á")
print("-------")
stuff(u"á")

Outputs:

c3
a1
-------
e1
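
Putting this together for the output the question asks for (22 E1 22), one possible sketch is to decode the raw bytes first and then format each code point. The helper name hex_code_points and its encoding parameter are hypothetical, and this is plain Python that runs on both Python 2 and 3, not verified RPython.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # same print() behaviour on Python 2 and 3

def hex_code_points(raw, encoding="utf-8"):
    """Decode a byte string and return one uppercase hex string per code point."""
    return ["%X" % ord(ch) for ch in raw.decode(encoding)]

# The UTF-8 bytes of "á", including the surrounding double quotes.
raw = b'"\xc3\xa1"'

print(" ".join(hex_code_points(raw)))  # 22 E1 22
```

Decoding before calling ord() is the important step: iterating the undecoded byte string visits C3 and A1 separately, which is what produced the unwanted output.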

Licensed under: CC-BY-SA with attribution