Question

I have an Arabic sentence/word that I'd like to insert into my Tkinter Text widget. However, when I insert the text I see the following result:

[screenshot of the Text widget showing the garbled text]

Here are the strings I am trying to insert: 'تاريخه', 'تارِيخ'. The first one was inserted correctly; the second one was extracted by findall() and garbled upon insertion.

Basically, all my code (for the bold text in the screenshot) is quite straightforward:

word = re.findall(u'word=.*', TEXT, re.UNICODE)[0] # search for the Arabic word and take the first match
header = " ".join([QUERY, word]) # build the string to insert
text.insert('1.0', "".join([header,'\n'])) # insert the Arabic text at the top of the widget

It looks like re.findall() finds the occurrences of the 'word=.*' regexp in TEXT and returns word in escaped Unicode notation (\uXXXX sequences) rather than as Arabic characters.

I'm puzzled here. Can I somehow convert word prior to insertion into the text widget?


Solution

As you confirmed in the comment, TEXT is already escaped: it contains literal backslash-u sequences instead of the Arabic characters themselves. Change the function that generates TEXT so that it returns a properly decoded string.
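A quick way to confirm that a string is escaped is to look for literal \uXXXX sequences in it. The snippet below is only a sketch: the sample value stands in for the real TEXT, and the name is_escaped is made up for illustration.

```python
import re

# Hypothetical sample standing in for the escaped TEXT from the question.
TEXT = u'word=\\u0631\\u064e\\u062c\\u0627'

# A literal backslash followed by 'u' and four hex digits means the string
# holds escape sequences as plain characters, not the Arabic letters.
is_escaped = re.search(r'\\u[0-9a-fA-F]{4}', TEXT) is not None

print(is_escaped)   # True
print(len(TEXT))    # 29: each escape takes six characters instead of one
```

If the check is False and the length matches the number of visible characters, the string is already decoded and the problem lies elsewhere.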

If you can't control the function that generates the text, unescape it yourself by calling decode with the unicode_escape codec (Python 2):

>>> TEXT = u'word=\\u0631\\u064e\\u062c\\u0627'
>>> print TEXT
word=\u0631\u064e\u062c\u0627
>>> TEXT = TEXT.decode('unicode-escape')
>>> print TEXT
word=رَجا
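In Python 3, str has no decode method, so the same idea needs an explicit round trip through bytes. The helper below is a minimal sketch, not part of the original answer; the name unescape is hypothetical. It assumes every backslash sequence in the text is meant to be an escape, so text with genuine backslashes would be altered as well.

```python
def unescape(text):
    # Hypothetical helper: turn literal \uXXXX sequences into real characters.
    # 'backslashreplace' rewrites any non-ASCII characters that are already
    # present as escapes, so they survive the ASCII round trip unchanged.
    return text.encode('ascii', 'backslashreplace').decode('unicode_escape')

escaped = u'word=\\u0631\\u064e\\u062c\\u0627'  # same sample as above
word = unescape(escaped)

print(word == u'word=\u0631\u064e\u062c\u0627')  # True
print(len(word))                                  # 9
```

The result can then be passed to re.findall() and text.insert() exactly as in the question.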

Example

# coding: utf-8

import re  # needed for re.findall below
from Tkinter import *

root = Tk()
text = Text(root)
text.pack()

QUERY = u'\u0627\u0631\u062c\u0648'
TEXT = u'word=\\u0631\\u064e\\u062c\\u0627'  # escaped!!
TEXT = TEXT.decode('unicode-escape')
word = re.findall(u'word=.*', TEXT, re.UNICODE)[0]
header = " ".join([QUERY, word])
text.insert('1.0', "".join([header,'\n']))

root.mainloop()


Licensed under: CC-BY-SA with attribution