How to insert Arabic script into a Tkinter text widget correctly?

https://stackoverflow.com/questions/20995072

25-09-2022

Question

I have an Arabic sentence/word that I'd like to insert into my Tkinter text widget. However, when I insert the text, I see the following result:

[Screenshot: the Tkinter text widget with the inserted text]

Here are the strings I am trying to insert: 'تاريخه', 'تارِيخ'. The first one was inserted correctly; the second one was extracted by findall() and garbled upon insertion.

Basically, all my code (for the bold text in the screenshot) is quite straightforward:

word = re.findall(u'word=.*', TEXT, re.UNICODE)[0] # searching for Arabic word and taking [0]
header = " ".join([QUERY, word]) # creating a variable to insert
text.insert('1.0', "".join([header,'\n'])) # inserting Arabic text

It looks like re.findall() finds all occurrences of the 'word=.*' regexp in TEXT and returns word in Unicode escape notation (literal \uXXXX sequences) rather than as Arabic characters.

I'm puzzled here. Can I somehow convert word prior to insertion into the text widget?

Solution

As you confirmed in the comments, TEXT is already escaped: it contains literal backslash-u sequences rather than the Arabic characters themselves. Change the function that generates TEXT so that it returns a properly decoded string.
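One way to confirm that a string is escaped before inserting it is to compare its length with what you expect and to look for literal backslash-u sequences. The sketch below assumes Python 3 (the code in this thread is Python 2), and looks_escaped is a hypothetical helper name, not part of any library.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
ESCAPE_PATTERN = re.compile(r'\\u[0-9a-fA-F]{4}')

def looks_escaped(s):
    """Return True if s contains literal \\uXXXX sequences."""
    return ESCAPE_PATTERN.search(s) is not None

# Escaped: every Arabic character is spelled out as six ASCII characters.
escaped = 'word=\\u0631\\u064e\\u062c\\u0627'
# Real: the same text with the actual Arabic code points.
real = 'word=\u0631\u064e\u062c\u0627'

print(len(escaped), looks_escaped(escaped))
print(len(real), looks_escaped(real))
```

The escaped string is much longer than the real one because each character occupies six positions instead of one; that difference is usually the quickest hint that the data needs unescaping.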

If you can't control the function that generates the text, unescape the text yourself by calling decode with the unicode_escape codec (Python 2):

>>> TEXT = u'word=\\u0631\\u064e\\u062c\\u0627'
>>> print TEXT
word=\u0631\u064e\u062c\u0627
>>> TEXT = TEXT.decode('unicode-escape')
>>> print TEXT
word=رَجا
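In Python 3, str no longer has a decode method, so the call above does not carry over directly. A minimal sketch of one alternative, assuming Python 3 and only four-digit \uXXXX escapes, replaces each escape with a regular expression. unescape_unicode is a hypothetical helper name introduced here for illustration.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def unescape_unicode(s):
    """Replace literal \\uXXXX sequences with the characters they name."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)

TEXT = 'word=\\u0631\\u064e\\u062c\\u0627'  # escaped, as in the question
word = re.findall('word=.*', unescape_unicode(TEXT))[0]
print(word)
```

This leaves any characters that are already unescaped as they are. It does not handle eight-digit \UXXXXXXXX escapes or combine surrogate pairs, so it is only a starting point if the input may contain those.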

Example

# coding: utf-8

import re  # needed for re.findall below
from Tkinter import *

root = Tk()
text = Text(root)
text.pack()

QUERY = u'\u0627\u0631\u062c\u0648'
TEXT = u'word=\\u0631\\u064e\\u062c\\u0627'  # escaped!!
TEXT = TEXT.decode('unicode-escape')
word = re.findall(u'word=.*', TEXT, re.UNICODE)[0]
header = " ".join([QUERY, word])
text.insert('1.0', "".join([header,'\n']))

root.mainloop()


Licensed under: CC-BY-SA with attribution
