Question

I am having trouble searching for arabic text in mysql. I have in database a row containing a record

display_name
أحمد

But when I try to do a query with

SELECT * FROM wp_users WHERE display_name LIKE '%احمد%'

I tried to add at the end of the query

collate utf8_bin

But it didn't work either. How can I have

احمد == أحمد
Was it helpful?

Solution

I don't have an exact solution, but I can tell you why it's not working. If you want those two strings to be considered equal, you need to use a different collation since utf8_bin compares exact code points, and those two strings are clearly not identical when considered that way. Normally MySQL's utf8_general_ci collation would provide transliteration and normalisation, for example all these match:

SELECT 'a'='A' COLLATE utf8_general_ci;
SELECT 'ü'='u' COLLATE utf8_general_ci;
SELECT 'ß'='ss' COLLATE utf8_general_ci;

but in your case it doesn't work, and nor does the more accurate utf8_unicode_ci collation:

SELECT 'احمد'='أحمد' COLLATE utf8_general_ci;
SELECT 'احمد'='أحمد' COLLATE utf8_unicode_ci;

This chart shows the character mapping for middle eastern languages in MySQL's utf8_unicode_ci collation, and you can see that the أ and ا characters are not considered equal, so MySQL's default collations won't solve this problem.

To work around this you have two options: normalise your strings before they hit MySQL (i.e. in PHP), or extend MySQL to provide an appropriate collation to do what you need.

The Ar-PHP project can help with the former, as sємsєм suggested. You should store your real user name and the normalised one separately so that you can search on one and display the other. Another project also provides a way of rephrasing arabic strings to work better in MySQL.

MySQL docs show how to create a custom collation. It essentially involves editing an LDML XML file (there is at least a BBEdit plugin to help with this) and giving it to MySQL. This will allow you to create a mapping that lets you treat some characters as equivalents. The advantage of this approach is that it's transparent to PHP and you don't need any additional columns in your database. If you build such a mapping, it would be beneficial to other Arabic users across multiple programming languages, not just PHP.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top