SQL LIKE expression can't find half-space character (zero-width non-joiner, ZWNJ)

https://stackoverflow.com/questions/7957241

17-02-2021
Question

In the code below, I want to select the token only if it contains the half-space character.

  Select *  from
     (select token = 'aaa‏‏sss') as dd
  where token like '%‏‏%'

Expected result:

    aaa‏‏sss

Actual result:

   null

Note: This character is used in Persian and has no visible glyph, but it separates the text on either side of it. For example, the token بهترین written with a half-space is بهت‌رین.
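
Because the character is invisible, it can help to confirm which code point a string actually contains before debugging the LIKE pattern. The following is a minimal sketch, assuming SQL Server 2012 or later and a hypothetical sample value built with NCHAR(8204), which is U+200C (ZWNJ); in practice @token would be the real value under test.

```sql
-- Hypothetical sample value; NCHAR(8204) is U+200C (ZWNJ).
DECLARE @token NVARCHAR(50) = N'aaa' + NCHAR(8204) + N'sss';

-- Walk the string and print each character's position and code point.
DECLARE @i INT = 1;
WHILE @i <= LEN(@token)
BEGIN
    PRINT CONCAT(@i, ': ', UNICODE(SUBSTRING(@token, @i, 1)));
    SET @i += 1;
END
```

If a different number shows up at the position of the invisible character, the string contains some other zero-width or directional mark and the pattern has to use that character instead.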

Solution

I think the problem is the collation.

For example this query :

select PATINDEX('%‏‏%','aaa‏‏sss' collate  Arabic_CI_AS)

returns 1, but this one:

select PATINDEX('%‏‏%','aaa‏‏sss' collate  SQL_Latin1_General_CP1_CI_AS)

returns 4, which is the correct position of the half-space character in the input string.

Therefore you must change the collation of your input string to a Latin collation such as SQL_Latin1_General_CP1_CI_AS.
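
A minimal sketch of applying this to the original query, assuming SQL Server: the COLLATE clause can be attached to the column inside the WHERE clause, so only that comparison is affected. Building the character with NCHAR(8204) (U+200C, assuming the half-space is ZWNJ) is an illustrative choice made here so that nothing invisible has to be typed; it is not part of the answer above.

```sql
-- Assumption: the half-space is ZWNJ, U+200C (decimal 8204).
SELECT *
FROM (SELECT token = N'aaa' + NCHAR(8204) + N'sss') AS dd
-- COLLATE here changes the comparison rules for this predicate only.
WHERE token COLLATE SQL_Latin1_General_CP1_CI_AS
      LIKE N'%' + NCHAR(8204) + N'%';
```

A binary collation such as Latin1_General_BIN2 compares code points directly and may be another option to try, at the cost of making the match case-sensitive.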

OTHER TIPS

To know the specific value to use, you would need to know the data type of the underlying column: is it VARCHAR or NVARCHAR?
Furthermore, with the former, you'll also need to know the code page in use for this database.

In general, however, you'll have to put the character into the query by its code rather than typing it. T-SQL string literals do not interpret backslash escapes, so instead of something like

'abc\x008Adef' you would concatenate the character in, for example 'abc' + CHAR(n) + 'def', where n is the appropriate code for the half-space in the underlying encoding.
This value would be somewhere between 0x80 and 0xFF in a code page setting (ZWNJ is 0x9D in Windows-1256, for instance), and it is 0x200C in Unicode.
In fact, if you use Unicode strings, you will need NCHAR instead of CHAR together with N-prefixed literals, something like
N'abc' + NCHAR(8204) + N'def' (8204 is decimal for hex 200C, again assuming that ZWNJ is effectively the kind of half-space you have in mind).
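
As a rough sketch of the code-point approach, assuming SQL Server and that the half-space is ZWNJ (U+200C, decimal 8204), the character can be held in a variable and concatenated into both the data and the pattern. The variable name @zwnj is hypothetical.

```sql
-- Assumption: the half-space is ZWNJ, U+200C (decimal 8204).
DECLARE @zwnj NCHAR(1) = NCHAR(8204);

SELECT *
FROM (SELECT token = N'aaa' + @zwnj + N'sss') AS dd
WHERE token LIKE N'%' + @zwnj + N'%';
```

Whether the row is returned still depends on the collation in effect for the comparison, so this only removes the guesswork about which character the pattern contains.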

Another possible way out may come from the collation: if the collation in use on the underlying database handles these half-spaces as plain spaces, you could just use the regular space character in the query (a bit like some collations based on the 1252 code page, where accented characters are considered equivalent to their unaccented forms).

I found the solution to this problem. If we put the character N before the pattern string, the result is correct. The N prefix means that the string literal following it is Unicode. Corrected code:

  Select *  from
     (select token = 'aaa‏‏sss') as dd
  where token like N'%‏‏%'
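
One way to see why the N prefix matters, sketched under the assumptions that this is SQL Server and that the half-space is ZWNJ (U+200C): a literal without N is a non-Unicode string in the database's default code page, and a cast to VARCHAR goes through a similar conversion, so it can be used to check whether the character survives.

```sql
-- Assumption: the half-space is ZWNJ, U+200C (decimal 8204).
-- Casting to VARCHAR converts through the database's default code page,
-- much as a literal written without the N prefix would be.
SELECT
    UNICODE(NCHAR(8204))                   AS code_as_nvarchar,
    ASCII(CAST(NCHAR(8204) AS VARCHAR(1))) AS code_as_varchar;
-- A code_as_varchar of 63 (the code for '?') would mean the character
-- has no equivalent in that code page and was replaced.
```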

Licensed under: CC-BY-SA with attribution
