Question

What is a good way to compare two individual characters (either char or UTF-16 wchar_ts) ignoring case?

A trivial implementation would be upper- or lowercasing both. Is one of these considered better, or are there other methods?

I understand that a completely correct comparison is not possible with all details of Unicode. The comparison is meant mostly for some basic parsing of config files and micro grammars, so perfection isn't required. I am looking for a not-too-wrong implementation under the restriction of per-character comparison.

[edit]
These configuration files may contain text displayed to the user. Also, when analyzing user input, I can't avoid Unicode text.


Solution

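You need CompareStringEx. It takes wide-character strings with explicit lengths, so you can pass each character as a string of length 1, and the NORM_IGNORECASE flag makes the comparison case-insensitive. The two inputs compare equal when it returns CSTR_EQUAL.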
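As a rough sketch of that call for two single characters: the helper name EqualIgnoreCase is made up for illustration, and the choice of the invariant locale is an assumption rather than something the answer specifies. The non-Windows branch is only a stand-in (std::towupper) so the snippet compiles elsewhere; it is not equivalent to CompareStringEx.

```cpp
#include <cwctype>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: case-insensitive comparison of two wide characters.
bool EqualIgnoreCase(wchar_t a, wchar_t b)
{
#ifdef _WIN32
    // Each character is passed as a one-element string with an explicit
    // length of 1, so no terminator is needed.
    // Assumption: the invariant locale is acceptable for parsing purposes.
    return CompareStringEx(LOCALE_NAME_INVARIANT, NORM_IGNORECASE,
                           &a, 1, &b, 1,
                           nullptr, nullptr, 0) == CSTR_EQUAL;
#else
    // Stand-in for non-Windows builds only: simple per-character mapping
    // in the current C locale.
    return std::towupper(a) == std::towupper(b);
#endif
}
```

A call such as EqualIgnoreCase(L'a', L'A') is expected to report a match, while two different letters should not.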

OTHER TIPS

First convert them to strings: for example, make an array of two TCHARs, copy your character into the first element, and set the second to _T('\0'). Then call lstrcmpi or CompareString. Both of these might be insufficient depending on your needs, but they're a good start. For example, if you want to upcase ß (which traditionally uppercases to the two characters "SS"), or if the user is using Turkish and you want to upcase i (which becomes the dotted capital İ there), doing it yourself is harder than you might think.
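The two-element-array idea might look roughly like this. The sketch uses wchar_t and lstrcmpiW directly rather than TCHAR, the helper name CharsEqualNoCase is hypothetical, and the non-Windows branch substitutes POSIX wcscasecmp purely so the snippet builds off Windows.

```cpp
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: wraps each character in a NUL-terminated,
// one-character string and compares the strings ignoring case.
bool CharsEqualNoCase(wchar_t a, wchar_t b)
{
    const wchar_t sa[2] = { a, L'\0' };
    const wchar_t sb[2] = { b, L'\0' };
#ifdef _WIN32
    return lstrcmpiW(sa, sb) == 0;
#else
    // Assumption: wcscasecmp is available (POSIX); it is a stand-in and
    // does not follow the same locale rules as lstrcmpi.
    return wcscasecmp(sa, sb) == 0;
#endif
}
```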

Don't use Unicode for config files if you want ASCII based case-insensitive comparison. Use ASCII for those files. Then you don't have to worry about locales.
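If the files are ASCII-only, a locale-independent comparison can be as small as the following sketch. The names AsciiToLower and AsciiEqualIgnoreCase are made up here; only A-Z is folded, and every other character (including any non-ASCII wide character) is compared exactly.

```cpp
// Hypothetical helper: lowercases A-Z only and leaves everything else
// untouched. Works for char and wchar_t alike.
template <typename CharT>
constexpr CharT AsciiToLower(CharT c)
{
    return (c >= CharT('A') && c <= CharT('Z'))
               ? static_cast<CharT>(c - CharT('A') + CharT('a'))
               : c;
}

// Hypothetical helper: equal if the characters match after ASCII folding.
template <typename CharT>
constexpr bool AsciiEqualIgnoreCase(CharT a, CharT b)
{
    return AsciiToLower(a) == AsciiToLower(b);
}
```

Because nothing here consults the C or Windows locale, the result is the same on every machine, which is usually what a config-file parser wants.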

If you are going to restrict yourself to English (ASCII) keywords, then there is a trivial way to do the comparison. This doesn't generalize if you want to use letters other than A-Z in your keywords, but it works beautifully for A-Z.

If you guarantee that one of the values you pass to this function is a known-good keyword string containing only printable ASCII characters (codes 32-126: A-Z, a-z, 0-9, most symbols), then you can use simple bitmasking to convert lowercase to uppercase: clearing bit 0x20 maps a-z onto A-Z.

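// Returns true if psz starts with the keyword pszKey, ignoring ASCII case.
// pszKey must be a known-good keyword made of printable ASCII characters.
bool IsKeywordMatch(LPCTSTR psz, LPCTSTR pszKey)
{
    while (pszKey[0])
    {
        // Reject control characters, including the terminator of an input
        // that is shorter than the keyword. Without this check, 0x10 would
        // match '0' after masking.
        if (psz[0] < 0x20)
            return false;

        // Clearing bit 0x20 folds a-z onto A-Z before comparing.
        if ((psz[0] & ~0x20) != (pszKey[0] & ~0x20))
            return false;

        ++psz;
        ++pszKey;
    }
    return true;
}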

This code is NOT a general-purpose string compare; it is specialized to compare a known-good keyword to an input string. Because the mask is applied to every character, it will treat [ ] as the uppercase of { }, @ as the uppercase of `, and ^ as the uppercase of ~, but if one of the inputs to this function is guaranteed to contain none of these characters, then it won't matter.

It is meant to be used like this:

if (IsKeywordMatch(pszInput, _T("value")))
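
If keywords may contain punctuation such as [ or @, one possible variation is to apply the mask only where the keyword character is a letter and compare everything else exactly. The sketch below is an illustration rather than part of the original answer: the name IsKeywordMatchStrict is hypothetical, and it uses plain char instead of TCHAR to stay self-contained.

```cpp
// Hypothetical stricter variant: returns true if input starts with key,
// folding case only at positions where key holds an ASCII letter.
bool IsKeywordMatchStrict(const char* input, const char* key)
{
    for (; *key; ++input, ++key)
    {
        const unsigned char k = static_cast<unsigned char>(*key);
        const unsigned char c = static_cast<unsigned char>(*input);
        const bool keyIsLetter =
            (k >= 'A' && k <= 'Z') || (k >= 'a' && k <= 'z');

        // Letters: compare with bit 0x20 cleared. Anything else: exact match.
        // A too-short input fails here because its terminator never matches.
        if (keyIsLetter ? ((c & ~0x20) != (k & ~0x20)) : (c != k))
            return false;
    }
    return true;
}
```

With this variant, "A[0]" matches the keyword "a[0]" but "a{0}" does not.
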
Licensed under: CC-BY-SA with attribution