Question

I have a database that contains non-English words (Turkish letters, for those who wonder), and I have an algorithm that compares user input against the database.

So my problem is this: all the strings in my database are written with Turkish characters. Say I have the element heyyö to compare. When the user enters heyyo, it won't be found, since the two are considered different words.

My first thought was to add special cases: whenever a non-English character is found, consider both the English and the non-English letter (like g with ğ, or i with ı). But that means a lot of brute force.

How can I do this elegantly?

Oh, and the user enters this input from a text field, if that wasn't implied.


Solution

The removal of diacritics is called "folding." You can compare strings without regard to diacritics using the option NSDiacriticInsensitiveSearch.

// YES when the two strings are equal once diacritics are ignored
BOOL matches = ([string compare:otherString options:NSDiacriticInsensitiveSearch] == NSOrderedSame);

You can similarly generate a folded string using stringByFoldingWithOptions:locale:.
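As a minimal sketch of the folding approach: the helper names FoldedString and InputMatchesEntry below are hypothetical, and combining NSCaseInsensitiveSearch with the diacritic option is an assumption about what the comparison should ignore, not something the question states. Passing nil as the locale is also a choice made for this sketch.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns a copy of `string` with diacritics removed
// and case differences folded away (the case option is an assumption).
static NSString *FoldedString(NSString *string) {
    NSStringCompareOptions options = NSDiacriticInsensitiveSearch | NSCaseInsensitiveSearch;
    return [string stringByFoldingWithOptions:options locale:nil];
}

// Hypothetical helper: compares user input with a database entry
// after folding both sides the same way.
static BOOL InputMatchesEntry(NSString *input, NSString *entry) {
    return [FoldedString(input) isEqualToString:FoldedString(entry)];
}
```

With this, InputMatchesEntry(@"heyyo", @"heyyö") should return YES. If the database is large, one option is to store the folded form next to each entry so that only the user's input needs folding at search time.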

Note that this only removes diacritics. There are many ways that characters can "seem" the same without being the same. Turkish is somewhat notorious for this because the lowercase version of "I" is "ı" (LATIN SMALL LETTER DOTLESS I), not "i". If you're dealing with Turkish in particular, you may have to account for this.
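One way to account for the dotless i, sketched under assumptions: "ı" (U+0131) has no Unicode decomposition into a base letter plus a mark, so this sketch does not rely on diacritic folding to turn it into "i" and replaces it explicitly first. The helper name TurkishSearchKey is hypothetical, and treating "ı" and "i" as interchangeable is a design choice for search, not a linguistic claim.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a search key in which Turkish letters
// match their plain ASCII look-alikes.
static NSString *TurkishSearchKey(NSString *string) {
    // Map dotless i (U+0131) to a plain "i" before folding; if folding
    // already handles it, this replacement is simply a no-op.
    NSString *key = [string stringByReplacingOccurrencesOfString:@"\u0131"
                                                      withString:@"i"];
    // Strip the remaining diacritics (ö, ü, ş, ç, ğ, İ) and ignore case.
    NSStringCompareOptions options = NSDiacriticInsensitiveSearch | NSCaseInsensitiveSearch;
    return [key stringByFoldingWithOptions:options locale:nil];
}
```

Building the key for both the stored word and the user's input, then comparing the keys with isEqualToString:, should let an input such as "isik" find the entry "ışık".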

OTHER TIPS

What you can do is something like this:

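NSString *input = @"heyyö";
// Lossy conversion to ASCII: characters outside ASCII are altered or dropped
NSData *intermediaryDataForm = [input dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
// Turn the ASCII bytes back into a string
NSString *output = [[NSString alloc] initWithData:intermediaryDataForm encoding:NSASCIIStringEncoding];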

Because the Turkish letters are not part of ASCII and the conversion is allowed to be lossy, 'ö' is changed to 'o' when the string is converted to NSData. Converting the data back to an NSString then gives you the plain-ASCII text. Keep in mind that a lossy conversion may also remove or alter other characters.

Licensed under: CC-BY-SA with attribution