Question

I got strange results using NSDataDetector, and I am looking for insight into how it works.

Is it matching against an internal database, or is it using some parsing algorithm to detect the separate fields in a string?

Currently, I am using the following code to detect the fields of an address:

NSError *error = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress error:&error];
NSArray *matches = [detector matchesInString:inputString options:0 range:NSMakeRange(0, [inputString length])];
for (NSTextCheckingResult *match in matches)
{
    if ([match resultType] == NSTextCheckingTypeAddress)
    {
        // addressComponents maps keys such as Street, City, State, ZIP and Country to the parsed values.
        NSDictionary *components = [match addressComponents];
        NSLog(@"addressComponents  %@", components);
    }
}

Following is a sample set of input strings and their respective outputs, using the above code:

inputString = @"100 Main Street\n"  
               "Anytown, NY 12345\n"
               "USA";
// prints:
// addressComponents  {
//     City = Anytown;
//     Country = USA;
//     State = NY;
//     Street = "100 Main Street";
//     ZIP = 12345;
// }

inputString = @"A-205 Natasha Golf View\n"
               "2 Inner Ring Road\n"
               "Bangalore\n"
               "560071\n"
               "Karnataka";
// prints:
// addressComponents  {
//     City = Bangalore;
//     Street = "2 Inner Ring Road";
//     ZIP = 560071;
// }

inputString = @"A-205 Natasha Golf View\n"
               "2 Inner Ring Road\n"
               "Domlur\n"
               "Bangalore\n"
               "560071\n"
               "India";

// prints:
// addressComponents  {
//     City = Bangalore;
//     Street = "2 Inner Ring Road";
//     ZIP = 560071;
// }

inputString = @"Dak Bhavan\n"
               "Parliament Street\n"
               "NEW DELHI 110001\n"
               "INDIA";

// => `addressComponents` is empty!

As you can see, NSDataDetector has no problem extracting US addresses. Why does it fare so much worse with Indian addresses that it doesn't even find the country name?


Solution

Disclaimer

I cannot tell you how it works. The fact that NSDataDetector inherits from NSRegularExpression may suggest that it uses a set of regular expressions, but I honestly doubt that: the detector for date types, for example, uses information that is sprinkled throughout longer blocks of text, which makes it appear more likely that some natural-language clustering and processing is going on under the hood.

The main reason why it works better with American addresses, I suppose, is as simple as it is boring:

Apple is a US-based company and (with the exception of Jonathan Ive, who is British) every one of its top-level executives is North American. Therefore, it's of little surprise that their approach is "US/North America first" [1].

It's the reason why the design of the power-brick is so elegant when using the compact US connector (where the prongs fold in) — and looks so clumsy with almost any other...

The other reason is that Apple — like anyone else — ships as soon as they can:
If they have something working for their US customers but not for the rest, why not ship it for them and add support for other locales via software updates later?

With regard to your problem, what may or may not help with the detection of addresses (read: "I didn't bother testing") is the user setting the locale of their device appropriately.

If, and only if, you find out that this has a positive impact on your results, you could then check whether the country part of [[NSLocale currentLocale] localeIdentifier] equals IN and, in case it doesn't, prompt the user to change that in the "Settings" app.
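
The locale check described above could be sketched roughly as follows. This is only an illustration under assumptions: LocaleHasRegion is a hypothetical helper name (it is not part of Foundation), and the sketch reads the region through NSLocaleCountryCode instead of parsing localeIdentifier by hand. Whether the device locale influences NSDataDetector at all remains untested.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns YES when the region
// part of the given locale matches the expected ISO 3166 code, e.g. @"IN".
static BOOL LocaleHasRegion(NSLocale *locale, NSString *expectedRegion)
{
    // NSLocaleCountryCode yields only the region part, so "en_IN" gives "IN".
    NSString *region = [locale objectForKey:NSLocaleCountryCode];
    return region != nil &&
           [region caseInsensitiveCompare:expectedRegion] == NSOrderedSame;
}
```

A caller would pass [NSLocale currentLocale] and @"IN", and prompt the user only when the helper returns NO.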

If that's not proving to be useful, you've got to Roll-Your-Own™...


[1] The major notable exception to this rule was the choice of the base-band technology for the original iPhone, where favoring GSM over CDMA may have been a disadvantage locally but the key to success globally.

OTHER TIPS

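Can you try this one?

NSError *error = nil;
NSDataDetector *detector = [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress error:&error];
[detector enumerateMatchesInString:str
                          options:0
                            range:NSMakeRange(0, [str length])
                       usingBlock:^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    // Log the parsed fields of each detected address.
    NSDictionary *components = [result addressComponents];
    NSLog(@"addressComponents  %@", components);
}];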

If that does not work for you, note that the address should be in a format like this:

100 Main Street
Anytown, NY 12345
USA

You can try other alternatives, maybe by converting your "str" into the above format.

Or you can try this directly:

// tempAddrStr is assumed to hold a comma-separated address string.
NSArray *array = [tempAddrStr componentsSeparatedByString:@","];
NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
NSString *str1 = nil;
NSString *str2 = nil;
if ([array count] > 2)
{
    // Three or more parts: skip the last one and keep the two before it.
    str1 = [array objectAtIndex:[array count] - 3];
    str2 = [array objectAtIndex:[array count] - 2];
}
else if ([tempAddrStr length] >= 140 && [array count] > 1)
{
    // Exactly two parts in a long string: keep both.
    str1 = [array objectAtIndex:[array count] - 2];
    str2 = [array objectAtIndex:[array count] - 1];
}
if (str1 != nil && str2 != nil)
{
    // Trim surrounding whitespace and flatten embedded line breaks.
    str1 = [[str1 stringByTrimmingCharactersInSet:whitespace] stringByReplacingOccurrencesOfString:@"\n" withString:@" "];
    str2 = [[str2 stringByTrimmingCharactersInSet:whitespace] stringByReplacingOccurrencesOfString:@"\n" withString:@" "];
    tempAddrStr = [NSString stringWithFormat:@"%@, %@", str1, str2];
}

This is part of the code from my project; it extracts just the state and city from a given address returned by CLGeocoder.
