Question

I have an array containing 170k strings (words in a dictionary), and a string, looking something like "glapplega". I'm trying to extract the word "apple" from the string (with "apple" being a word in the array). I also need to make sure that the extracted word is at least 3 characters. The code I have right now is the following:

NSPredicate *wordPredicate = [NSPredicate predicateWithFormat:@"'%@' contains[cd] SELF", string];
NSPredicate *lengthPredicate = [NSPredicate predicateWithFormat:@"SELF.length > 2"];
NSPredicate *predicate = [NSCompoundPredicate andPredicateWithSubpredicates:@[wordPredicate, lengthPredicate]];
return [_words filteredArrayUsingPredicate:lengthPredicate];

The length predicate works on its own, but the word predicate does not (it returns an empty array, despite "apple" being a word in the array).

I suspect that there might be a problem with using SELF as the right expression in the predicate, as all the examples I found have it as the left expression, although I have no way of confirming this.

Edit: I'm aware that this can likely be accomplished with regexes (as described here), but was hoping there would be a way around this, as regexes can be slow with such a large dataset.


Solution

Solving this problem is easy if you iterate the array yourself using a block predicate. At some point a formatted NSPredicate would have to boil down to this, so there shouldn't be much of a performance hit. -[NSString rangeOfString:] can be used to test for inclusion of the string.

return [_words filteredArrayUsingPredicate:[NSPredicate predicateWithBlock:^BOOL (id evaluatedString, NSDictionary *bindings) {
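    // Check the length of each dictionary word (not of the search string), then test whether it occurs in `string`.
    return [evaluatedString length] > 2 && [string rangeOfString:evaluatedString].location != NSNotFound;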
}]];
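Note that -[NSString rangeOfString:] is case-sensitive, while the contains[cd] operator in the question is not. Below is a minimal sketch of the same block-predicate idea wrapped in a standalone function, assuming you want to keep the case- and diacritic-insensitive behaviour. The helper name WordsContainedInString is hypothetical and not part of the original answer.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name chosen for illustration): returns the entries of
// `words` that are at least 3 characters long and occur somewhere in `string`.
static NSArray<NSString *> *WordsContainedInString(NSArray<NSString *> *words, NSString *string) {
    // Mirror the [cd] modifiers of the format predicate from the question.
    NSStringCompareOptions options = NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch;
    NSPredicate *predicate = [NSPredicate predicateWithBlock:^BOOL(id evaluatedObject, NSDictionary *bindings) {
        NSString *word = evaluatedObject;
        // Reject short words first, then do the substring search.
        return word.length > 2 && [string rangeOfString:word options:options].location != NSNotFound;
    }];
    return [words filteredArrayUsingPredicate:predicate];
}
```

For example, calling it with @[@"Apple", @"lega", @"foo", @"ga", @"bar"] and @"glapplega" should keep "Apple" and "lega" and drop the rest ("ga" is too short, "foo" and "bar" do not occur).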

OTHER TIPS

Your assumption is fine, and using SELF as the right-hand expression is perfectly valid. The only thing wrong with your predicate is the quotation marks around %@. Rewrite it like this:

  NSArray * array = @[@"Apple", @"lega", @"foo", @"bar"];
  NSString *string = @"glapplega";
  NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%@ contains[cd] SELF and SELF.length > 2", string];
  NSLog(@"%@",[array filteredArrayUsingPredicate:predicate]);

(
    Apple,
    lega
)

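When you pass the string as a format argument, the predicate adds the quotes itself. If you wrap %@ in quotes inside the format string, it is treated as the literal text "%@" and no substitution takes place, which is why your word predicate matched nothing.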
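To make the difference concrete, here is a small sketch that runs both forms of the format string against the same sample array. The two helper names are hypothetical and exist only for this illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helpers for illustration; the names are not from the original post.

// Quoted placeholder: '%@' is parsed as the literal text "%@", so `string` is ignored.
static NSArray *FilterWithQuotedPlaceholder(NSArray *words, NSString *string) {
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"'%@' contains[cd] SELF", string];
    return [words filteredArrayUsingPredicate:predicate];
}

// Unquoted placeholder: `string` is substituted into the predicate as a constant value.
static NSArray *FilterWithUnquotedPlaceholder(NSArray *words, NSString *string) {
    NSPredicate *predicate = [NSPredicate predicateWithFormat:@"%@ contains[cd] SELF AND SELF.length > 2", string];
    return [words filteredArrayUsingPredicate:predicate];
}
```

With the sample array from this answer and the string "glapplega", the quoted version should come back empty, while the unquoted version should return "Apple" and "lega". Logging the predicate object itself is a quick way to check which value actually ended up in the parsed predicate.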

// Random lowercase letter in 'a'...'z' (the +1 makes 'z' reachable).
#define rchar (rand() % ('z' - 'a' + 1) + 'a')

- (void)applicationDidFinishLaunching:(NSNotification *)aNotification
{
    NSMutableArray * mar = [NSMutableArray new];
    for (int i = 0; i<170000; i++)
    {
        NSString * str = [NSString stringWithFormat:@"%c%c%c%c",rchar, rchar, rchar, rchar];
        [mar addObject:str];
    }
    NSString * bigStr = @"asdfghjkl;loiuytrdcvcdrtgvfrtghvcftyghvfghcfdtyjghvncdfjtygmvcnfhjghjkgfhdgsxgrecrvtbkunhlmnhubkujvytchrtxgrecdjvbyhnkbjgcfhvyjhbghnkbjchgdfvbghnukbytvjycterwxrzewxcevfbjnkmjohgytreytwexkutckhtdtcfhvjgkjmhgcjhewwzsserdp9dlkuydssqwsxdchvggjhmgbj";
    NSDate *start = [NSDate date];
    NSArray * marFiltered = [mar filteredArrayUsingPredicate:[NSPredicate predicateWithBlock:^BOOL(id evaluatedObject, NSDictionary *bindings) {
        return [bigStr rangeOfString:evaluatedObject].length>2;
    }]];
    NSLog(@"found %lu items in %f seconds", (unsigned long)[marFiltered count], -[start timeIntervalSinceNow]);
}

output:

2014-05-11 09:09:53.048 170k[89396:303] found 85 items in 0.542431 seconds
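The timing above is for a linear scan that searches the string once per dictionary word. When the searched string is short, a different approach (not taken from any of the answers here, so treat it as an assumption-laden sketch) is to enumerate the substrings of the string and look each one up in an NSSet built from the dictionary. The function name and the requirement that the set holds lowercased words are both assumptions made for this example, and the code slices by UTF-16 unit, so it assumes plain text without surrogate pairs or combining marks.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: finds every dictionary word of at least `minLength`
// characters that occurs in `string`. Assumes `lowercaseWords` already
// contains lowercased entries so that the lookup is case-insensitive.
static NSSet<NSString *> *DictionaryWordsInString(NSSet<NSString *> *lowercaseWords,
                                                  NSString *string,
                                                  NSUInteger minLength) {
    NSString *haystack = string.lowercaseString;
    NSMutableSet<NSString *> *found = [NSMutableSet set];
    NSUInteger n = haystack.length;
    // Try every start position and every length from minLength upwards.
    for (NSUInteger start = 0; start + minLength <= n; start++) {
        for (NSUInteger len = minLength; start + len <= n; len++) {
            NSString *candidate = [haystack substringWithRange:NSMakeRange(start, len)];
            if ([lowercaseWords containsObject:candidate]) {
                [found addObject:candidate];
            }
        }
    }
    return found;
}
```

The number of set lookups grows with the square of the string length and does not depend on the dictionary size, so this may suit a 9-character string like "glapplega" better than a very long one. Whether it is actually faster in your case is something to measure rather than assume.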

There are two ways to define the predicate: a format string or a block. The code below demonstrates both. In my tests the two performed about the same. The largest array I had the patience to try held INT32_MAX/2 items (a lot of items).

Here goes. Hope this clarifies and helps:

    NSString* searchString = @"AB0";
    NSUInteger capacity = 1000000;
    NSMutableArray* array  = [NSMutableArray array];

    NSLog(@"Filling array with %lu UUIDs. Be patient.", (unsigned long)capacity);
    NSUInteger batch = 0;
    for ( NSUInteger i = 0; i < capacity; i++ ) {
        [array setObject:[[NSUUID UUID] UUIDString] atIndexedSubscript:i];
        if (i != 0 && i % (capacity / 10) == 0 ) {
            NSLog(@"Completed %lu%%", (unsigned long)++batch * 10);
        }
    }

    NSLog(@"Done.");

    NSPredicate* formatPredicate = [NSPredicate predicateWithFormat:@"SELF contains[cd] %@ AND SELF.length > 3", searchString];
    NSLog(@"Filtering with predicate: %@", formatPredicate);
    NSArray* formatArray = [array filteredArrayUsingPredicate:formatPredicate];
    NSLog(@"Got %lu results.", (unsigned long)formatArray.count);

    NSPredicate* blockPredicate = [NSPredicate predicateWithBlock:^BOOL(id evaluatedObject, NSDictionary *bindings) {
        NSString* theString = evaluatedObject;
        return theString.length > 3 && [theString rangeOfString:searchString].location != NSNotFound;
    }];

    NSLog(@"Filtering with predicate: %@", blockPredicate);
    NSArray* blockArray = [array filteredArrayUsingPredicate:blockPredicate];
    NSLog(@"Got %lu results.", (unsigned long)blockArray.count);

PS: I wouldn't run this on a phone if you are using big numbers like INT32_MAX :)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow