How can I optimize this nested loop?

https://stackoverflow.com/questions/3426769

26-09-2019

Question

The program should go through each word in the array created from the word text file and, if the word is longer than 8 characters, add it to the goodWords array. The caveat is that I only want the root word to be in the goodWords array. For example:

If greet is added to the array, I don't want greets or greetings or greeters, etc.

    NSString *string = [NSString stringWithContentsOfFile:@"/Users/james/dev/WordParser/word.txt" encoding:NSUTF8StringEncoding error:NULL];
    NSArray *words = [string componentsSeparatedByString:@"\r\n"];
    NSMutableArray *goodWords = [NSMutableArray array];
    BOOL shouldAddToGoodWords = YES;

    for (NSString *word in words)
    {
        NSLog(@"Word: %@", word);

        if ([word length] > 8)
        {
            NSLog(@"Word is greater than 8");

            for (NSString *existingWord in [goodWords reverseObjectEnumerator])
            {
                NSLog(@"Existing Word: %@", existingWord);
                if ([word rangeOfString:existingWord].location != NSNotFound)
                {
                    NSLog(@"Not adding...");
                    shouldAddToGoodWords = NO;
                    break;
                }
            }

            if (shouldAddToGoodWords)
            {
                NSLog(@"Adding word: %@", word);
                [goodWords addObject:word];
            }
        }

        shouldAddToGoodWords = YES;
    }

Solution

How about something like this?

    // Load the words from wherever
    NSString *allWords = [NSString stringWithContentsOfFile:@"/usr/share/dict/words" encoding:NSUTF8StringEncoding error:NULL];
    // Create a mutable array of the words (mutableCopy returns an owned object, released below)
    NSMutableArray *words = [[allWords componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]] mutableCopy];
    // Remove any words that are shorter than 8 characters
    // (use "length > 8" to match the strict test in the question)
    [words filterUsingPredicate:[NSPredicate predicateWithFormat:@"length >= 8"]];
    // Sort the words in ascending order
    [words sortUsingSelector:@selector(caseInsensitiveCompare:)];

    // Create a set of indexes (these will be the non-root words)
    NSMutableIndexSet *badIndexes = [NSMutableIndexSet indexSet];
    // Remember our current root word
    NSString *currentRoot = nil;
    NSUInteger count = [words count];
    // Loop through the words
    for (NSUInteger i = 0; i < count; ++i) {
        NSString *word = [words objectAtIndex:i];
        if (currentRoot == nil) {
            // Base case
            currentRoot = word;
        } else if ([word hasPrefix:currentRoot]) {
            // Word is a non-root word. Remember this index to remove it later.
            // Note that hasPrefix: is case-sensitive, unlike the sort above.
            [badIndexes addIndex:i];
        } else {
            // No match. This word is our new root.
            currentRoot = word;
        }
    }
    // Remove the non-root words
    [words removeObjectsAtIndexes:badIndexes];
    NSLog(@"%@", words);
    // Manual reference counting; omit this line under ARC
    [words release];

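This runs very, very quickly on my machine (2.8GHz MBP). Because the list is sorted, words that extend a root follow directly after it, so a single linear pass with hasPrefix: is enough and no word has to be compared against every previously accepted word.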

Other tips

A trie seems suited to your needs. It is like a hash, and is useful for detecting whether a given string is a prefix of a string already seen.
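To make the trie suggestion concrete, here is a minimal sketch. It is not taken from any of the answers: the function names AddWordUnlessPrefixed and RootWords and the sentinel key are made up for illustration, and each node is simply an NSMutableDictionary keyed by one-character strings. It only rejects words that start with an already stored word (a prefix test, like hasPrefix:, not the substring test in the question), and it assumes roots are seen before the words derived from them, for example because the input is sorted.

```objc
#import <Foundation/Foundation.h>

// Sentinel key marking "a stored word ends at this node". It is longer than
// one character, so it cannot collide with a single-character edge key.
static NSString * const kEndOfWordKey = @"<end>";

// Hypothetical helper. `trie` is the root node; every node is an
// NSMutableDictionary mapping one-character strings to child nodes.
// Stores `word` and returns YES unless an already stored word is a prefix of
// it (or equal to it); in that case the trie is left unchanged and NO is returned.
static BOOL AddWordUnlessPrefixed(NSMutableDictionary *trie, NSString *word)
{
    NSString *lower = [word lowercaseString];   // case-insensitive matching
    NSUInteger length = [lower length];
    if (length == 0) {
        return NO;                              // ignore empty lines
    }

    NSMutableDictionary *node = trie;
    for (NSUInteger i = 0; i < length; i++) {
        if ([node objectForKey:kEndOfWordKey] != nil) {
            return NO;  // a stored word ends here, so it is a prefix of `word`
        }
        // One UTF-16 unit per edge
        NSString *key = [lower substringWithRange:NSMakeRange(i, 1)];
        NSMutableDictionary *child = [node objectForKey:key];
        if (child == nil) {
            child = [NSMutableDictionary dictionary];
            [node setObject:child forKey:key];
        }
        node = child;
    }
    if ([node objectForKey:kEndOfWordKey] != nil) {
        return NO;      // exact duplicate
    }
    [node setObject:[NSNumber numberWithBool:YES] forKey:kEndOfWordKey];
    return YES;
}

// Hypothetical driver: keeps the words longer than 8 characters that do not
// start with a previously kept word, in input order.
static NSArray *RootWords(NSArray *words)
{
    NSMutableDictionary *trie = [NSMutableDictionary dictionary];
    NSMutableArray *goodWords = [NSMutableArray array];
    for (NSString *word in words) {
        if ([word length] > 8 && AddWordUnlessPrefixed(trie, word)) {
            [goodWords addObject:word];
        }
    }
    return goodWords;
}
```

Each word costs one dictionary lookup per character, regardless of how many words have already been kept, whereas the nested loop compares every new word against every kept word.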

I used an NSSet to make sure that only one copy of a word is ever added. A word is added only if the NSSet does not already contain it. The code then checks whether any word that has already been added is a substring of the new word; if so, the new word is not added. The check is case-insensitive as well.

What I wrote is a refactoring of your code. It is probably not much faster. If you want it to be a lot faster, you really want a tree data structure, so that looking up words that have already been added is quick.

Take a look at red-black trees or B-trees.

words.txt

objective
objectively
cappucin
cappucino
cappucine
programme
programmer
programmatic
programmatically

Source code

- (void)addRootWords {

    NSString        *textFile = [[NSBundle mainBundle] pathForResource:@"words" ofType:@"txt"];
    NSString        *string = [NSString stringWithContentsOfFile:textFile encoding:NSUTF8StringEncoding error:NULL];
    NSArray         *wordFile = [string componentsSeparatedByString:@"\n"];
    NSMutableSet    *goodWords = [[NSMutableSet alloc] init];

    for (NSString *newWord in wordFile)
    {
        NSLog(@"Word: %@", newWord);
        if ([newWord length] > 8)
        {
            NSLog(@"Word '%@' contains more than 8 characters", newWord);
            // Skip exact duplicates up front; the loop below handles every other case
            BOOL shouldAddWord = ![goodWords containsObject:newWord];
            if (shouldAddWord)
            {
                for (NSString *existingWord in goodWords)
                {
                    // Case-insensitive substring test
                    NSRange textRange = [newWord rangeOfString:existingWord options:NSCaseInsensitiveSearch];
                    if (textRange.location != NSNotFound) {
                        // newWord contains existingWord as a substring
                        shouldAddWord = NO;
                        break;
                    }
                    NSLog(@"(word:%@) does not contain (substring:%@)", newWord, existingWord);
                }
            }
            if (shouldAddWord) {
                NSLog(@"Adding word: %@", newWord);
                [goodWords addObject:newWord];
            }
        }
    }

    NSLog(@"***Added words***");
    int count = 1;
    for (NSString *word in goodWords) {
        NSLog(@"%d: %@", count, word);
        count++;
    }

    [goodWords release];
}

Output:

***Added words***
1: cappucino
2: programme
3: objective
4: programmatic
5: cappucine

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow