¿Cómo puedo optimizar cabo esta anidado bucle for?

https://stackoverflow.com/questions/3426769

26-09-2019
|

Pregunta

El programa debe ir a través de cada palabra de la matriz creada a partir del archivo de texto la palabra, y si es mayor de 8 caracteres, añadirlo a la matriz goodWords. Pero el inconveniente es que sólo quiero la palabra raíz para estar en la matriz GoodWords, por ejemplo:

Si se añade saludar a la matriz, no quiero saluda o saludos o recibidores, etc.

    NSString *string = [NSString stringWithContentsOfFile:@"/Users/james/dev/WordParser/word.txt" encoding:NSUTF8StringEncoding error:NULL];
    NSArray *words = [string componentsSeparatedByString:@"\r\n"];
    NSMutableArray *goodWords = [NSMutableArray array];
    BOOL shouldAddToGoodWords = YES;

    for (NSString *word in words)
    {
        NSLog(@"Word: %@", word);

        if ([word length] > 8)
        {
            NSLog(@"Word is greater than 8");

            for (NSString *existingWord in [goodWords reverseObjectEnumerator])
            {
                NSLog(@"Existing Word: %@", existingWord);
                if ([word rangeOfString:existingWord].location != NSNotFound)
                {
                    NSLog(@"Not adding...");
                    shouldAddToGoodWords = NO;
                    break;
                }
            }

            if (shouldAddToGoodWords)
            {
                NSLog(@"Adding word: %@", word);
                [goodWords addObject:word];
            }
        }

        shouldAddToGoodWords = YES;
    }

Solución

¿Qué tal algo como esto?

//load the words from wherever
NSString * allWords = [NSString stringWithContentsOfFile:@"/usr/share/dict/words"];
//create a mutable array of the words
NSMutableArray * words = [[allWords componentsSeparatedByCharactersInSet:[NSCharacterSet newlineCharacterSet]] mutableCopy];
//remove any words that are shorter than 8 characters
[words filterUsingPredicate:[NSPredicate predicateWithFormat:@"length >= 8"]];
//sort the words in ascending order
[words sortUsingSelector:@selector(caseInsensitiveCompare:)];

//create a set of indexes (these will be the non-root words)
NSMutableIndexSet * badIndexes = [NSMutableIndexSet indexSet];
//remember our current root word
NSString * currentRoot = nil;
NSUInteger count = [words count];
//loop through the words
for (NSUInteger i = 0; i < count; ++i) {
    NSString * word = [words objectAtIndex:i];
    if (currentRoot == nil) {
        //base case
        currentRoot = word;
    } else if ([word hasPrefix:currentRoot]) {
        //word is a non-root word.  remember this index to remove it later
        [badIndexes addIndex:i];
    } else {
        //no match. this word is our new root
        currentRoot = word;
    }
}
//remove the non-root words
[words removeObjectsAtIndexes:badIndexes];
NSLog(@"%@", words);
[words release];

Esto va muy muy rápidamente en mi máquina (MBP 2,8 GHz).

Otros consejos

Trie parece adecuado para su propósito. Es como un hash, y es útil para detectar si una cadena dada es un prefijo de una cadena ya se ha visto.

He utilizado un NSSet para asegurar que sólo tiene 1 copia de una palabra añadida a la vez. Se añade una palabra si el NSSet no contiene ya ella. A continuación, comprueba para ver si la nueva palabra es una subcadena de cualquier palabra que ya ha sido añadido, de ser cierto, entonces no será añadir la nueva palabra. Es sensible a las mayúsculas también.

Lo que he escrito es una refactorización del código. No es probable que mucho más rápido, pero que realmente quiere hacer una estructura de datos de árbol si quieres que sea mucho más rápido cuando se quiere buscar palabras que ya se han añadido a su árbol.

Tome un vistazo a RedBlack árboles o árboles B .

words.txt

objective
objectively
cappucin
cappucino
cappucine
programme
programmer
programmatic
programmatically

Código fuente

- (void)addRootWords {

    NSString        *textFile = [[NSBundle mainBundle] pathForResource:@"words" ofType:@"txt"];
    NSString        *string = [NSString stringWithContentsOfFile:textFile encoding:NSUTF8StringEncoding error:NULL];
    NSArray         *wordFile = [string componentsSeparatedByString:@"\n"];
    NSMutableSet    *goodWords = [[NSMutableSet alloc] init];

    for (NSString *newWord in wordFile)
    {
        NSLog(@"Word: %@", newWord);
        if ([newWord length] > 8)
        {
            NSLog(@"Word '%@' contains 8 or more characters", newWord);
            BOOL shouldAddWord = NO;
            if ( [goodWords containsObject:newWord] == NO) {
                shouldAddWord = YES;
            }
            for (NSString *existingWord in goodWords)
            {
                NSRange textRange = [[newWord lowercaseString] rangeOfString:[existingWord lowercaseString]];
                if( textRange.location != NSNotFound ) {
                    // newWord contains the a substring of existingWord
                    shouldAddWord = NO;
                    break;
                }
                NSLog(@"(word:%@) does not contain (substring:%@)", newWord, existingWord);
                shouldAddWord = YES;
            }
            if (shouldAddWord) {
                NSLog(@"Adding word: %@", newWord);
                [goodWords addObject:newWord];
            }
        }
    }

    NSLog(@"***Added words***");
    int count = 1;
    for (NSString *word in goodWords) {
        NSLog(@"%d: %@", count, word);
        count++;
    }

    [goodWords release];
}

Salida:

***Added words***
1: cappucino
2: programme
3: objective
4: programmatic
5: cappucine

Licenciado bajo: CC-BY-SA con atribución

No afiliado a StackOverflow