Question

To find every range of a substring (needle) within a string (haystack), I have been loading the haystack into an NSData object, converting the needle string to NSData as well, and using rangeOfData:options:range: to search for the needle in the haystack.

// Get the data for the contents of the file, store error
NSError *error = nil;
NSData *fileData = [NSData dataWithContentsOfFile:filePath options:0 error:&error];
// Check the return value; error is only meaningful when the read failed
if (fileData == nil) {
    // Handle it...
}
NSData *needleData = [needle dataUsingEncoding:NSUTF8StringEncoding];

NSRange searchRange = NSMakeRange(0, fileData.length);
while (searchRange.location < fileData.length) {
    NSRange needleRange = [fileData rangeOfData:needleData options:0 range:searchRange];
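    if (needleRange.location != NSNotFound) {
        // Found one, use the range...
        // Then continue searching just past this match
        searchRange.location = NSMaxRange(needleRange);
        searchRange.length = fileData.length - searchRange.location;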
    } else {
        // Otherwise there are no more to be found, bail out
        break;
    }
}

Generally, the needle ranges found using rangeOfData: are the same as the ranges of the needle string in the haystack string. However, this assumes that each character occupies 1 byte, and some Unicode characters take 2 or more bytes in UTF-8 (✔ and ✘, for example, take 3 bytes each). As a result, the range of the needle in the data is not the same as the range of the needle in the string.

Is there any way to accurately get the range in the string from its range in the data, or should I be looking at a different algorithm? I tested a number of methods for searching the string itself, and the NSData approach came out fastest (compared to rangeOfString:, NSRegularExpression, KMP, Boyer-Moore and Boyer-Moore-Horspool).

Solution

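(From my above comment:) Convert both the haystack and the needle string to NSData with NSUTF32BigEndianStringEncoding. Then every character (Unicode code point) occupies exactly 4 bytes in the data blob, so a byte offset or length in the data divided by 4 gives the corresponding character offset or length.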
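
As a rough illustration of the fixed-width idea, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). The function name find_utf32 and its parameters are hypothetical. The buffers are assumed to hold UTF-32 big-endian bytes with no byte order mark, such as the bytes of the two NSData objects. Stepping through the haystack 4 bytes at a time means a hit can never straddle two characters, and the byte offset divided by 4 is the code point index.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: searches UTF-32 (big-endian, no BOM) haystack bytes
   for UTF-32 needle bytes, starting at code point index `start_cp`.
   Returns the code point index of the first match, or -1 if there is none
   or if either length is not a multiple of 4. */
static long find_utf32(const unsigned char *hay, size_t hay_len,
                       const unsigned char *needle, size_t needle_len,
                       size_t start_cp)
{
    if (needle_len == 0 || needle_len % 4 != 0 || hay_len % 4 != 0) {
        return -1;
    }
    /* Advance one whole character (4 bytes) per step, so only aligned
       offsets are ever compared. */
    for (size_t off = start_cp * 4; off + needle_len <= hay_len; off += 4) {
        if (memcmp(hay + off, needle, needle_len) == 0) {
            return (long)(off / 4);
        }
    }
    return -1;
}
```

The alignment matters: a plain byte search can report a hit at an offset that is not a multiple of 4, which would be a false positive. Also note that NSString indexes UTF-16 code units, so this code point index matches an NSString location only while the text before the match contains no characters outside the Basic Multilingual Plane (most emoji, for instance).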

Other tips

Try strstr(3) with pointer arithmetic. With some help from strchr(3) you will be able to massively parallelize this.
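
One way the strstr(3) suggestion might look, as a sketch only. The helper names utf16_units_in_prefix and find_all_utf16 are made up for this example. It assumes the haystack is valid, NUL-terminated UTF-8 with no embedded NUL bytes, and it reports each match as a UTF-16 code unit index, which is the unit NSString ranges use. It does not attempt the strchr(3)-based parallelization.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of UTF-16 code units needed for the first
   `nbytes` bytes of the UTF-8 string `s`. Assumes valid UTF-8 and that
   `nbytes` ends on a character boundary. */
static size_t utf16_units_in_prefix(const char *s, size_t nbytes)
{
    size_t units = 0;
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char b = (unsigned char)s[i];
        if ((b & 0xC0) == 0x80) {
            continue;                 /* continuation byte, already counted */
        }
        units += (b >= 0xF0) ? 2 : 1; /* 4-byte sequence needs a surrogate pair */
    }
    return units;
}

/* Hypothetical helper: finds non-overlapping occurrences of `needle` in
   `hay` with strstr(3) and stores the UTF-16 start index of each in `out`
   (at most `max_out` entries). Returns the number of matches stored. */
static size_t find_all_utf16(const char *hay, const char *needle,
                             size_t *out, size_t max_out)
{
    size_t count = 0;
    size_t needle_len = strlen(needle);
    if (needle_len == 0) {
        return 0;
    }
    size_t units = 0;           /* UTF-16 units that precede `scanned` */
    const char *scanned = hay;  /* everything before this is already counted */
    const char *p = hay;
    while (count < max_out && (p = strstr(p, needle)) != NULL) {
        /* Only convert the bytes between the previous match and this one. */
        units += utf16_units_in_prefix(scanned, (size_t)(p - scanned));
        scanned = p;
        out[count++] = units;
        p += needle_len;        /* pointer arithmetic: skip past the match */
    }
    return count;
}
```

Because each prefix is counted only once, the offset conversion adds a single extra pass over the haystack no matter how many matches there are. The length of the NSString range is simply the needle string's own length.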

Licensed under: CC-BY-SA with attribution