Question

I have an Objective-C application where I'm trying to sort an NSArray while grouping the array elements that have equal sort values. Ideally, I would generate a new array of sets, where each set in the new array contains one or more of the original array elements and all of the elements in each set have equal sort values. It would work similarly to the Ruby "chunk" method

To give an example, imagine I had an NSArray containing items whose sort values are equivalent to the following:

[1, 3, 5, 7, 9, 8, 5, 3, 2, 4, 3, 6]

I would want the new array to contain 9 sets with sort values that look like:

[ (1), (2), (3, 3, 3), (4), (5, 5), (6), (7), (8), (9) ]

In Ruby I would be able to first sort the array and then chunk it to get what I want. I'm trying to come up with a reasonably efficient way to do it in Objective-C.

I could set up a dictionary containing each possible sort value as a key with an NSSet as the value for each key. I could then loop through the initial array, computing the sort value for each item, finding the appropriate key for that sort value, and updating its set as I go. I could finally sort the contents of that dictionary to get a list of sorted sets.

I could do all that, but it seems like there should be a better way that I'm missing. Also, the values I'm sorting by could actually be floating-point values, so using them as keys in a dictionary is likely to be of limited value.

Can anyone think of a more clever way of doing this? Am I missing something obvious here?

Was it helpful?

Solution

If you just need the number of times the objects occur, then Kurt's answer is pretty good. If you actually need the chunking, though, this should work:

NSArray *original = @[@1, @3, @5, @7, @9, @8, @5, @3, @2, @4, @3, @6];
NSMutableArray *chunked = [NSMutableArray array];

NSNumber *current = nil;
for (NSNumber *number in [original sortedArrayUsingSelector:@selector(compare:)]) {
    if (![number isEqual:current]) {
        [chunked addObject:[NSMutableArray arrayWithObject:number]];
        current = number;
    } else {
        [[chunked lastObject] addObject:number];
    }
}

NSLog(@"%@", chunked);

Unless I've missed something, this isn't computationally complex, and should be slightly more efficient than Tim's original method (no need for dictionaries, sets, or hashing). There's one sort involved (in fast enumeration, the container — the part after in — is evaluated only once), and you iterate over the sorted array once. NSMutableArray insertion is O(1) at either end, so worst case should be O(n) because of the iteration.


Actually: on further review, the following code runs a lot faster for large sets of numbers. It's slightly more convoluted, but runs more efficiently.

NSArray *original = @[@1, @3, @5, @7, @9, @8, @5, @3, @2, @4, @3, @6];
NSMutableArray *chunked = [NSMutableArray array];

NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:original];
for (NSNumber *number in countedSet) {
    NSMutableArray *chunk = [NSMutableArray array];
    NSUInteger count = [set countForObject:number];
    for (NSUInteger i = 0; i < count; i++) {
        [chunk addObject:number];
    }

    [chunked addObject:chunk];
}

[chunked sortUsingComparator:^(NSArray *a1, NSArray *a2) {
    return [a1[0] compare:a2[0]];
}];

NSLog(@"%@", chunked);

With 10000000 random numbers, the first implementation runs in about 12.27 seconds, while the second runs in 0.92 seconds. Go figure.

The second method has a drawback in that the chunks it creates are all duplicates of the same object; if that presents problems for you (in the general case, it could be problematic for memory management, or if your objects can be considered 'equal' in a sense, even if all their properties aren't exactly so), then use the first method. Otherwise, this'll work better for you.


Additional clarification: on further thought, I knew something was fishy in the time differences between the two methods, and I was right. If you have a lot of variation in your dataset (with very few repeated numbers), method 2 will run far, far more slowly; variation in numbers doesn't affect method 1 much. For many repeated numbers, method 2 will be pretty quick, but if your dataset is completely random, you'd be better off using method 1.

Here's the code I'm using to test these two: http://pastebin.com/9syEyiyM

OTHER TIPS

Why not use a single NSCountedSet to store all the keys and the count of each one?

NSArray *sourceArray = @[ @1, @3, @5, @7, @9, @8, @5, @3, @2, @4, @3, @6 ];
NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:sourceArray];

NSArray* sortedKeys = [[countedSet allObjects] sortedArrayUsingSelector:@selector(compare:)];
for (NSNumber *key in sortedKeys) {
    NSUInteger count = [countedSet countForObject:key];
    NSLog(@"Key: %@ count: %ld", key, (unsigned long)count);
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top