Objective-C implementation of Ruby "chunk"

Question 1

If you just need the number of times the objects occur, then Kurt's answer is pretty good. If you actually need the chunking, though, this should work:

NSArray *original = @[@1, @3, @5, @7, @9, @8, @5, @3, @2, @4, @3, @6];
NSMutableArray *chunked = [NSMutableArray array];

NSNumber *current = nil;
for (NSNumber *number in [original sortedArrayUsingSelector:@selector(compare:)]) {
    if (![number isEqual:current]) {
        [chunked addObject:[NSMutableArray arrayWithObject:number]];
        current = number;
    } else {
        [[chunked lastObject] addObject:number];
    }
}

NSLog(@"%@", chunked);

Unless I've missed something, this isn't computationally complex, and should be slightly more efficient than Tim's original method (no need for dictionaries, sets, or hashing). There's one sort involved (in fast enumeration, the container — the part after in — is evaluated only once), and you iterate over the sorted array once. NSMutableArray insertion is O(1) at either end, so worst case should be O(n) because of the iteration.

Actually: on further review, the following code runs a lot faster for large sets of numbers. It's slightly more convoluted, but runs more efficiently.

NSArray *original = @[@1, @3, @5, @7, @9, @8, @5, @3, @2, @4, @3, @6];
NSMutableArray *chunked = [NSMutableArray array];

NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:original];
for (NSNumber *number in countedSet) {
    NSMutableArray *chunk = [NSMutableArray array];
    NSUInteger count = [set countForObject:number];
    for (NSUInteger i = 0; i < count; i++) {
        [chunk addObject:number];
    }

    [chunked addObject:chunk];
}

[chunked sortUsingComparator:^(NSArray *a1, NSArray *a2) {
    return [a1[0] compare:a2[0]];
}];

NSLog(@"%@", chunked);

With 10000000 random numbers, the first implementation runs in about 12.27 seconds, while the second runs in 0.92 seconds. Go figure.

The second method has a drawback in that the chunks it creates are all duplicates of the same object; if that presents problems for you (in the general case, it could be problematic for memory management, or if your objects can be considered 'equal' in a sense, even if all their properties aren't exactly so), then use the first method. Otherwise, this'll work better for you.

Additional clarification: on further thought, I knew something was fishy in the time differences between the two methods, and I was right. If you have a lot of variation in your dataset (with very few repeated numbers), method 2 will run far, far more slowly; variation in numbers doesn't affect method 1 much. For many repeated numbers, method 2 will be pretty quick, but if your dataset is completely random, you'd be better off using method 1.

Here's the code I'm using to test these two: http://pastebin.com/9syEyiyM

Question 2

Why not use a single NSCountedSet to store all the keys and the count of each one?

NSArray *sourceArray = @[ @1, @3, @5, @7, @9, @8, @5, @3, @2, @4, @3, @6 ];
NSCountedSet *countedSet = [[NSCountedSet alloc] initWithArray:sourceArray];

NSArray* sortedKeys = [[countedSet allObjects] sortedArrayUsingSelector:@selector(compare:)];
for (NSNumber *key in sortedKeys) {
    NSUInteger count = [countedSet countForObject:key];
    NSLog(@"Key: %@ count: %ld", key, (unsigned long)count);
}