Question

I'm wondering why NSMutableData allocation and freeing is so incredibly slow.

I wanted to compare unique_ptr+new[] against malloc()/free() allocation/freeing performance, found they perform essentially the same, and then tried comparing with NSMutableData used as a raw byte buffer.

The results seem quite strange: I wasn't even able to wait for the NSMutableData cycle to finish, and even worse, the app was consuming a LOT more memory than with raw C/C++ memory allocation. I know that all this CoreFoundation/Objective-C machinery requires some overhead to work, but this seems to be too much. What am I missing? Thanks.

Here's the test code (compile as Objective-C++):

#include <Foundation/Foundation.h>
#include <memory>
#include <random>
#include <chrono>

using namespace std;
using namespace std::chrono;

// dummy function in other compilation unit to fool optimizer:
// void Fake(void *v){}
void Fake(void *v);

int main(int argc, const char * argv[])
{
    const size_t sizes_amount = 256;
    const size_t runs = /* 16 * */ 1024*1024;
    size_t sizes[sizes_amount];

    mt19937 mt((random_device())());
    uniform_int_distribution<size_t> dist(0, 1024*1024);
    for(auto &i: sizes)
        i = dist(mt); // allocating from 0 to 1M bytes

    // test malloc/free and c pointers
    auto t0 = high_resolution_clock::now();
    for(size_t i = 0; i < runs; ++i) {
        void *v = malloc(sizes[i % sizes_amount]);
        Fake(v);
        free(v);
    }

    // test unique_ptr + new uint8_t[]
    auto t1 = high_resolution_clock::now();
    for(size_t i = 0; i < runs; ++i) {
        unique_ptr<uint8_t[]> v(new uint8_t[ sizes[i % sizes_amount] ]);
        Fake(v.get());
        v.reset();
    }

    // test NSMutableData
    auto t2 = high_resolution_clock::now();
    for(size_t i = 0; i < runs; ++i) {
        NSMutableData *data = [NSMutableData dataWithLength:sizes[i % sizes_amount]];
        Fake(data.mutableBytes);
    }

    auto t3 = high_resolution_clock::now();

    printf("malloc/free + c pointers: %lld\n", duration_cast<milliseconds>(t1 - t0).count());
    printf("new/delete + unique_ptr: %lld\n", duration_cast<milliseconds>(t2 - t1).count());
    printf("NSMutableData: %lld\n", duration_cast<milliseconds>(t3 - t2).count());
    return 0;
}

Update: with pure CoreFoundation everything seems reasonable (roughly a 50x speed penalty, which is fine for such a synthetic test, and equal memory consumption):

for(size_t i = 0; i < runs; ++i) {
    CFMutableDataRef data = CFDataCreateMutable(kCFAllocatorDefault, sizes[i % sizes_amount]);
    CFDataSetLength(data, sizes[i % sizes_amount]);
    Fake(CFDataGetMutableBytePtr(data));
    CFRelease(data);
}

This difference is weird, since NSMutableData and CFMutableDataRef are toll-free bridged and (in theory) can use the same internal machinery.
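The bridging itself is easy to confirm. A minimal sketch (assuming manual reference counting, as in the test above): a CF-created buffer can be used directly through the NSMutableData interface with a plain cast, with no copy involved, so the backing implementation really is shared:

```objc
// Toll-free bridging sketch: the CF object and the NS object are the
// same pointer; only the ownership conventions around it differ.
CFMutableDataRef cfData = CFDataCreateMutable(kCFAllocatorDefault, 0);
CFDataSetLength(cfData, 1024);
NSMutableData *nsData = (NSMutableData *)cfData;   // same object, no copy
NSLog(@"%lu bytes", (unsigned long)nsData.length); // reports the CF-set length
CFRelease(cfData);
```

So the performance gap cannot come from the data storage itself; it has to come from how the object's lifetime is managed on each side.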


Solution

NSMutableData objects returned by dataWithLength: are autoreleased, which means they are never freed here: there is no autorelease pool in place (or it is never drained). That is why the test consumes so much memory, and that in turn is why it is so slow. The CoreFoundation functions, on the other hand, do not go through the autorelease mechanism, which is why they behave differently here, even though NSMutableData and CFMutableData share the same code internally.

You need to wrap the loop's body in @autoreleasepool {} (or, as @Chuck pointed out, do a manual alloc/init/release).
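Concretely, the NSMutableData loop from the question could be fixed either way (a sketch; the release variant assumes manual reference counting, and under ARC the explicit -release goes away):

```objc
// Option 1: drain an autorelease pool on every iteration,
// so each autoreleased NSMutableData is actually freed.
for(size_t i = 0; i < runs; ++i) {
    @autoreleasepool {
        NSMutableData *data = [NSMutableData dataWithLength:sizes[i % sizes_amount]];
        Fake(data.mutableBytes);
    }
}

// Option 2: bypass autorelease entirely with alloc/init/release,
// so ownership is released immediately, like the CF version.
for(size_t i = 0; i < runs; ++i) {
    NSMutableData *data = [[NSMutableData alloc] initWithLength:sizes[i % sizes_amount]];
    Fake(data.mutableBytes);
    [data release];
}
```

Either version should bring memory consumption back in line with the malloc/free and CF tests, since each buffer now dies before the next one is allocated.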

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow