I'm wondering why NSMutableData allocation and freeing is so incredibly slow.
I wanted to test unique_ptr+new[] vs malloc()/free() allocation/freeing performance, found they are practically identical, and eventually tried comparing them with NSMutableData used as a raw byte buffer.
The results seem quite strange - I wasn't even able to wait for the NSMutableData cycle to finish, and even worse, the app was consuming a LOT more memory than with raw C/C++ memory allocation.
I know that all this CoreFoundation/Objective-C machinery requires some overhead to work, but this seems to be too much. What am I missing? Thanks.
Here's the test code (compile as Objective-C++):
#include <Foundation/Foundation.h>
#include <memory>
#include <random>
#include <chrono>
using namespace std;
using namespace std::chrono;
// dummy function in other compilation unit to fool optimizer:
// void Fake(void *v){}
void Fake(void *v);
int main(int argc, const char * argv[])
{
    const size_t sizes_amount = 256;
    const size_t runs = /* 16 * */ 1024*1024;
    size_t sizes[sizes_amount];
    mt19937 mt((random_device())());
    uniform_int_distribution<size_t> dist(0, 1024*1024);
    for(auto &i: sizes)
        i = dist(mt); // allocating from 0 to 1M bytes

    // test malloc/free and C pointers
    auto t0 = high_resolution_clock::now();
    for(int i = 0; i < runs; ++i) {
        void *v = malloc(sizes[i % sizes_amount]);
        Fake(v);
        free(v);
    }

    // test unique_ptr + new uint8_t[]
    auto t1 = high_resolution_clock::now();
    for(int i = 0; i < runs; ++i) {
        unique_ptr<uint8_t[]> v(new uint8_t[ sizes[i % sizes_amount] ]);
        Fake(v.get());
        v.reset();
    }

    // test NSMutableData
    auto t2 = high_resolution_clock::now();
    for(int i = 0; i < runs; ++i) {
        NSMutableData *data = [NSMutableData dataWithLength:sizes[i % sizes_amount]];
        Fake(data.mutableBytes);
    }
    auto t3 = high_resolution_clock::now();

    printf("malloc/free + c pointers: %lld\n", duration_cast<milliseconds>(t1 - t0).count());
    printf("new/delete + unique_ptr: %lld\n", duration_cast<milliseconds>(t2 - t1).count());
    printf("NSMutableData: %lld\n", duration_cast<milliseconds>(t3 - t2).count());
    return 0;
}
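One thing I considered checking (a sketch, not a confirmed explanation) is whether the memory blowup comes from autoreleased instances piling up: `dataWithLength:` returns an autoreleased object, and the test loop never drains a pool, so every NSMutableData created in the loop could stay alive until main() returns. A variant that drains an `@autoreleasepool` each iteration, under that assumption, would be:

```objc
// Sketch: drain an autorelease pool per iteration so each autoreleased
// NSMutableData can actually be deallocated before the next allocation.
for(int i = 0; i < runs; ++i) {
    @autoreleasepool {
        NSMutableData *data = [NSMutableData dataWithLength:sizes[i % sizes_amount]];
        Fake(data.mutableBytes);
    }
}
```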
Update:
For pure CoreFoundation everything seems reasonable (roughly a 50x speed penalty, which is OK for such a synthetic test, and equal memory consumption):
for(int i = 0; i < runs; ++i) {
    CFMutableDataRef data = CFDataCreateMutable(0, sizes[i % sizes_amount]);
    CFDataSetLength(data, sizes[i % sizes_amount]);
    Fake(CFDataGetMutableBytePtr(data));
    CFRelease(data);
}
This difference is weird, since NSMutableData and CFMutableDataRef are toll-free bridged and (in theory) can share the same internal mechanics.
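One way to probe the bridging angle (a sketch only; assumes ARC and the documented toll-free bridging between CFMutableDataRef and NSMutableData) is to create the buffer through the CF calls already shown, transfer ownership to ARC, and exercise it through the NSMutableData interface, so that deallocation is deterministic rather than deferred to an autorelease pool:

```objc
for(int i = 0; i < runs; ++i) {
    // Create via CoreFoundation, then hand ownership to ARC with
    // __bridge_transfer; ARC releases the object at end of scope,
    // so each buffer is freed before the next iteration allocates.
    CFMutableDataRef cfData = CFDataCreateMutable(0, sizes[i % sizes_amount]);
    CFDataSetLength(cfData, sizes[i % sizes_amount]);
    NSMutableData *data = (__bridge_transfer NSMutableData *)cfData;
    Fake(data.mutableBytes);
}
```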