TypedArray Set vs. Unrolled Loop (Javascript)

Question 1

Well set doesn't exactly have simple semantics like that, in V8 after doing some figuring out of what should be done it will essentially arrive at exactly the same loop that the other methods are directly doing in the first place.

Note that JavaScript is compiled into highly optimized machine code if you play your cards right (all the tests do that) so there should be no "worshipping" of some methods just because they are "native".

Question 2

This is an old question, but there is a reason to use TypedArrays if you have a specific need to optimize some poorly performing code. The important thing to understand about TypedArray objects in JavaScript is that they are views which represent a range of bytes inside of an ArrayBuffer. The underlying ArrayBuffer actually represents the contiguous block of binary data to operate on, but we need a view in order to access and manipulate a window of that binary data.

Separate (or even overlapping) ranges in the same ArrayBuffer can be viewed by multiple different TypedArray objects. When you have two TypedArray objects that share the same ArrayBuffer, the set operation is extremely fast. This is because the machine is working with a contiguous block of memory.

Here's an example. We'll create an ArrayBuffer of 32 bytes, one length-16 Uint8Array to represent the first 16 bytes of the buffer, and another length-16 Uint8Array to represent the last 16 bytes:

var buffer = new ArrayBuffer(32);
var array1 = new Uint8Array(buffer,  0, 16);
var array2 = new Uint8Array(buffer, 16, 16);

Now we can initialize some values in the first half of the buffer:

for (var i = 0; i < 16; i++) array1[i] = i;
console.log(array1); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
console.log(array2); // [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  0,  0,  0,  0,  0,  0]

And then very efficiently copy those 8 bytes into the second half of the buffer:

array2.set(array1);
console.log(array1); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
console.log(array2); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

We can confirm that the two arrays actually share the same buffer by looking at the buffer with another view. For example, we could use a length-8 Uint32Array that spans the entire 32 bytes of the buffer:

var array3 = new Uint32Array(buffer)
console.log(array3); // [50462976, 117835012, 185207048, 252579084, 
                     //  50462976, 117835012, 185207048, 252579084]

I modified a JSPerf test I found to demonstrate the huge performance boost of a copy on the same buffer:

http://jsperf.com/typedarray-set-vs-loop/3

We get an order of magnitude better performance on Chrome and Firefox, and it's even much faster than taking a normal array of double length and copying the first half to the second half. But we have to consider the cycles/memory tradeoff here. As long as we have a reference to any single view of an ArrayBuffer, the rest of the buffer's data can not be garbage collected. An ArrayBuffer.transfer function is proposed for ES7 Harmony which would solve this problem by giving us the ability explicitly release memory without waiting for the garbage collector, as well as the ability to dynamically grow ArrayBuffers without necessarily copying.

Question 3

I've also been exploring how set() performs and I have to say that for smaller blocks (such as the 16 indices used by the original poster), set() is still around 5x slower than the comparable unrolled loop, even when operating on a contiguous block of memory.

I've adapted the original jsperf test here. I think its fair to say that for small block transfers such as this, set() simply can't compete with unrolled index assignment performance. For larger block transfers (as seen in sbking's test), set() does perform better but then it is competing with literally 1 million array index operations so it would seem bonkers to not be able to overcome this with a single instruction.

The contiguous buffer set() in my test does perform slightly better than the separate buffer set(), but again, at this size of transfer the performance benefit is marginal