Question

In attempting to build a WebGL 3D library for myself (learning purposes mostly) I followed documentation that I found from various sources that stated that the TypedArray function set() (specifically for Float32Array), is supposed to be "as fast as" memcpy in C (obviously tongue in cheek), literally the fastest according to html5rocks. On appearances that seemed to be correct (no loop setup in javascript, disappearing into some uberfast typed array pure C nonsense, etc).

I took a gander at glMatrix (good job on it btw!), and noticed that he (author) stated that he unrolled all of the loops for speed. This is obviously something a javascript guru would do normally for as much speed as possible, but, based on my previous reading, I thought I had 1-up on this library, specifically, he created his lib to be functional with both arrays and typed arrays, thus I thought that I would get more speed by using set() since I was only interested in staying in TypedArray types.

To test my theory I set up this jsperf. Not only does set() comparatively lack speed, every other technique I tried (in the jsperf), beats it. It is the slowest by far.

Finally, my question: Why? I can theoretically understand a loop unrolling becoming highly optimized in spidermonkey or chrome V8 js-engines, but losing out to a for loop seems ridiculous (copy2 in jsperf), especially if its intent is theoretically to speed up copies due to the raw contiguous in memory data types (TypedArray). Either way it feels like the set() function is broken.

Is it my code? my browser? (I am using Firefox 24) or am I missing some other theory of optimization? Any help in understanding this contrary result to my thoughts and understandings would be incredibly helpful.

Was it helpful?

Solution 2

Well set doesn't exactly have simple semantics like that, in V8 after doing some figuring out of what should be done it will essentially arrive at exactly the same loop that the other methods are directly doing in the first place.

Note that JavaScript is compiled into highly optimized machine code if you play your cards right (all the tests do that) so there should be no "worshipping" of some methods just because they are "native".

OTHER TIPS

This is an old question, but there is a reason to use TypedArrays if you have a specific need to optimize some poorly performing code. The important thing to understand about TypedArray objects in JavaScript is that they are views which represent a range of bytes inside of an ArrayBuffer. The underlying ArrayBuffer actually represents the contiguous block of binary data to operate on, but we need a view in order to access and manipulate a window of that binary data.

Separate (or even overlapping) ranges in the same ArrayBuffer can be viewed by multiple different TypedArray objects. When you have two TypedArray objects that share the same ArrayBuffer, the set operation is extremely fast. This is because the machine is working with a contiguous block of memory.

Here's an example. We'll create an ArrayBuffer of 32 bytes, one length-16 Uint8Array to represent the first 16 bytes of the buffer, and another length-16 Uint8Array to represent the last 16 bytes:

var buffer = new ArrayBuffer(32);
var array1 = new Uint8Array(buffer,  0, 16);
var array2 = new Uint8Array(buffer, 16, 16);

Now we can initialize some values in the first half of the buffer:

for (var i = 0; i < 16; i++) array1[i] = i;
console.log(array1); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
console.log(array2); // [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  0,  0,  0,  0,  0,  0]

And then very efficiently copy those 8 bytes into the second half of the buffer:

array2.set(array1);
console.log(array1); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
console.log(array2); // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

We can confirm that the two arrays actually share the same buffer by looking at the buffer with another view. For example, we could use a length-8 Uint32Array that spans the entire 32 bytes of the buffer:

var array3 = new Uint32Array(buffer)
console.log(array3); // [50462976, 117835012, 185207048, 252579084, 
                     //  50462976, 117835012, 185207048, 252579084]

I modified a JSPerf test I found to demonstrate the huge performance boost of a copy on the same buffer:

http://jsperf.com/typedarray-set-vs-loop/3

We get an order of magnitude better performance on Chrome and Firefox, and it's even much faster than taking a normal array of double length and copying the first half to the second half. But we have to consider the cycles/memory tradeoff here. As long as we have a reference to any single view of an ArrayBuffer, the rest of the buffer's data can not be garbage collected. An ArrayBuffer.transfer function is proposed for ES7 Harmony which would solve this problem by giving us the ability explicitly release memory without waiting for the garbage collector, as well as the ability to dynamically grow ArrayBuffers without necessarily copying.

I've also been exploring how set() performs and I have to say that for smaller blocks (such as the 16 indices used by the original poster), set() is still around 5x slower than the comparable unrolled loop, even when operating on a contiguous block of memory.

I've adapted the original jsperf test here. I think its fair to say that for small block transfers such as this, set() simply can't compete with unrolled index assignment performance. For larger block transfers (as seen in sbking's test), set() does perform better but then it is competing with literally 1 million array index operations so it would seem bonkers to not be able to overcome this with a single instruction.

The contiguous buffer set() in my test does perform slightly better than the separate buffer set(), but again, at this size of transfer the performance benefit is marginal

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top