The interpreter engine for the core JavaScript language, independent of the browser's object model. File ONLY core JavaScript language bugs in this category. For bugs involving browser objects such as "window" and "document", use the "DOM" component. For bugs involving calls between JavaScript and C++, use the "XPConnect" component.

Last week I was investigating an Elm benchmark and it uses Array#shift in a way that results in quadratic behavior for us (it uses arrays as queues). Both V8 and JSC optimize this be much faster.
We could optimize this as follows:
(1) When we shift() an element, increment the elements_ pointer to point to the second element (and move the ObjectElements header as well).
(2) In the ObjectElements header, increment an integer (numShiftedElements or something). This will allow us to recover the original pointer which we have to pass to free/realloc etc.
(3) To avoid wasting memory when we shift many elements, we would reallocate and discard this space when we grow the elements (when we have to realloc anyway) or we could do this during GC when numShiftedElements is sufficiently large.
(4) unshift could be optimized in a similar way when numShiftedElements > 0.
This shouldn't be too difficult to implement I think and it would be nice to have it fixed soonish, as it's easy to get quadratic behavior from this.
I'm attaching a micro-benchmark that takes 1 ms in Chrome/Safari and 35 ms for us.

I added some logging and I think this would help a lot of websites. Although shift() is typically used with small arrays (<= 3 elements), when I open outlook.com for instance it looks like they have an array with > 1900 elements and keep shift()ing one element. This will easily take more than a few ms on an average machine.
Considering this is causing quadratic behavior on real websites, we should probably get it fixed soon. I'll work on a patch this week, hopefully it won't be that hard.

This is a great idea! Both shift() and pop() can treat the elements array as a dequeue. A "shrink capacity to 1/2 when usage is <1/3" would give us O(1) amortized performance for shift() and pop(), while avoid edge cases where we keep shrinking/growing for alternating add/remove ops around the shrink/grow boundary.

This implements what I described in comment 0. It seems to work very well: we can now shift the elements header and when we grow/shrink elements we unshift these shifted elements if needed.
Most of the changes have to do with GC stuff: barriers, elements alloc/realloc/free, etc. Other than that it works pretty transparently and I didn't have to touch a lot of code. I changed Ion's optimized shift code to move the elements before (instead of after) updating the length + initializedLength, this makes it more similar to ArrayShiftDenseKernel.
The attached micro-benchmark improves from 35 ms to 0-1 ms. On Jon's GC barrier benchmark in bug 1362956 I get this:
Before:
Length: Iterations: GC state: Contents: Shift time: Pop time:
100000 1000 no GC objects 0.10338 0.0010601
100000 1000 no GC numbers 0.084298 0.00094800
100000 1000 in GC objects 1.8337 0.0011079
100000 1000 in GC numbers 0.43972 0.0010459
After:
Length: Iterations: GC state: Contents: Shift time: Pop time:
100000 1000 no GC objects 0.00022510 0.00090601
100000 1000 no GC numbers 0.00010181 0.00073486
100000 1000 in GC objects 0.00013989 0.0010242
100000 1000 in GC numbers 0.00010498 0.00071899
Note that shift() is now as fast as pop() and we no longer have the bad perf cliff when we're in the middle of an incremental GC.
We can also optimize unshift() now when there are shifted elements, but I'll do that as a follow-up.

With the previous patch we could shift up to 64k elements but this patch reduces that to 2047. It doesn't matter much perf wise but it may improve memory usage in certain pathological cases, and in general I think it's good to be conservative with these things.
I noticed Chrome has a bad perf cliff somewhere around ~50,000:
Chrome shifting 50k elements: 4 ms
Chrome shifting 51k elements: > 1 second
Chrome shifting 80k elements: > 2.5 seconds
Nightly without patch shifting 80k elements: ~2.5 seconds
Nightly with patch shifting 80k elements: 4 ms
So Chrome still has the quadratic behavior when shifting more than 50k elements. Safari takes about 5 ms and is similar to us.

Thanks for the quick review.
(In reply to Jon Coppeard (:jonco) from comment #8)
> It might be simpler to store flags and shifted elements count as separate
> uint16_ts. Setting a maximum for the shifted element count is a good idea
> though.
These flags are accessed from JIT code and there's some concern that accessing 16-bit values is slower than 32-bit values. We have some workarounds for that for JSFunction flags, see this comment:
http://searchfox.org/mozilla-central/rev/d66b9f27d5630a90b2fce4d70d4e9050f43df9b4/js/src/jit/MacroAssembler-inl.h#446> Can this overflow?
No all these values (and the result of the addition too) are <= MAX_DENSE_ELEMENTS_ALLOCATION.
> The name doesn't make it clear what this does. Maybe it could be called
> unshiftedIndex or indexIgnoringShift or something?
Good point, I renamed it to unshiftedIndex and removed the comment (because the function is no longer barrier-specific).