Friday, September 22, 2017

In JavaScript, an allocated object escapes if it is accessible from outside the current function. Normally V8 allocates new objects on the JavaScript heap, but using escape analysis, an optimizing compiler can figure out when an object can be treated specially because its lifetime is provably bound to the function’s activation. When the reference to a newly allocated object does not escape the function that creates it, JavaScript engines don’t need to explicitly allocate that object on the heap. They can instead effectively treat the values of the object as local variables to the function. That in turn enables all kinds of optimizations like storing these values on the stack or in registers, or in some cases, optimizing the values away completely. Objects that escape (more accurately, objects that can’t be proven to not escape) must be heap-allocated.

For example, escape analysis enables V8 to effectively rewrite the following code:

V8 v6.1 and older used an escape analysis implementation that was complex and generated many bugs since its introduction. This implementation has since been removed and a brand new escape analysis codebase is available in V8 v6.2.

However, a Chrome security vulnerability involving the old escape analysis implementation in V8 v6.1 has been discovered and responsibly disclosed to us. To protect our users, we’ve turned off escape analysis in Chrome 61. Node.js should not be affected as the exploit depends on execution of untrusted JavaScript.

Turning off escape analysis negatively impacts performance because it disables the abovementioned optimizations. Specifically, the following ES2015 features might suffer temporary slowdowns:

destructuring

for-of iteration

array spread

rest parameters

Note that disabling escape analysis is only a temporary measure. With Chrome 62, we’ll ship the brand new — and most importantly, enabled — implementation of escape analysis as seen in V8 v6.2.

Tuesday, September 12, 2017

JavaScript objects can have arbitrary properties associated with them. The names of object properties can contain any character. One of the interesting cases that a JavaScript engine can choose to optimize for are properties whose names are purely numeric, most specifically array indices.

In V8, properties with integer names — the most common form of which are objects generated by the Array constructor — are handled specially. Although in many circumstances these numerically-indexed properties behave just like other properties, V8 chooses to store them separately from non-numeric properties for optimization purposes. Internally, V8 even gives these properties a special name: elements. Objects have properties that map to values, whereas arrays have indices that map to elements.

Although these internals are never directly exposed to JavaScript developers, they explain why certain code patterns are faster than others.

Common elements kinds

While running JavaScript code, V8 keeps track of what kind of elements each array contains. This information allows V8 to optimize any operations on the array specifically for this type of element. For example, when you call reduce, map, or forEach on an array, V8 can optimize those operations based on what kind of elements the array contains.

Take this array, for example:

const array = [1, 2, 3];

What kinds of elements does it contain? If you’d ask the typeof operator, it would tell you the array contains numbers. At the language-level, that’s all you get: JavaScript doesn’t distinguish between integers, floats, and doubles — they’re all just numbers. However, at the engine level, we can make more precise distinctions. The elements kind for this array is PACKED_SMI_ELEMENTS. In V8, the term Smi refers to the particular format used to store small integers. (We’ll get to the PACKED part in a minute.)

Later adding a floating-point number to the same array transitions it to a more generic elements kind:

We’ve seen three distinct elements kinds so far, with the following basic types:

Small integers, also known as Smi.

Doubles, for floating-point numbers and integers that cannot be represented as a Smi.

Regular elements, for values that cannot be represented as Smi or doubles.

Note that doubles form a more general variant of Smi, and regular elements are another generalization on top of doubles. The set of numbers that can be represented as a Smi is a subset of the numbers that can be represented as a double.

What’s important here is that elements kind transitions only go in one direction: from specific (e.g. PACKED_SMI_ELEMENTS) to more general (e.g. PACKED_ELEMENTS). Once an array is marked as PACKED_ELEMENTS, it cannot go back to PACKED_DOUBLE_ELEMENTS, for example.

So far, we’ve learned the following:

V8 assigns an elements kind to each array.

The elements kind of an array is not set in stone — it can change at runtime. In the earlier example, we transitioned from PACKED_SMI_ELEMENTS to PACKED_ELEMENTS.

Elements kind transitions can only go from specific kinds to more general kinds.

PACKED vs. HOLEY kinds

So far, we’ve only been dealing with dense or packed arrays. Creating holes in the array (i.e. making the array sparse) downgrades the elements kind to its “holey” variant:

V8 makes this distinction because operations on packed arrays can be optimized more aggressively than operations on holey arrays. For packed arrays, most operations can be performed efficiently. In comparison, operations on holey arrays require additional checks and expensive lookups on the prototype chain.

Each of the basic elements kinds we’ve seen so far (i.e. Smis, doubles, and regular elements) comes in two flavors: the packed and the holey version. Not only can we transition from, say, PACKED_SMI_ELEMENTS to PACKED_DOUBLE_ELEMENTS, we can also transition from any PACKED kind to its HOLEY counterpart.

To recap:

The most common elements kinds come in PACKED and HOLEY flavors.

Operations on packed arrays are more efficient than operations on holey arrays.

Elements kinds can transition from PACKED to HOLEY flavors.

The elements kind lattice

V8 implements this tag transitioning system as a lattice. Here’s a simplified visualization of that featuring only the most common elements kinds:

It’s only possible to transition downwards through the lattice. Once a single floating-point number is added to an array of Smis, it is marked as DOUBLE, even if you later overwrite the float with a Smi.
Similarly, once a hole is created in an array, it’s marked as holey forever, even when you fill it later.

In general, more specific elements kinds enable more fine-grained optimizations. The further down the elements kind is in the lattice, the slower manipulations of that object might be. For optimal performance, avoid needlessly transitioning to less specific types — stick to the most specific one that’s applicable to your situation.

Performance tips

In most cases, elements kind tracking works invisibly under the hood and you don’t need to worry about it. But here are a few things you can do to get the greatest possible benefit from the system.

Avoid creating holes

Let’s say we’re trying to create an array, for example:

const array = newArray(3);
// The array is sparse at this point, so it gets marked as// `HOLEY_SMI_ELEMENTS`, i.e. the most specific possibility given// the current information.
array[0] = 'a';
// Hold up, that’s a string instead of a small integer… So the kind// transitions to `HOLEY_ELEMENTS`.
array[1] = 'b';
array[2] = 'c';
// At this point, all three positions in the array are filled, so// the array is packed (i.e. no longer sparse). However, we cannot// transition to a more specific kind such as `PACKED_ELEMENTS`. The// elements kind remains `HOLEY_ELEMENTS`.

Once the array is marked as holey, it’s holey forever — even if it’s packed later! Any operation on the array from then on is potentially slower than it could be. If you plan on performing lots of operations on the array, and you’d like to optimize those operations, avoid creating holes in the array. V8 can deal with packed arrays more efficiently.

A better way of creating an array is to use a literal instead:

const array = ['a', 'b', 'c'];
// elements kind: PACKED_ELEMENTS

If you don’t know all the values ahead of time, create an array, and later push the values to it.

This approach ensures that the array never transitions to a holey elements kind. As a result, V8 can optimize any future operations on the array more efficiently.

Avoid reading beyond the length of the array

A similar situation to hitting a hole occurs when reading beyond the length of the array, e.g. reading array[42] when array.length === 5. In this case, the array index 42 is out of bounds, the property is not present on the array itself, and so the JavaScript engine has to perform the same expensive prototype chain lookups.

When the collection you’re looping over is iterable (as is the case for arrays and NodeLists), that’s even better: just use for-of.

for (const item of items) {
doSomething(item);
}

For arrays specifically, you could use the forEach built-in:

items.forEach((item) => {
doSomething(item);
});

Nowadays, the performance of both for-of and forEach is on par with the old-fashioned for loop.

Avoid reading beyond the array’s length! Doing so is just as bad as hitting a hole in an array. In this case, V8’s bounds check fails, the check to see if the property is present fails, and then we need to look up the prototype chain.

Avoid elements kind transitions

In general, if you need to perform lots of operations on an array, try sticking to an elements kind that’s as specific as possible, so that V8 can optimize those operations as much as possible.

This is harder than it seems. For example, just adding -0 to an array of small integers is enough to transition it to PACKED_DOUBLE_ELEMENTS.

If you’re planning on performing lots of operations on an array of integers, consider normalizing -0 and blocking NaN and Infinity when initializing the values. That way, the array sticks to the PACKED_SMI_ELEMENTS kind. This one-time normalization cost can be worth the later optimizations.

In fact, if you’re doing mathematical operations on an array of numbers, consider using a TypedArray. We have specialized elements kinds for those, too.

Prefer arrays over array-like objects

Some objects in JavaScript — especially in the DOM — look like arrays although they aren’t proper arrays. It’s possible to create array-like objects yourself:

This object has a length and supports indexed element access (just like an array!) but it lacks array methods such as forEach on its prototype. It’s still possible to call array generics on it, though:

This code calls the Array.prototype.forEach built-in on the array-like object, and it works as expected. However, this is slower than calling forEach on a proper array, which is highly optimized in V8. If you plan on using array built-ins on this object more than once, consider turning it into an actual array beforehand:

Avoid polymorphism

If you have code that handles arrays of many different elements kinds, it can lead to polymorphic operations that are slower than a version of the code that only operates on a single elements kind.

Consider the following example, where a library function is called with various elements kinds. (Note that this is not the native Array.prototype.forEach, which has its own set of optimizations on top of the elements kinds-specific optimizations discussed in this article.)

const each = (array, callback) => {
for (let index = 0; index < array.length; ++index) {
const item = array[index];
callback(item);
}
};
const doSomething = (item) =>console.log(item);
each([], () => {});
each(['a', 'b', 'c'], doSomething);
// `each` is called with `PACKED_ELEMENTS`. V8 uses an inline cache// (or “IC”) to remember that `each` is called with this particular// elements kind. V8 is optimistic and assumes that the// `array.length` and `array[index]` accesses inside the `each`// function are monomorphic (i.e. only ever receive a single kind// of elements) until proven otherwise. For every future call to// `each`, V8 checks if the elements kind is `PACKED_ELEMENTS`. If// so, V8 can re-use the previously-generated code. If not, more work// is needed.
each([1.1, 2.2, 3.3], doSomething);
// `each` is called with `PACKED_DOUBLE_ELEMENTS`. Because V8 has// now seen different elements kinds passed to `each` in its IC, the// `array.length` and `array[index]` accesses inside the `each`// function get marked as polymorphic. V8 now needs an additional// check every time `each` gets called: one for `PACKED_ELEMENTS`// (like before), a new one for `PACKED_DOUBLE_ELEMENTS`, and one for// any other elements kinds (like before). This incurs a performance// hit.
each([1, 2, 3], doSomething);
// `each` is called with `PACKED_SMI_ELEMENTS`. This triggers another// degree of polymorphism. There are now three different elements// kinds in the IC for `each`. For every `each` call from now on, yet// another elements kind check is needed to re-use the generated code// for `PACKED_SMI_ELEMENTS`. This comes at a performance cost.

Built-in methods (such as Array.prototype.forEach) can deal with this kind of polymorphism much more efficiently, so consider using them instead of userland library functions in performance-sensitive situations.

Another example of monomorphism vs. polymorphism in V8 involves object shapes, also known as the hidden class of an object. To learn about that case, check out Vyacheslav’s article.

Debugging elements kinds

To figure out a given object’s “elements kind”, get a debug build of d8 (see “Building from source”), and run:

$ out.gn/x64.debug/d8 --allow-natives-syntax

This opens a d8 REPL in which special functions such as %DebugPrint(object) are available. The “elements” field in its output reveals the “elements kind” of any object you pass to it.

Monday, September 11, 2017

Every six weeks, we create a new branch of V8 as part of our release process. Each version is branched from V8’s git master immediately before a Chrome Beta milestone. Today we’re pleased to announce our newest branch, V8 version 6.2, which is in beta until its release in coordination with Chrome 62 Stable in several weeks. V8 v6.2 is filled with all sorts of developer-facing goodies. This post provides a preview of some of the highlights in anticipation of the release.

With the advent of ES2015, Object#toString() became monkey-patchable via the new Symbol.toStringTag symbol, which also made Object#toString() more heavy-weight and more challenging to speed up. In this release we ported an optimization initially implemented in the SpiderMonkey JavaScript engine to V8, speeding up throughput of Object#toString() by a factor of 6.5×.

It also impacts the Speedometer browser benchmark, specifically the AngularJS subtest, where we measured a solid 3% improvement. Read the detailed blog post for additional information.

We’ve also significantly improved the performance of ES2015 proxies, speeding up calling a proxy object via someProxy(params) or new SomeOtherProxy(params) by up to 5×:

And similarly, the performance of accessing a property on a proxy object via someProxy.property improved by almost 6.5×:

This is part of an ongoing internship. Stay tuned for a more detailed blog post and final results.

Hashcode lookups for internal hash tables got much faster, resulting in improved performance for Map, Set, WeakMap, and WeakSet. An upcoming blog post will explain this optimization in detail.

The garbage collector now uses a Parallel Scavenger for collecting the so-called young generation of the heap.

Enhanced low-memory mode

Over the last few releases V8’s low-memory mode was enhanced (e.g. by setting initial semi-space size to 512K). Low-memory devices now hit fewer out-of-memory situations. This low-memory behavior might have a negative impact on runtime performance though.

More regular expressions features

Support for the dotAll mode for regular expressions, enabled through the s flag, is now enabled by default. In dotAll mode, the . atom in regular expressions matches any character, including line terminators.

/foo.bar/su.test('foo\nbar'); // true

Lookbehind assertions, another new regular expression feature, are now available by default. The name already describes its meaning pretty well. Lookbehind assertions offer a way to restrict a pattern to only match if preceded by the pattern in the lookbehind group. It comes in both matching and non-matching flavors:

/(?<=\$)\d+/.exec('$1 is worth about ¥123'); // ['1']/(?<!\$)\d+/.exec('$1 is worth about ¥123'); // ['123']

Wednesday, August 30, 2017

In this blog post we would like to explain how V8 handles JavaScript properties internally. From a JavaScript point of view there are only a few distinctions necessary for properties. JavaScript objects mostly behave like dictionaries, with string keys and arbitrary objects as values. The specification does however treat integer-indexed properties and other properties differently during iteration. Other than that, the different properties behave mostly the same, independent of whether they are integer indexed or not.

However, under the hood V8 does rely on several different representations of properties for performance and memory reasons. In this blog post we are going to explain how V8 can provide fast property access while handling dynamically-added properties. Understanding how properties work is essential for explaining how optimizations such as inline caches work in V8.

This post explains the difference in handling integer-indexed and named properties. After that we show how V8 maintains HiddenClasses when adding named properties in order to provide a fast way to identify the shape of an object. We'll then continue giving insights into how named properties are optimized for fast accesses or fast modification depending on the usage. In the final section we provide details on how V8 handles integer-indexed properties or array indices.

Named Properties vs. Elements

Let's start by analysing a very simple object such as {a: "foo", b: "bar"}. This object has two named properties, "a" and "b". It does not have any integer indices for property names. array-indexed properties, more commonly known as elements, are most prominent on arrays. For instance the array ["foo", "bar"] has two array-indexed properties: 0, with the value "foo", and 1, with the value "bar". This is the first major distinction on how V8 handles properties in general.

The following diagram shows what a basic JavaScript object looks like in memory.

Elements and properties are stored in two separate data structures which makes adding and accessing properties or elements more efficient for different usage patterns.

Elements are mainly used for the various Array.prototype methods such as pop or slice. Given that these functions access properties in consecutive ranges, V8 also represents them as simple arrays internally — most of the time. Later in this post we will explain how we sometimes switch to a sparse dictionary-based representation to save memory.

Named properties are stored in a similar way in a separate array. However, unlike elements, we cannot simply use the key to deduce their position within the properties array; we need some additional metadata. In V8 every JavaScript object has a HiddenClass associated. The HiddenClass stores information about the shape of an object, and among other things, a mapping from property names to indices into the properties. To complicate things we sometimes use a dictionary for the properties instead of a simple array. We will explain this in more detail in a dedicated section.

Takeaway from this section:

Array-indexed properties are stored in a separate elements store.

Named properties are stored in the properties store.

Elements and properties can either be arrays or dictionaries.

Each JavaScript object has a HiddenClass associated that keeps information about the object shape.

HiddenClasses and DescriptorArrays

After explaining the general distinction of elements and named properties we need to have a look at how HiddenClasses work in V8. This HiddenClass stores meta information about an object, including the number of properties on the object and a reference to the object’s prototype. HiddenClasses are conceptually similar to classes in typical object-oriented programming languages. However, in a prototype-based language such as JavaScript it is generally not possible to know classes upfront. Hence, in this case V8, HiddenClasses are created on the fly and updated dynamically as objects change. HiddenClasses serve as an identifier for the shape of an object and as such a very important ingredient for V8's optimizing compiler and inline caches. The optimizing compiler for instance can directly inline property accesses if it can ensure a compatible objects structure through the HiddenClass.

Let's have a look at the important parts of a HiddenClass.

In V8 the first field of a JavaScript object points to a HiddenClass. (In fact, this is the case for any object that is on the V8 heap and managed by the garbage collector.) In terms of properties, the most important information is the third bit field, which stores the number of properties, and a pointer to the descriptor array. The descriptor array contains information about named properties like the name itself and the position where the value is stored. Note that we do not keep track of integer indexed properties here, hence there is no entry in the descriptor array.

The basic assumption about HiddenClasses is that objects with the same structure — e.g. the same named properties in the same order — share the same HiddenClass. To achieve that we use a different HiddenClass when a property gets added to an object. In the following example we start from an empty object and add three named properties.

Every time a new property is added, the object's HiddenClass is changed. In the background V8 creates a transition tree that links the HiddenClasses together. V8 knows which HiddenClass to take when you add, for instance, the property "a" to an empty object. This transition tree makes sure you end up with the same final HiddenClass if you add the same properties in the same order. The following example shows that we would follow the same transition tree even if we add simple indexed properties in between.

However, if we create a new object that gets a different property added, in this case property "d," V8 creates a separate branch for the new HiddenClasses.

Takeway from this section:

Objects with the same structure (same properties in the same order) have the same HiddenClass

By default every new named property added causes a new HiddenClass to be created.

Adding array-indexed properties does not create new HiddenClasses.

The Three Different Kinds of Named Properties

After giving an overview on how V8 uses HiddenClasses to track the shape of objects let’s dive into how these properties are actually stored. As explained in the introduction above, there are two fundamental kind of properties: named and indexed. The following section covers named properties.

A simple object such as {a: 1, b: 2} can have various internal representations in V8. While JavaScript objects behave more or less like simple dictionaries from the outside, V8 tries to avoid dictionaries because they hamper certain optimizations such as inline caches which we will explain in a separate post.

In-object vs. Normal Properties: V8 supports so-called in-object properties which are stored directly on the object themselves. These are the fastest properties available in V8 as they are accessible without any indirection. The number of in-object properties is predetermined by the initial size of the object. If more properties get added than there is space in the object, they are stored in the properties store. The properties store adds one level of indirection but can be grown independently.

Fast vs. Slow Properties: The next important distinction is between fast and slow properties. Typically we define the properties stored in the linear properties store as "fast". Fast properties are simply accessed by index in the properties store. To get from the name of the property to the actual position in the properties store, we have to consult the descriptor array on the HiddenClass, as we've outlined before.

However, if many properties get added and deleted from an object, it can generate a lot of time and memory overhead to maintain the descriptor array and HiddenClasses. Hence, V8 also supports so-called slow properties. An object with slow properties has a self-contained dictionary as a properties store. All the properties meta information is no longer stored in the descriptor array on the HiddenClass but directly in the properties dictionary. Hence, properties can be added and removed without updating the HiddenClass. Since inline caches don’t work with dictionary properties, the latter, are typically slower than fast properties.

Takeaway from this section:

There are three different named property types: in-object, fast and slow/dictionary.

In-object properties are stored directly on the object itself and provide the fastest access.

Fast properties live in the properties store, all the meta information is stored in the descriptor array on the HiddenClass.

Slow properties live in a self-contained properties dictionary, meta information is no longer shared through the HiddenClass.

Slow properties allow for efficient property removal and addition but are slower to access than the other two types.

Elements or array-indexed Properties

So far we have looked at named properties and ignored integer indexed properties commonly used with arrays. Handling of integer indexed properties is no less complex than named properties. Even though all indexed properties are always kept separately in the elements store, there are 20 different types of elements!

Packed or Holey Elements: The first major distinction V8 makes is whether the elements backing store is packed or has holes in it. You get holes in a backing store if you delete an indexed element, or for instance, you don't define it. A simple example is [1,,3] where the second entry is a hole. The following example illustrates this issue:

In short, if a property is not present on the receiver we have to keep on looking on the prototype chain. Given that elements are self-contained, e.g. we don't store information about present indexed properties on the HiddenClass, we need a special value, called the_hole, to mark properties that are not present. This is crucial for the performance of Array functions. If we know that there are no holes, i.e. the elements store is packed, we can perform local operations without expensive lookups on the prototype chain.

Fast or Dictionary Elements: The second major distinction made on elements is whether they are fast or dictionary-mode. Fast elements are simple VM-internal arrays where the property index maps to the index in the elements store. However, this simple representation is rather wasteful for very large sparse/holey arrays where only few entries are occupied. In this case we used a dictionary-based representation to save memory at the cost of slightly slower access:

In this example, allocating a full array with 10k entries would be rather wasteful. What happens instead is that V8 creates a dictionary where we store a key-value-descriptor triplets. The key in this case would be 9999 and the value "foo" and the default descriptor is used. Given that we don't have a way to store descriptor details on the HiddenClass, V8 resorts to slow elements whenever you define an indexed properties with a custom descriptor:

In this example we added a non-configurable property on the array. This information is stored in the descriptor part of a slow elements dictionary triplet. It is important to note that Array functions perform considerably slower on objects with slow elements.

Smi and Double Elements: For fast elements there is another important distinction made in V8. For instance if you only store integers in an Array, a common use-case, the GC does not have to look at the array, as integers are directly encoded as so called small integers (Smis) in place. Another special case are Arrays that only contain doubles. Unlike Smis, floating point numbers are usually represented as full objects occupying several words. However, V8 stores raw doubles for pure double arrays to avoid memory and performance overhead. The following example lists 4 examples of Smi and double elements:

Special Elements: With the information so far we covered 7 out of the 20 different element kinds. For simplicity we excluded 9 element kinds for TypedArrays, two more for String wrappers and last but not least, two more special element kinds for arguments objects.

The ElementsAccessor: As you can imagine we are not exactly keen on writing Array functions 20 times in C++, once for every elements kind. That's where some C++ magic comes into play. Instead of implementing Array functions over and over again, we built the ElementsAccessor where we mostly have to implement only simple functions that access elements from the backing store. The ElementsAccessor relies on CRTP to create specialized versions of each Array function. So if you call something like slice on a array, V8 internally calls a builtin written in C++ and dispatches through the ElementsAccessor to the specialized version of the function:

Takeaway from this section:

There are fast and dictionary-mode indexed properties and elements.

Fast properties can be packed or they can can contain holes which indicate that an indexed property has been deleted.

Elements are specialized on their content to speed up Array functions and reduce GC overhead.

Understanding how properties work is key to many optimizations in V8. For JavaScript developers many of these internal decisions are not visible directly, but they explain why certain code patterns are faster than others. Changing the property or element type typically causes V8 to create a different HiddenClass which can lead to type pollution which prevents V8 from generating optimal code. Stay tuned for further posts on how the VM-internals of V8 work.

Friday, August 11, 2017

Early July this year, Node.js released a security update for all currently maintained branches to address a hash flooding vulnerability. This intermediate fix comes at the cost of a significant startup performance regression. In the meantime, V8 has implemented a solution which avoids the performance penalty.

In this post, we want to give some background and history on the vulnerability and the eventual solution.

Hash flooding attack

Hash tables are one of the most important data structures in computer science. They are widely used in V8, for example to store an object’s properties. On average, inserting a new entry is very efficient at O(1). However, hash collisions could lead to a worst case of O(n). That means that inserting n entries can take up to O(n²).

In Node.js, HTTP headers are represented as JavaScript objects. Pairs of header name and values are stored as object properties. With cleverly prepared HTTP requests, an attacker could perform a denial-of-service attack. A Node.js process would become unresponsive, being busy with worst-case hash table insertions.

This attack has been disclosed as early as December of 2011, and shown to affect a wide range of programming languages. How come it took this long for V8 and Node.js to finally address this issue?

In fact, very soon after the disclosure, V8 engineers worked with the Node.js community on a mitigation. From Node.js v0.11.8 onwards, this issue had been addressed. The fix introduced a so-called hash seed value. The hash seed is randomly chosen at startup and used to seed every hash value in a particular V8 instance. Without the knowledge of the hash seed, an attacker has a hard time to hit the worst-case, let alone come up with an attack that targets all Node.js instances.

This version only solves the issue for those that compile V8 themselves or those that do not use snapshots. A snapshot-based precompiled V8 will still have predictable string hash codes.

Startup snapshot

Startup snapshots are a mechanism in V8 to dramatically speed up both engine startup and creating new contexts (i.e. via the vm module in Node.js). Instead of setting up initial objects and internal data structures from scratch, V8 deserializes from an existing snapshot. An up-to-date build of V8 with snapshot starts up in less than 3ms, and requires a fraction of a millisecond to create a new context. Without the snapshot, startup takes more than 200ms, and a new context more than 10ms. This is a difference of two orders of magnitude.

We covered how any V8 embedder can take advantage of startup snapshots in previousposts.

A pre-built snapshot contains hash tables and other hash-value-based data structures. Once initialized from snapshot, the hash seed can no longer be changed without corrupting these data structures. A Node.js release that bundles the snapshot has a fixed hash seed, making the mitigation ineffective.

That is what the explicit warning in the commit message was about.

Almost fixed, but not quite

Fast-forward to 2015, a Node.js issue reports that creating a new context has regressed in performance. Unsurprisingly, this is because the startup snapshot has been disabled as part of the mitigation. But by that time not everyone participating in the discussion was aware of the reason.

As explained in this post, V8 uses a pseudo-random number generator to generate Math.random results. Every V8 context has its own copy of the random number generate state. This is to prevent Math.random results from being predictable across contexts.

The random number generator state is seeded from an external source right after the context is created. It does not matter whether the context is created from scratch, or deserialized from snapshot.

Somehow, the random number generator state has been confused with the hash seed. As result, a pre-built snapshot started being part of the official release since io.js v2.0.2.

Second attempt

It was not until May 2017, during some internal discussions between V8, Google’s Project Zero, and Google’s Cloud Platform, when we realized that Node.js was still vulnerable to hash flooding attacks.

But we did not want to leave it at that. Disabling startup snapshot has significant performance impacts. Over the years, we have added many new languagefeatures and sophisticatedoptimizations to V8. Some of these additions made starting up from scratch even more expensive. Immediately after the security release, we started working on a long-term solution. The goal is to be able to re-enable startup snapshot without becoming vulnerable to hash flooding.

From proposed solutions, we chose and implemented the most pragmatic one. After deserializing from snapshot, we would choose a new hash seed. Affected data structures are then rehashed to ensure consistency.

As it turns out, in an ordinary startup snapshot few data structures are actually affected. And to our delight, rehashing hash tables have been made easy in V8 in the meantime. The overhead this adds is insignificant.

The patch to re-enable startup snapshot has been mergedinto Node.js. It is part of the recent Node.js v8.3.0 release.