Igalia Compilers Team

Est. 2011

Legacy RegExp features in JavaScript

In June 2025, I joined the Igalia Coding Experience program. My role was to implement the TC39 proposal Legacy RegExp Features in SpiderMonkey, the JavaScript engine in Mozilla Firefox. This wasn't my first proposal implementation: I'd already implemented the Error.isError and Iterator.range TC39 proposals in SpiderMonkey. Implementing the Legacy RegExp Features proposal, however, meant delving deeper into the Mozilla codebase and brought new challenges for me.

To begin with, I created an implementation plan with a timeline of how I was going to approach the proposal. Additionally, I added links to the codebase where I thought I was going to make changes as per the specification, which helped me have a clear starting point and path for integrating the feature. It also meant I could get feedback from SpiderMonkey developers before actually beginning the implementation.

The Legacy RegExp features proposal disables legacy static properties and RegExp.prototype.compile for instances of proper subclasses of RegExp as well as for cross-realm regexps.

The following operations are modified in SpiderMonkey:

RegExp.prototype.compile(pattern, flags)

This method reinitializes an existing RegExp object with a new pattern and/or flags. It modifies the RegExp object in place rather than creating a new one.
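
As a small illustration of the in-place behaviour (a sketch for a direct RegExp instance in the current realm, which the proposal leaves untouched), the snippet below checks that compile() returns its receiver and that the pattern, flags and lastIndex of the same object change:

```javascript
// compile() mutates the receiver instead of allocating a new RegExp.
const original = /a+/g;
original.lastIndex = 3;

const returned = original.compile("b+", "i");

// compile() hands back the very object it was called on.
const sameObject = returned === original;

// The object now carries the new pattern and flags, and lastIndex is reset.
const summary = {
  source: original.source,
  flags: original.flags,
  lastIndex: original.lastIndex,
};
console.log(sameObject, summary);
```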

Modification: The proposal modifies RegExp.prototype.compile to throw a TypeError for objects that are not direct instances of RegExp, as well as for cross-realm mismatches. The compile() method initializes a RegExp object in much the same way a RegExp literal is created, bypassing any preprocessing of the pattern that a RegExp subclass's constructor might do, and potentially breaking a subclass's custom "exec" method. Thus, compile is disallowed for subclasses. It is also now forbidden to apply a RegExp compile method to a RegExp object belonging to a different realm, as this would typically result in static properties of the incorrect realm being updated.
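
To make the subclass problem concrete, here is a minimal sketch with a hypothetical subclass, CaseInsensitiveRegExp, whose constructor always adds the "i" flag. The snippet is written to run both on engines that implement the proposal (where compile() is rejected) and on engines that do not (where compile() silently skips the constructor's preprocessing), so it records which outcome it saw rather than assuming one:

```javascript
// Hypothetical subclass: its constructor preprocesses the flags.
class CaseInsensitiveRegExp extends RegExp {
  constructor(pattern, flags = "") {
    super(pattern, flags.includes("i") ? flags : flags + "i");
  }
}

const sub = new CaseInsensitiveRegExp("abc");
const matchedBeforeCompile = sub.test("ABC"); // the constructor added "i"

let outcome;
try {
  // compile() reinitializes the object directly; the subclass constructor
  // never runs, so the "i" flag is not re-added.
  sub.compile("abc");
  outcome = sub.test("ABC") ? "still case-insensitive" : "preprocessing bypassed";
} catch (error) {
  if (!(error instanceof TypeError)) throw error;
  outcome = "rejected with TypeError";
}
console.log(outcome);
```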

Example of newly restricted behaviour:

(base) $ ./mach run
0:00.29 /Users/default/firefox/obj-aarch64-apple-darwin25.2.0/dist/bin/js
js> let g = newGlobal();
js> let re = g.RegExp("x");
js> RegExp.prototype.compile.call(re);
typein:3:26 TypeError: RegExp operation not permitted on object from different realm
Stack:
@typein:3:26
js>

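To explain each line of the JavaScript code in detail: let g = newGlobal() uses the SpiderMonkey shell's newGlobal() helper to create a new global object, and with it a new realm. let re = g.RegExp("x") calls that realm's RegExp constructor, so re belongs to g's realm rather than the current one. RegExp.prototype.compile.call(re) then applies the current realm's compile() method to the regexp from the other realm, which now throws a TypeError.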

Initially, I added my changes in regexp_compile_impl(), but when testing with ./mach try auto, the feature failed the test262 cross-realm tests when run with the --ion-eager and --more-compartments flags. Debug output showed that when RegExp.prototype.compile.call(re) was invoked, the realm of the compile() method and the realm of its receiver (this, the RegExp object) appeared to be the same when they weren’t. In other words, the cross-realm check was passing when, according to the test expectations, it should have been failing.

By the time execution reached regexp_compile_impl(), the CallNonGenericMethod<IsRegExpObject, regexp_compile_impl> wrapper had already processed the receiver (this) of the compile method. According to the CallNonGenericMethod documentation, if args.thisv() is not of the correct type, it will attempt to unwrap this and, if successful, call the implementation function on the unwrapped this. For a bit of context, SpiderMonkey has a concept of Wrapper objects, which decorate an object in a sort of proxy membrane to enforce security boundaries, for instance by ensuring that a method can be invoked or a field can be written to from the presently entered compartment. Unwrapping an object means removing that proxy membrane to access the actual object, similar to how you’d unwrap a gift. This can be done using js::CheckedUnwrapStatic().

With --more-compartments, CallNonGenericMethod in regexp_compile() was automatically unwrapping cross-compartment proxies through CallMethodIfWrapped before calling regexp_compile_impl().

This unwrapping process also switched the JSContext to the target object's realm. This meant that by the time my realm checks executed in regexp_compile_impl(), both cx->realm() and the RegExp object's realm pointed to the same realm (the object's home realm), making them appear equal even in genuine cross-realm call scenarios where the original call came from a different realm.

So I moved the same-realm testing and [[LegacyFeaturesEnabled]] bit testing to regexp_compile(), just before CallNonGenericMethod is called and added js::CheckedUnwrapStatic() to unwrap any proxy wrappers before checking the realm. This ensures we’re checking the realm of the actual RegExp object and not the compartment wrappers around it.
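
The ordering problem can be illustrated with a toy model. This is not SpiderMonkey code: plain objects stand in for realms, wrappers and the JSContext, and every name below (callWithUnwrappedThis, compileCheckedLate, compileCheckedEarly) is hypothetical. It only shows why a realm comparison made after the dispatch step has unwrapped the receiver and entered its realm can never fail, while the same comparison made before dispatch can:

```javascript
// Toy model only: plain objects stand in for realms, wrappers and the context.
const realmA = { name: "A" };
const realmB = { name: "B" };

const regexpInB = { kind: "regexp", realm: realmB };
const wrapperInA = { kind: "wrapper", target: regexpInB }; // what realm A sees

const unwrap = (value) => (value.kind === "wrapper" ? value.target : value);
const sameRealm = (cx, regexp) => cx.realm === regexp.realm;

// Stand-in for the dispatch step: unwrap the receiver, enter its realm,
// run the implementation, then restore the caller's realm.
function callWithUnwrappedThis(cx, thisValue, impl) {
  const target = unwrap(thisValue);
  const savedRealm = cx.realm;
  cx.realm = target.realm;
  try {
    return impl(cx, target);
  } finally {
    cx.realm = savedRealm;
  }
}

// Check inside the implementation: too late, the context has already moved.
function compileCheckedLate(cx, thisValue) {
  return callWithUnwrappedThis(cx, thisValue, (innerCx, regexp) =>
    sameRealm(innerCx, regexp)
  );
}

// Check before dispatch: compares the caller's realm with the real object.
function compileCheckedEarly(cx, thisValue) {
  if (!sameRealm(cx, unwrap(thisValue))) {
    throw new TypeError("RegExp operation not permitted on object from different realm");
  }
  return callWithUnwrappedThis(cx, thisValue, () => true);
}

const cx = { realm: realmA };
const lateResult = compileCheckedLate(cx, wrapperInA); // the mismatch is masked
console.log(lateResult);
```

In this model the late check reports "same realm" for a genuinely cross-realm call, whereas the early check throws for the same input.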

Subclass Instances

As mentioned above, the RegExp method RegExp.prototype.compile() re-initializes a RegExp using a newly created matcher for the specified pattern and flags. The proposal adds some restrictions to this which prevent oddities such as subclasses not functioning as expected (for instance, by not preprocessing the pattern and adding context used by their exec() implementation). More importantly, when applied to a cross-realm object, this would result in execution modifying the static RegExp members for the incorrect realm.

The proposal modifies the behavior so that legacy static properties are only updated when direct instances of the built-in RegExp constructor are used, not subclass instances or cross-realm objects, using similar logic to RegExp.prototype.compile():

  1. If SameValue(thisRealm, rRealm) is true, then
    • i. If the value of R’s [[LegacyFeaturesEnabled]] internal slot is true, then
      • a. Perform UpdateLegacyRegExpStaticProperties(%RegExp%, S, lastIndex, e, capturedValues).
    • ii. Else,
      • a. Perform InvalidateLegacyRegExpStaticProperties(%RegExp%).

The properties are specced and implemented as accessors with a getter and no setter, except for RegExp.input (and its alias RegExp.$_), which remains writable. Inside each of the accessors, if the receiver this and the %RegExp% realm intrinsic (the standard RegExp constructor) are not the same, we throw a TypeError.

(base) $ ./mach run
0:00.28 /Users/default/firefox/obj-aarch64-apple-darwin25.2.0/dist/bin/js
js> /a(b)c/.exec("abc");
["abc", "b"]
js> RegExp.$1
"b"
js> new RegExp("a(b)").exec("ab");
["ab", "b"]
js> RegExp.$1
"b"
js> new (class extends RegExp {})("a(b)").exec("ab");
["ab", "b"]
js> RegExp.$1
typein:6:1 TypeError: RegExp static property 'static_paren1_getter' is invalid
Stack:
@typein:6:1
js>
/a(b)c/.exec("abc"); RegExp.$1  // should return "b"
new RegExp("a(b)").exec("ab"); RegExp.$1 // "b"
new (class extends RegExp {})("a(b)").exec("ab"); RegExp.$1 // throws
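
The update-or-invalidate steps and the accessor check described above can be modelled in a few lines. This is a toy model of the specification text, not engine code: a plain object stands in for %RegExp% with its internal slots, only two slots are modelled, and the function names are shortened, hypothetical versions of the abstract operations.

```javascript
// Toy model of the spec text. EMPTY plays the role of the spec's "empty" value.
const EMPTY = Symbol("empty");

function makeRegExpConstructorRecord(realm) {
  return { realm, lastMatch: EMPTY, paren1: EMPTY };
}

// UpdateLegacyRegExpStaticProperties, reduced to two slots.
function updateLegacyStatics(C, matched, captures) {
  C.lastMatch = matched;
  C.paren1 = captures[0] ?? "";
}

// InvalidateLegacyRegExpStaticProperties: every slot becomes empty.
function invalidateLegacyStatics(C) {
  C.lastMatch = EMPTY;
  C.paren1 = EMPTY;
}

// The step performed after a successful match.
function afterSuccessfulMatch(thisRealm, R, C, matched, captures) {
  if (thisRealm === R.realm) {
    if (R.legacyFeaturesEnabled) {
      updateLegacyStatics(C, matched, captures);
    } else {
      invalidateLegacyStatics(C);
    }
  }
}

// GetLegacyRegExpStaticProperty: the accessor behind RegExp.$1 and friends.
function getLegacyStatic(C, thisValue, slotName) {
  if (thisValue !== C) throw new TypeError("receiver is not %RegExp%");
  if (C[slotName] === EMPTY) throw new TypeError("legacy static property is invalid");
  return C[slotName];
}

const realm = { name: "main" };
const regExpCtor = makeRegExpConstructorRecord(realm);

// A direct instance updates the statics...
afterSuccessfulMatch(realm, { realm, legacyFeaturesEnabled: true }, regExpCtor, "ab", ["b"]);
const afterDirectInstance = getLegacyStatic(regExpCtor, regExpCtor, "paren1");

// ...and a subclass instance invalidates them, so the next read throws.
afterSuccessfulMatch(realm, { realm, legacyFeaturesEnabled: false }, regExpCtor, "ab", ["b"]);
console.log(afterDirectInstance);
```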

Normalisation of RegExp Static Properties

RegExp static properties are now defined as configurable and non-enumerable, so that the associated features can easily be removed with the JavaScript delete operator. This is important for consistency with modern ECMA-262, and it allows applications to further reduce the number of side-effect-producing globals, including VM native methods.
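
One way to observe these attributes from script is to read the property descriptor and then attempt a deletion. The sketch below makes no assumption about which engine or build it runs on, so it reports what it finds instead of hard-coding an expected result; describeLegacyStatic is a hypothetical helper name. Note that a successful deletion really does remove RegExp.$9 from the running global.

```javascript
// Report the attributes of a legacy static property on the RegExp constructor.
function describeLegacyStatic(name) {
  const descriptor = Object.getOwnPropertyDescriptor(RegExp, name);
  if (descriptor === undefined) return null; // property absent in this engine
  return {
    configurable: descriptor.configurable,
    enumerable: descriptor.enumerable,
    hasGetter: typeof descriptor.get === "function",
    hasSetter: typeof descriptor.set === "function",
  };
}

const descriptorBefore = describeLegacyStatic("$9");

// Reflect.deleteProperty reports failure as `false` instead of throwing,
// so this works whether or not the property is configurable.
const removed = Reflect.deleteProperty(RegExp, "$9");
const stillPresent = Object.getOwnPropertyDescriptor(RegExp, "$9") !== undefined;

console.log(descriptorBefore, { removed, stillPresent });
```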

In SpiderMonkey, the legacy static properties are defined in RegExp.cpp. To implement the proposal, I enclosed the property definitions in an #ifdef NIGHTLY_BUILD directive and removed the JSPROP_PERMANENT and JSPROP_ENUMERATE flags, making the properties configurable and non-enumerable in Nightly, where they can be tested by the community. Outside of Nightly, we continue supporting the old implementation for beta/release environments.

Then, I updated the test262 AnnexB RegExp tests to support the change and to limit the tests to Nightly.

Understanding the Implementation: Challenges and Solutions

1. Creative Bit Packing

Once the legacy RegExp statics were normalised, the next step was adding a LegacyFeaturesEnabled internal slot. This slot keeps a reference to its constructor and is checked whenever legacy features are accessed. If the RegExp is a subclass instance or is associated with a different realm, the slot indicates that legacy features should throw an error.

Initially, I added the slot to the RegExpObject:

static const unsigned LEGACY_FEATURES_ENABLED_SLOT = 3;

This presented a couple of problems for me:

I decided to leave the implementation as is and wait for SpiderMonkey engineers / reviewers to give me feedback and their preference on how to add the Boolean.

During code review, my reviewer Iain pointed out that since we’re only storing a single bit of information (whether legacy features are enabled or not), and the existing FLAGS_SLOT only uses 8 bits, I could store the legacy features in the unused higher bits.

The slot implementation includes a getter, bool legacyFeaturesEnabled(), that reads the bit from the FLAGS_SLOT; and a setter, setLegacyFeaturesEnabled(bool), that writes the bit to the FLAGS_SLOT.

The new approach involved defining some constants based on the size of RegExp Flags so that the code keeps working if RegExpFlags gets bigger in future:

static const size_t RegExpFlagsMask = JS::RegExpFlag::AllFlags;
static const size_t LegacyFeaturesEnabledBit = Bit(8);

static_assert((RegExpFlagsMask & LegacyFeaturesEnabledBit) == 0,
"LegacyFeaturesEnabledBit must not overlap");

RegExpFlagsMask has a bit set to 1 if that bit is part of the RegExpFlags, and 0 otherwise. The lowest 8 bits are currently set to other RegExp flags, which leaves us with the highest bits to pack our slot in.

We perform two operations: raw & RegExpFlagsMask, which keeps only the traditional RegExp flags (currently bits 0-7), and raw & ~RegExpFlagsMask, which keeps everything apart from the RegExp flags. We use bit 8 to store LegacyFeaturesEnabled. When we read the flags, we mask off any bits that are not part of the RegExpFlags.

return JS::RegExpFlags(raw & RegExpFlagsMask);

When we write to the flags, we combine the new value of the RegExpFlags bits (flags.value()) with the old value of the other bits (raw & ~RegExpFlagsMask).

uint32_t newValue = flags.value() | (raw & ~RegExpFlagsMask);
setFixedSlot(FLAGS_SLOT, Int32Value(newValue));

When we read the LegacyFeaturesEnabledBit, we check if it’s set. When we write it, we take the existing raw value and either set or clear the LegacyFeaturesEnabledBit.
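
The same packing scheme can be tried out in isolation. The following is a sketch in JavaScript rather than the C++ used in SpiderMonkey; FlagsSlot is a hypothetical stand-in for the FLAGS_SLOT storage, and the mask value 0xff is an assumption taken from the statement that the flags currently occupy the lowest 8 bits.

```javascript
// Assumption for this sketch: the RegExp flags occupy bits 0-7.
const REGEXP_FLAGS_MASK = 0xff;
const LEGACY_FEATURES_ENABLED_BIT = 1 << 8;

// Same guard as the static_assert: the extra bit must not overlap the flags.
if ((REGEXP_FLAGS_MASK & LEGACY_FEATURES_ENABLED_BIT) !== 0) {
  throw new Error("LegacyFeaturesEnabledBit must not overlap");
}

// Hypothetical stand-in for the single integer stored in FLAGS_SLOT.
class FlagsSlot {
  #raw = 0;

  // Reading the flags masks off everything that is not a RegExp flag.
  flags() {
    return this.#raw & REGEXP_FLAGS_MASK;
  }

  // Writing the flags keeps the non-flag bits that were already stored.
  setFlags(flags) {
    this.#raw = (flags & REGEXP_FLAGS_MASK) | (this.#raw & ~REGEXP_FLAGS_MASK);
  }

  legacyFeaturesEnabled() {
    return (this.#raw & LEGACY_FEATURES_ENABLED_BIT) !== 0;
  }

  // Set or clear bit 8 without touching the flag bits.
  setLegacyFeaturesEnabled(enabled) {
    this.#raw = enabled
      ? this.#raw | LEGACY_FEATURES_ENABLED_BIT
      : this.#raw & ~LEGACY_FEATURES_ENABLED_BIT;
  }
}

const slot = new FlagsSlot();
slot.setLegacyFeaturesEnabled(true);
slot.setFlags(0b101);
console.log(slot.flags(), slot.legacyFeaturesEnabled());
```

The two values share one integer, yet neither write disturbs the other, which is the property the real getter and setter rely on.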

2. Lazy Evaluation

The proposal specifies RegExp properties as internal slots of the RegExp Object, and the abstract operations UpdateLegacyRegExpStaticProperties (C, S, startIndex, endIndex, capturedValues) and InvalidateLegacyRegExpStaticProperties(C) were initially confusing. The confusion came from a specification detail: we need to eagerly update the properties at a specific point in time, as opposed to SpiderMonkey’s lazily evaluated implementation.

It was the first time I had come across lazy evaluation and thought, naively, that it would be possible to change the implementation to eagerly update static properties after a successful match. This didn't work for a few reasons.

First, lazy evaluation is heavily embedded in the JIT, so the idea of just changing that was… ambitious. Second, lazy evaluation is a way to defer regexp evaluation until RegExp properties are accessed. Third, there’s no observable difference to the end user whether the RegExp properties were lazily or eagerly evaluated. Lastly, internal slots are a way for ECMA262 to describe the internal state of the object.

So, UpdateLegacyRegExpStaticProperties (C, S, startIndex, endIndex, capturedValues) wasn’t needed, as it codifies already existing behaviour in SpiderMonkey. For InvalidateLegacyRegExpStaticProperties(C), my mentor suggested implementing it as a boolean flag in RegExpStatics.

When a subclass or cross-realm regexp executes, this flag is set to true, preventing legacy static properties from being accessed. The flag is cleared after normal RegExp executions, allowing legacy features to work for standard RegExp instances.

In the specification, InvalidateLegacyRegExpStaticProperties(C) marks the values of the static properties as unavailable by setting the internal slots to empty. Correspondingly, in step 4 of the accessors' GetLegacyRegExpStaticProperty(C, thisValue, internalSlotName) operation, we throw a TypeError if the static properties are invalidated.

Then, we add the equivalent code in the JIT path, so that when a regexp is executed, we lazily store enough information to be able to rerun the regexp later if the RegExpStatics are accessed.
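
A toy model may help to see how lazy evaluation and the invalidation flag fit together. It is not SpiderMonkey's RegExpStatics: LazyLegacyStatics and its methods are hypothetical names, and the model simply remembers how to redo the last successful match, replays it on first access, and refuses access while the invalidation flag is set.

```javascript
// Toy model: store enough to redo the match, compute values only on demand.
class LazyLegacyStatics {
  #pending = null;      // how to replay the last successful match
  #cached = null;       // replayed match result, filled in on first access
  #invalidated = false; // set after a subclass or cross-realm execution

  // Called after a normal RegExp execution succeeds; clears the flag.
  recordSuccessfulMatch(regexp, input, lastIndex) {
    this.#pending = { source: regexp.source, flags: regexp.flags, input, lastIndex };
    this.#cached = null;
    this.#invalidated = false;
  }

  // Called after a subclass or cross-realm execution.
  invalidate() {
    this.#invalidated = true;
  }

  // Equivalent of reading RegExp.$1 ... RegExp.$9.
  paren(n) {
    if (this.#invalidated) {
      throw new TypeError("RegExp static property is invalid");
    }
    if (this.#pending === null) return "";
    if (this.#cached === null) {
      // Replay the match now that somebody actually needs the values.
      const { source, flags, input, lastIndex } = this.#pending;
      const replay = new RegExp(source, flags);
      replay.lastIndex = lastIndex;
      this.#cached = replay.exec(input);
    }
    return this.#cached === null ? "" : (this.#cached[n] ?? "");
  }
}

const statics = new LazyLegacyStatics();
statics.recordSuccessfulMatch(/a(b)/, "xab", 0);
const firstCapture = statics.paren(1); // the replay happens here, not at match time
console.log(firstCapture);
```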

3. Gating the implementation behind a preference

The first step to implementing a TC39 proposal in SpiderMonkey is adding a preference for it. This allows the feature to be enabled or disabled at runtime, which is important in gating the feature until it has been tested enough for release.

With this proposal, it was awkward, because this was not a new syntax or library method, but behavioral modifications to the existing RegExp static properties and the compile() method.

At first, I enclosed my changes in an #ifdef NIGHTLY_BUILD directive so that they are only available in the nightly environment. But given the potential for web compatibility risks, we needed to put the changes behind a preference. That way, we can flip the feature back in case we break something.

This created an awkward situation: the static RegExp properties themselves (like RegExp.$1, RegExp.input) are defined in regexp_static_props, which is baked into the static RegExp JSClass and embedded in the binary at compile time. I ended up wrapping these property definitions in an #ifdef NIGHTLY_BUILD, meaning they only exist in Nightly builds.

But the behavior of these properties — that is, whether accessing them should throw errors for subclasses and cross-realm regexps — is gated behind a runtime preference. This is even more awkward, because it will change behaviour in Nightly even without the preference enabled.

Thus, the preference only controls whether the new throwing behavior is active. As Iain noted, there wasn't a particularly clean way to avoid this. We'd need two parallel RegExp classes and then have to switch between them at runtime based on the pref, which seemed like overkill.

The compromise was to ship the properties in Nightly, use the preference to control the new behavior, and rely on extra-careful testing.
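
The split between the two gates can be sketched as a toy model. Nothing here is SpiderMonkey code: nightlyBuild stands in for the compile-time NIGHTLY_BUILD macro, legacyRegExpPref is a hypothetical name for the runtime preference, and the attribute values for non-Nightly builds are inferred from the flags that were removed for Nightly.

```javascript
// Decided once, when the "binary" is built; a preference cannot change it later.
function defineStaticProps(nightlyBuild) {
  return Object.freeze(
    nightlyBuild
      ? { configurable: true, enumerable: false }
      : { configurable: false, enumerable: true }
  );
}

function makeEngine({ nightlyBuild, legacyRegExpPref }) {
  const propertyAttributes = defineStaticProps(nightlyBuild);
  return {
    propertyAttributes,
    // Only the throwing behaviour consults the preference at run time.
    readStaticAfterSubclassExec() {
      if (legacyRegExpPref) {
        throw new TypeError("RegExp static property is invalid");
      }
      return "b";
    },
  };
}

// Nightly with the preference off: new property attributes, old behaviour.
const nightlyPrefOff = makeEngine({ nightlyBuild: true, legacyRegExpPref: false });
console.log(nightlyPrefOff.propertyAttributes, nightlyPrefOff.readStaticAfterSubclassExec());
```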

4. Wild Goose Chase

Around August, when I had the initial implementation working without memory optimization or centralized legacy and realm checks, I was updating legacy regexp statics in RegExpBuiltinExec() only when matches succeeded.

RegExpBuiltinExec() has two execution paths: a forTest path for RegExp.prototype.test (where we can skip allocating a result object) and a normal path for full execution. I had legacy feature validation in both paths, but only for successful matches.

My mentor suggested we needed to update the legacy regexp statics not just on success, but also on failure. That made sense from a spec perspective, so I spent the next week and a half trying to figure out how to implement this. I was looking into the execution paths, trying to understand where and how to trigger updates on failed matches.

After about a week, we realized that they had misread the proposal! Oops. Turns out, SpiderMonkey doesn't update legacy regexp properties on failure at all: it just returns the last successful result. I'd been chasing a solution to a problem that didn't actually exist in the implementation.
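
The existing behaviour is easy to confirm from script, in engines that expose the legacy statics: after a failed match, RegExp.$1 still reports the capture from the last successful one. A minimal check, using only direct RegExp literals in the current realm:

```javascript
const hit = /a(b)/.exec("ab");  // successful match, captures "b"
const miss = /x(y)/.exec("ab"); // no match: exec() returns null

// The failed match did not touch the legacy statics.
const paren1AfterMiss = RegExp.$1;
console.log(hit[1], miss, paren1AfterMiss);
```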

Next Steps and Final Thoughts

The "Legacy RegExp features in JavaScript" proposal is, at the time of this writing, in stage 3 of the TC39 process, meaning the design is considered complete and further changes are expected only in response to implementation feedback. There are potential backward compatibility risks, and any attempt to use a disabled feature will throw a TypeError. More on that can be found in the Breaking Hazards portion of the proposal.

Before implementing this proposal I had briefly interacted with C++ on a production level codebase when working on the Error.isError proposal, but working on legacy RegExp properties was a deeper dive into C++ and browser internals, which was difficult but also very much appreciated!

Working on this proposal exposed gaps in my knowledge but also gave me confidence in navigating large C++ codebases. I’m particularly grateful to my mentor, and Daniel Minor and Iain Ireland (from the SpiderMonkey team) for pointing me in the right direction and brainstorming solutions with me.
