Posts Tagged ‘daily’

Records and objects: both the same and different

Saturday, January 29th, 2022

I wrote in my most recent “daily” post about the choices involved in representing records in MIR and LIR, Warp’s intermediate languages. Since then, more questions occurred to me.

MIR has a simple type system and some internal typechecking, which is helpful. LIR erases most type details but does have an even simpler type system that distinguishes JavaScript Values from raw integers/pointers/etc. I already added a Record type to MIR; an alternative would have been to give the NewRecord operation the result type Object, reflecting that records are (currently) represented as objects, but I thought that more internal type checks are always a good thing. Without a separate Record type, it’s theoretically possible that compiler bugs could result in record fields being mutable (of course, there are still ways such bugs could arise even with MIR typechecking, but it’s one more thing that makes it less likely).

I realized there’s a problem with this approach: if I want to take option 2 in the previous post — translating FinishRecord into MIR by breaking it down into smaller operations like Elements, LoadFixedSlot, etc. — it’s not possible to generate type-correct MIR. These operations expect an Object as an operand, but I want them to work on Records.

The point of representing records as objects was (at least in part) to reuse existing code for objects, but that goal exists in tension with distinguishing records from objects and keeping the MIR type system simple. Of course, we could add some form of subtyping to MIR, but that’s probably a bad idea.

So orthogonally to the three options I mentioned in my previous post, there’s another set of three options:

  1. Add Record as an MIR type, like I started out doing; don’t re-use MIR instructions like Elements for records, instead making those operations explicit in CodeGenerator, the final translation from LIR to assembly.
  2. Don’t add any new MIR types; give NewRecord the result type Object and FinishRecord the argument and result types Object
  3. Compile away records when translating JS to MIR, adding whatever new MIR instructions are necessary to make this possible; in this approach, records would desugar to Objects and any operations needed to enforce record invariants would be made explicit,

Option 1 rules out some code re-use and loses some internal checking, since more of the heavy lifting happens in the untyped world. Option 2 allows more code re-use, but also loses some internal checking, since it would be unsound to allow every Object operation to work on records. Option 3 has the advantage that only the JS-to-MIR translation (WarpBuilder / WarpCacheIRTranspiler) would need to be modified, but seems riskier than the first two options with respect to type-correctness bugs — it seems like debugging could potentially be hard if records were collapsed into objects, too.

A colleague pointed out that MIR already has some types that are similar to other types, but where certain additional invariants need to be upheld, such as RefOrNull (used in implementing WebAssembly); this type is effectively equivalent to “pointer”, but keeping it a separate type allows for the representation to change later if the WebAssembly implementors want to. Record seems very similar; we want to avoid committing to the Object representation more than necessary.

My colleague also suggested adding an explicit RecordToObject conversion, which would allow all the Object operations to be re-used for records without breaking MIR typechecking. As long as there’s no need to add an ObjectToRecord conversion (which I don’t think there is), this seems like the most appealing choice, combined with option 1 above. There’s still potential for bugs, e.g. if a compiler bug generates code that applies RecordToObject to a record and then does arbitrary object operations on it, such as mutating fields. Ideally, RecordToObject would produce something that can only be consumed by reads, not writes, but this isn’t practical to do within MIR’s existing type system.

(All of the same considerations come up for tuples as well; I just started with records, for no particular reason.)

I don’t think there’s any way to know which approach is best without having a working prototype, so I’m just going to try implementing this and seeing what happens.

Implementing records in Warp

Thursday, January 20th, 2022

Toward the goal of implementing records and tuples in Warp, I’m starting with code generation for empty records. The two modules I’ve been staring at the most are WarpBuilder.cpp and WarpCacheIRTranspiler.cpp.

While the baseline compiler translates bytecode and CacheIR directly into assembly code (using the MacroAssembler), the optimizing compiler (Warp) uses two intermediate languages: it translates bytecode and CacheIR to MIR; MIR to LIR; and then LIR to assembly code.

As explained in more detail here, WarpBuilder takes a snapshot (generated by another module, WarpOracle.cpp) of running code, and for each bytecode, it generates MIR instructions, either from CacheIR (for bytecode ops that can have inline caches), or directly. For ops that can be cached, WarpBuilder calls its own buildIC(), method, which in turn calls the TranspileCacheIRToMIR() method in WarpCacheIRTranspiler.

A comment in WarpBuilderShared.h says “Because this code is used by WarpCacheIRTranspiler we should generally assume that we only have access to the current basic block.” From that, I’m inferring that WarpCacheIRTranspiler maps each CacheIR op onto exactly one basic block. In addition, the addEffectful() method in WarpCacheIRTranspiler enforces that each basic block contains at most one effectful instruction.

In the baseline JIT implementation that I already finished, the InitRecord and FinishRecord bytecodes each have their own corresponding CacheIR ops; I made this choice by looking at how existing ops like NewArray were implemented, though in all of these cases, I’m still not sure I fully understand what the benefit of caching is (rather than just generating code) — my understanding of inline caching is that it’s an optimization to avoid method lookups when polymorphic code is instantiated repeatedly at the same type, and in all of these cases, there’s no type-based polymorphism.

I could go ahead and add InitRecord and FinishRecord into MIR and LIR as well; this would be similar to my existing code where the BaselineCacheIRCompiler compiles these operations to assembly. To implement these operations in Warp, I would add similar code to CodeGenerator.cpp (the module that compiles LIR to assembly) as what is currently in the BaselineCacheIRCompiler.

But, MIR includes some lower-level operations that aren’t present in CacheIR — most relevantly to me, operations for manipulating ObjectElements fields: Elements, SetInitializedLength, and so on. Using these operations (and adding a few more similar ones), I could translate FinishRecord to a series of simpler MIR operations, rather than adding it to MIR. To be more concrete, it would look something like:

(CacheIR)

FinishRecord r

== WarpCacheIRTranspiler ==>

(MIR)

e = Elements r
Freeze e
sortedKeys = LoadFixedSlot r SORTED_KEYS_SLOT
sortedKeysElements = Elements sortedKeys
CallShrinkCapacityToInitializedLength sortedKeys
SetNonWritableArrayLength sortedKeysElements
recordInitializedLength = InitializedLength r
SetArrayLength sortedKeysElements recordInitializedLength
CallSort sortedKeys

(I’m making up a concrete syntax for MIR.)

This would encapsulate the operations involved in finishing a record, primarily sorting the keys array and setting flags to ensure that the record and its sorted keys array are read-only. Several of these are already present in MIR, and the others would be easy to add, following existing operations as a template.

The problem with this approach is that FinishRecord in CacheIR would map onto multiple effectful MIR instructions, so I can’t just add a case for it in WarpCacheIRTranspiler.

I could also push the lower-level operations up into CacheIR, but I don’t know if that’s a good idea, since presumably there’s a reason why it hasn’t been done already.

To summarize, the options I’m considering are:

  1. Pass down InitRecord and FinishRecord through the pipeline by adding them to MIR and LIR
  2. Open up FinishRecord (InitRecord isn’t as complicated) in the translation to MIR, which might involve making FinishRecord non-cacheable altogether
  3. Open up FinishRecord in the translation to CacheIR, by adding more lower-level operations into CacheIR

I’ll have to do more research and check my assumptions before making a decision. A bigger question I’m wondering about is how to determine if it’s worth it to implement a particular operation in CacheIR at all; maybe I’m going about things the wrong way by adding the record/tuple opcodes into CacheIR right away, and instead I should just be implementing code generation and defer anything else until benchmarks exist?

Tuple prototype methods ☑️

Tuesday, January 18th, 2022

It’s been a little while. But since my last post, Nicolò’s patches implementing records and tuples in SpiderMonkey landed, which means that I was able to submit my patches implementing the prototype methods for the Tuple type. That code is awaiting review.

Some initial review comments that I received focused on concerns about changing existing code that implements Arrays. Immutable and mutable sequences support a lot of the same operations, so it made sense to re-use existing Array code in SpiderMonkey rather than reinventing the wheel. Section 8.2.3 of the Records nad Tuples proposal lists the tuple prototype methods and, for many of them, specifies that the behavior should be the same as what’s already specified for the equivalent Array method.

In some cases, code reuse required changing existing code. Built-in JavaScript types can have their methods implemented in SpiderMonkey in two different ways: self-hosted (implemented in JavaScript as a library) or native (implemented inside the compiler, in C++). I implemented most of the tuple methods as self-hosted methods, which can easily call existing self-hosted methods for arrays. In some cases, I thought a native implementation would be more efficient: for example, for the toReversed() method, which returns a reversed copy of a given tuple. (Since tuples are immutable, it can’t reverse in-place.) There’s already an efficient C++ implementation of reverse for arrays, and because of how tuples are currently implemented in SpiderMonkey (using the NativeObject C++ type, which is the same underlying type that represents arrays and most other built-in types), making it work for tuples as well just required changing the method’s precondition, not the implementation. However, the maintainers were reluctant to allow any changes to the built-in array methods. I replaced that with a straightforward self-hosted implementation of toReversed(); it seems a shame to not take advantage of existing optimized code, but perhaps I was falling prey to premature optimization. We don’t have any performance benchmarks for code that uses tuples yet, and without profiling, it’s impossible to know what will actually be performance-critical.

While I await further review, I’ll be learning about WarpMonkey, the next tier of optimizing JIT in SpiderMonkey. The work I described in my previous posts was all about implementing records and tuples in the baseline compiler, and that work is finished (I’ll hold off on submitting it for review until after the prototype methods code has been accepted). I expect this to be where the real fun is!

Fun with pointer arithmetic

Thursday, December 9th, 2021

a diagram showing the layout of the NativeObject and RecordType typesA picture is worth a thousand words, they say.

Where I left off yesterday, I was trying to figure out why my generated AddRecordProperty code was crashing. I was still using the simplest possible test case, a record with one field:

function f() { x = #{"a": 1}; }

Fixed slots

My code was writing a literal zero into the record’s “initialized length” slot:

store32(Imm32(0), Address(result, NativeObject::getFixedSlotOffset(RecordType::INITIALIZED_LENGTH_SLOT)));

But this should have been:

  storeValue(Int32Value(0), Address(result, NativeObject::getFixedSlotOffset(RecordType::INITIALIZED_LENGTH_SLOT)));

In the drawing, I indicated that offset 24 of a RecordType is a Value (JS::Value) denoting the number of elements that have been initialized so far. While it’s an invariant that this will actually be an integer value, as far as the compiler is concerned, the representation is of a Value, which has a different bit pattern from the integer 0.

Some existing code in RecordType::createUninitializedRecord() (this is code that isn’t upstream yet) should have been a clue:

  uint32_t length = getFixedSlot(INITIALIZED_LENGTH_SLOT).toPrivateUint32();

To get an unsigned int32, we call the Value method toPrivateUint32(), which returns an integer when called on an integer value.

Moreover, the getFixedSlot() method of NativeObject also returns a Value, which should have been a pretty good hint to me that fixed slots are Values:

const Value& getFixedSlot(uint32_t slot) const;

(NativeObject.h)

Observing the length field

Supposing that the register %rcx points to a record, I would like to be able to execute:

call js::DumpValue(((RecordType*) $rcx)->getFixedSlot(INITIALIZED_LENGTH_SLOT))

in gdb. (Where INITIALIZED_LENGTH_SLOT is defined as 0, since it happens to be the first fixed slot in this object.) Casting the value in %rcx to RecordType is necessary to tell gdb where the struct fields begin and end, but from there, I would have thought there would be enough debug information for it to know that RecordType inherits from NativeObject, which has a getFixedSlot() method.

Since I can’t do that, the next best thing is:

(gdb) call js::DumpValue( (JS::Value::fromRawBits (*($rcx + 24)) ))
0

And that works — it prints 0, which is what I would expect for a record with no initialized fields. Effectively, I inlined getFixedSlot(), which accesses offset 24 from the object. Then, JS::Value::fromRawBits decodes the tagged pointer that represents a Value, and DumpValue() pretty-prints it.

Observing the sortedKeys field

Looking at the picture again, records have a second fixed slot that’s a Value that is guaranteed (assuming the compiler works) to correspond to an ArrayObject, which just contains the record keys, in sorted order. I knew that my code was temporarily storing the value of this slot in register %rbx (as before, I figured this out by putting breakpoints in the code generation methods and looking at the values of various variables), so if I do:

call js::DumpValue ((JS::Value::fromRawBits($rbx)))

in gdb, I get output that’s something like <Array object at 232d468007f8>

But for more detail, I can do:

(gdb) call js::DumpObject (& ((JS::Value::fromRawBits($rbx)).toObject()))
object 29574ae007f0
  global 21bd66c40030 [global]
  class 5555590a8770 Array
  shape 21bd66c66320
  flags:
  proto 
  properties:
    [Latin 1]"length" (map 21bd66c62670/0 writable )

which interprets the Value as an Object and uses DumpObject to print out more details about its representation.

Observing the record itself

Having seen that the individual fixed slots of the record seemed to be correct, I wanted to debug my generated code for creating uninitialized records to see what the entire record object looked like. Knowing that the record was stored in %rcx, I figured out that I could do:

(gdb) call js::DumpObject (& ((JS::Value::fromRawBits($rcx)).toExtendedPrimitive()))
object 29574ae007b0
  global 21bd66c40030 [global]
  class 5555590b7d10 record
  shape 21bd66c663c0
  flags:
  proto null
  reserved slots:
      0 : 0
      1 : 
      2 : false
  properties:
  elements:
      0: false
      1: Assertion failure: (asBits_ & js::gc::CellAlignMask) == 0 (GC pointer is not aligned. Is this memory corruption?), at /home/tjc/gecko-fork/obj-x64-debug/dist/include/js/Value.h:622

“Extended primitive” is a provisional name for records and tuples, which are not objects, but are (in our prototype implementation) represented internally in the compiler as objects; that’s why I’m able to use DumpObject to print out the fields. Under “reserved slots”, it’s showing me the values of the three reserved slots shown at offsets, 24, 32, and 40 in the picture above.

Obviously, it’s a warning sign that trying to print out the elements array causes an assertion failure. I would like to be able to print it out using:

call js::DumpValue(((RecordType*) $rcx)->getElements()[0])

since getElements() is a NativeObject method that represents an array of Values. But this doesn’t work in gdb, so knowing that the offset of the elements_ field is 16, I did the following:

(gdb) p *(ObjectElements::fromElements((HeapSlot*) ($rcx + 16)))
$10 = { flags = 3952501696,  initializedLength = 13525, capacity = 1437286072, length = 21845}
    (gdb) 

I deleted some of the output so as to show only the dynamic fields. The picture above makes this much clearer, but once we access the HeapSlot array stored in the elements_ field, we can call the ObjectElements::fromElements method on it to access the elements header, which is stored physically before the array itself. This header consists of four int32 fields: flags, initializedLength, capacity, and length. This output makes it seem like garbage got written into the object, since the initialized length shouldn’t be 13525.

Looking at the slots_ array of the record (which is at offset 8), I observed:

p ((HeapSlot*) (*($rcx + 8)))
$22 = (js::HeapSlot *) 0x55ab3eb8
(gdb) 

This coresponds to this MacroAssembler call:

  storePtr(ImmPtr(emptyObjectSlots), Address(result, NativeObject::offsetOfSlots()));

result here represents the register that contains the record, and offsetOfSlots() is 8. I had copied/pasted this code from existing code that initializes arrays, without looking at it too carefully, but when I read it again, I noticed that emptyObjectSlots is a special value. A comment in NativeObject.h mentions that special singleton values are used when the elements_ and slots_ arrays are empty.

Initializing the elements

And that’s how I realized that some other code that I had copied and pasted out of the existing array initialization code was misplaced:

  // Initialize elements pointer for fixed (inline) elements.
  computeEffectiveAddress(
      Address(result, NativeObject::offsetOfFixedElements()), temp);
  storePtr(temp, Address(result, NativeObject::offsetOfElements()));

This works fine for what it does, but I needed to do the equivalent of the NativeObject::setEmptyElements() method:

Should do exactly what NativeObject::setEmptyElements() does:

  void setEmptyElements() { elements_ = emptyObjectElements; }

(code)

So what I really wanted was:

  // Initialize elements pointer
  storePtr(ImmPtr(emptyObjectElements),
           Address(result, NativeObject::offsetOfElements()));
                     

And after making that change, the elements header looked a lot better:

(gdb) p (*((ObjectElements*) ($rcx)))
$8 = {flags = 0, initializedLength = 0, capacity = 1, length = 1, static VALUES_PER_HEADER = 2}
(gdb)  call ((JSObject*) $rcx)->dump()
object 17d67b3007b0
  global 2689ff340030 [global]
  class 5555590b7b80 record
  shape 2689ff3663c0
  flags:
  proto null
  reserved slots:
      0 : 0
      1 : 
      2 : false
  properties:
    (gdb) 

There’s more to recount, but I don’t have any more time today.

Observability

Wednesday, December 8th, 2021

I’m realizing that this blog is mostly about debugging, since figuring out how to debug SpiderMonkey is taking up most of my time. Right now, the actual compilation algorithms I’m implementing are straightforward; if I was working on performance tuning an existing implementation or something, I’m sure it would be different. What’s actually hard is finding ways to make generated code more observable. I’m sure that more experienced SpiderMonkey hackers have their own techniques, but since there isn’t a lot of documentation to go on, I’m making it up mostly from scratch.

Having gotten empty records to work, I started working on code generation for the AddRecordProperty opcode, so we can have non-empty records. When I initially tested my code with input x = #{"a": 1}, I got an assertion failure:

Assertion failure: !this->has(reg), at /home/tjc/gecko-fork/js/src/jit/RegisterSets.h:680

Thread 1 "js" received signal SIGSEGV, Segmentation fault.
0x00005555580486c9 in js::jit::SpecializedRegSet<js::jit::LiveSetAccessors<js::jit::TypedRegisterSet >, js::jit::TypedRegisterSet >::add (
    this=0x7fffffffc768, reg=...) at /home/tjc/gecko-fork/js/src/jit/RegisterSets.h:680
680     MOZ_ASSERT(!this->has(reg));

AddRecordProperty takes three operands (the record, the field name, and the value for the field). When I dug into it, it turned out that two of the three operands had been allocated to the same register, which wasn’t right. The CacheIRCompiler module handles register allocation, spilling registers to the stack when necessary, so you can implement new operations without having to worry about those details. But in this case, I had to worry about that detail. Internally, the register allocator keeps a list, operandLocations_ that maps integers (operands are numbered from 0 onward) to physical registers. It turned out that operands 0 and 2 were both allocated to %rax, which I figured out in gdb (irrelevant details snipped):

(gdb) p operandLocations_[0]
$13 = (js::jit::OperandLocation &) @0x7fffffffc680: {
  kind_ = js::jit::OperandLocation::PayloadReg, data_ = {payloadReg = {reg = {
        reg_ = js::jit::X86Encoding::rax, 
                ...
(gdb) p operandLocations_[1]
$14 = (js::jit::OperandLocation &) @0x7fffffffc690: {
  kind_ = js::jit::OperandLocation::ValueReg, data_ = {payloadReg = {reg = {
        reg_ = js::jit::X86Encoding::rbx, 
                ...
(gdb) p operandLocations_[2]
$15 = (js::jit::OperandLocation &) @0x7fffffffc6a0: {
  kind_ = js::jit::OperandLocation::ValueReg, data_ = {payloadReg = {reg = {
        reg_ = js::jit::X86Encoding::rax,  

The BaselineCacheIRCompiler::init() method (code) is called each time a new opcode is compiled, and it assigns a virtual register for each operand. I compared the code I’d added for AddRecordProperty with the code for the other opcodes that also have 3 operands (such as SetElem), I noticed that all of them passed the third operand on the stack. When I changed AddRecordProperty to work the same way, the problem was fixed. I don’t know how I would have figured this out except for a lucky guess! I’m guessing it’s this way because of the small number of registers on x86, but the internal error-checking here seems lacking; it doesn’t seem good to silently alias distinct operands to each othre.

Once I changed the init() method and modified my emitAddRecordPropertyResult() method to expect the third operand on the stack, I found that for the compiled code to be run and not just emitted, I had to embed the code in a function once again:
function f() { x = #{"a":1}; }

(I still don’t understand the details as to when the JIT decides to generate native code.)

The result was a segfault, which was okay for the time being; it means that my code was at least able to generate code for setting a record property, without internal errors.

I realized that I wasn’t using the protocol for calling from the JIT into C++ functions (the callVM() method in the baseline compiler) correctly, but took a break before fixing it.

After taking a break, I fixed that problem only to run into another one: callVM() doesn’t automatically save registers. The macro assembler provides another method, callWithABI() (code), that makes it easier to save and restore registers. Evidently, ABI calls are less efficient than VM calls, but I’m not worried about performance at the moment. I switched it around to use callWithABI() and had to do some debugging by setting breakpoints both in the C++ code for the callee function and in the generated assembly code to make sure all the data types matched up. That worked reasonably well, but alas, I had to switch back to callVM() because I couldn’t figure out how to pass a HandleObject (see the comment at the beginning of RootingAPI.h for a fairly good explanation) into an ABI function. I added code to manually save and restore registers.

That worked okay enough, and my generated code got almost all the way through without a segfault. At the very end, my code calls the existing emitStoreDenseElement method. Internally (in the current implementation; this is all subject to change), records contain a sorted array of keys. Separately from storing the actual value at the correct offset in the record, the key has to be added to the array. (Since records are immutable, the actual sorting only has to happen once, when the FinishRecord opcode executes.)

I was getting a segfault because there was something wrong about the array that I was passing into the generated code for storing dense elements. I haven’t found the problem yet, but (going back to the beginning of this post), I’m trying to find ways to make the generated code more transparent. In this case, I knew that the emitStoreDenseElement code was getting its array input in the %rsi register, so I put a breakpoint in the right place and looked at it in gdb:


Thread 1 "js" received signal SIGTRAP, Trace/breakpoint trap.
0x00001dec1e58ab70 in ?? ()

(gdb) p *((JSObject*) $rsi)
$3 = {<js::gc::CellWithTenuredGCPointer> = { = {
      header_ = {<mozilla::detail::AtomicBaseIncDec> = {<mozilla::detail::AtomicBase> = {
            mValue = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 6046507808}, 
              static is_always_lock_free = true}}, }, }, static RESERVED_MASK = 7, static FORWARD_BIT = 1}, }, 
  static TraceKind = JS::TraceKind::Object, static MAX_BYTE_SIZE = 152}
(gdb) p *((JSObject*) $rsi).getHeader()
Attempt to take address of value not located in memory.
(gdb) p ((JSObject*) $rsi)->header_
$4 = {<mozilla::detail::AtomicBaseIncDec> = {<mozilla::detail::AtomicBase> = {
      mValue = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 6046507808}, 
        static is_always_lock_free = true}}, }, }
(gdb) p ((JSObject*) $rsi)->slots_
There is no member or method named slots_.
(gdb) p ((js::HeapSlot*) ($rsi + 8))
$5 = (js::HeapSlot *) 0x3a6c5de007f8
(gdb) p ((js::HeapSlot*) ($rsi + 8))[0]
$6 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 93824997702264}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 8))[1]
$7 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 64237105842200}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 8))[2]
$8 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 0}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 16))[0]
$9 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 64237105842200}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 16))[0].value
$10 = {asBits_ = 64237105842200}
Slot*) ($rsi + 16))))
No symbol "ObjectElements" in current context.
(gdb) p (js::ObjectElements::fromElements (((js::HeapSlot*) ($rsi + 16))))
$11 = (js::ObjectElements *) 0x3a6c5de007f0
(gdb) p (js::ObjectElements::fromElements (((js::HeapSlot*) ($rsi + 16))))->initializedLength
$12 = 1
(gdb) 

I’m posting this debugging transcript to show some of the things I’ve discovered so far. In gdb, you can cast a pointer to an arbitrary type and use that to print out the contents according to the physical layout of that type; hence, printing ((JSObject*) $rsi. Weirdly, the NativeObject class (code) seems to be inaccessible in gdb; I’m not sure if this is because NativeObject inherits from JSObject (in this case, I know that the object pointed to by %rsi is a NativeObject, or because of something different about that class. (None of NativeObject, js::NativeObject, or JS::NativeObject seem to be names that gdb recognizes.) That makes it harder to print some of the other fields, but the debugger does know about the header_ field, since it’s defined in the Cell class (code) that JSObject inherits from. The getHeader() method is just supposed to return the header_ field, so I’m not sure why gdb reported the “Attempt to take address of value not located in memory” error.

I knew that NativeObjects have a slots_ field and an elements_ field, physically laid out right after the header_ field, but due to the problem I mentioned with the NativeObject class, gdb won’t recognize these field names. I knew that slots_ would be at offset 8 and elements would be at offset 16, so using some pointer math and casting the pointers to the type that both slots_ and elements_ have (js::HeapSlot*; the HeapSlot class is defined here), I was able to make gdb treat these addresses as HeapSlot arrays. HeapSlot is basically an alias for the Value type, which represents all runtime values in SpiderMonkey (see Value.h).

The slots_ array contains “fixed slots” (attributes, basically); this object’s slots array has two elements that are initialized (the one at index 2 is printed as if its asBits_ field is 0, meaning it’s just a 0 and is probably uninitialized). For array-like objects (which this one is — if you’ve lost track, it’s the array of sorted record keys that’s part of a record), the elements_ array is an array of, well, elements. NativeObject has a getElements method, which casts the HeapSlot array into an ObjectElements object (which has some additional methods on it), but I can’t call any NativeObject methods from gdb, so instead I called the ObjectElements::fromElements method directly to cast it. Since the initializedLength() method on the resulting object returns 1, which is correct (the previous generated code sets the initialized length to 1, since we’re adding a property to a previously empty record), it seems like the layout is vaguely correct.

That’s as far as I got. Debugging tools for generated code that don’t require all these manual casts and have more access to type information that would make pretty-printing easier would be great! Maybe that exists and I haven’t found it, or maybe it’s just that compiler implementors are a small enough group that making tools more usable isn’t too compelling for anyone. In any case, the hard part about all of this is finding good ways to observe data at runtime and relate it to types that exist in the compiler.

Further adventures in gdb

Tuesday, December 7th, 2021

So, that updating-every-day plan hasn’t exactly worked out. Working on the code is so much fun that it’s hard to stop, and then I have to choose between going to bed and writing a blog post.

I’m currently working on generating code for the three opcodes related to records: AddRecord, FinishRecord, and AddRecordProperty. The work I wrote about in previous posts wasn’t exactly complete, because it allowed code generation in the presence of records, but I just plugged in no-op implementations for the tryAttachStub() methods in CacheIR.cpp that actually do the work. I still had to change some code generation methods so that existing code could work with records and tuples, but the really fun stuff is what I’m doing now!

Rather than try to cover everything, I’ll summarize my notes from the end of last week. I had added two methods to BaselineCacheIRCompiler.cpp, which is the part of the baseline compiler that generates baseline code from CacheIR code. As an initial test case, I was using a function that just returns an empty record literal:

function f() { x = #{}; }

since I hadn’t yet implemented code generation for adding record properties. The JavaScript bytecode for this function looks like:

loc     op
-----   --
main:
00000:  BindGName "x"                   # GLOBAL
00005:  InitRecord 0                    # GLOBAL 
00010:  FinishRecord                    # GLOBAL 
00011:  SetGName "x"                    # 
00016:  Pop                             # 
00017:  RetRval                         # 

The InitRecord 0 instruction creates an uninitialized record of length 0, and the FinishRecord instruction sets internal flags to make the record read-only. In a non-empty record, one or more AddRecordProperty instructions would appear in between; it would be a dynamic error to invoke another AddRecordProperty instruction again after the FinishRecord.

The bytecode above is from executing dis(f) in the interpreter; I also discovered that you can do disnative(f) after CacheIR and then native code are generated for f. This prints out the generated assembly code; it’s actually much easier to read if you execute print(disnative(f)). Invoking the interpreter with the environment variable IONFLAGS=codegen also prints out the code as it’s generated, which is sometimes more useful since it can be interleaved with other debug messages that show which methods are generating which bits of code.

I was getting a segfault, meaning that the code being generated by my emitNewRecordResult() method didn’t match the expectations that the code generated by my emitFinishRecordResult() method had about the data. So I fired up gdb, as I had done before. Trying to zero in on the problem, I was looking at some generated code that took its input in the %rax register (gdb uses the $rax syntax instead.) With careful use of casts and calling accessor methods, it’s possible to look at the C++ representations of the objects that generated code creates. I found it helpful to look at the value of this expression:

(gdb) p (*(*((JSObject*) $rax)).shape()).getObjectClass()
$9 = (const JSClass *) 0x5555590a6130 

From context, I was able to figure out that 0x5555590a6130 was the address of the JSClass structure for the ArrayObject type. This wasn’t what I expected, since the input for this code was supposed to be a record (with type RecordType in C++).

Since this was a few days ago, I’ve lost track of the logical steps in between, but eventually, I put a breakpoint in the createUninitializedRecord() method that I’d added to MacroAssembler.cpp; shape and arrayShape are variable names referring to two registers I allocated within that method.

Thread 1 "js" hit Breakpoint 1, js::jit::MacroAssembler::createUninitializedRecord (
    this=0x7fffffffae00, result=..., shape=..., arrayShape=..., temp=..., temp2=..., 
    recordLength=0, allocKind=js::gc::AllocKind::OBJECT4_BACKGROUND, 
    initialHeap=js::gc::DefaultHeap, fail=0x7fffffffaa38, allocSite=...)
    at /home/tjc/gecko-fork/js/src/jit/MacroAssembler.cpp:493
warning: Source file is more recent than executable.
493   allocateObject(result, temp, allocKind, 0, initialHeap, fail, allocSite);
(gdb) p shape
$10 = {reg_ = js::jit::X86Encoding::rcx, static DefaultType = js::jit::RegTypeName::GPR}
(gdb) p arrayShape
$11 = {reg_ = js::jit::X86Encoding::rcx, static DefaultType = js::jit::RegTypeName::GPR}
(gdb) quit

This debug output shows that the register allocator assigned both shape and arrayShape to the same register, %rcx. As a result, the code I generated to initialize the shapes was using the same shape for both the record object itself, and the internal sortedKeys array pointed to by the record’s sortedKeys field — causing createUninitializedRecord() to return something that looked like an array rather than a record.

The reason this was happening was the following code that I wrote:

 
AutoScratchRegisterMaybeOutput shape(allocator, masm, output);
AutoScratchRegisterMaybeOutput arrayShape(allocator, masm, output);

The AutoScratchRegisterMaybeOutput class (defined in CacheIRCompiler.h) re-uses the output register if possible, so calling it twice allocates both symbolic names to the same register. That wasn’t what I wanted.

The solution was to use the AutoScratchRegister class instead, telling the register allocator that shape and arrayShape should live in distinct registers. After making that change, repeating the original experiment gives:

(gdb) p (*(*((JSObject*) $rcx)).shape()).getObjectClass()
$5 = (const JSClass *) 0x5555590b5690 
(gdb) c
Continuing....
(gdb) p (*(*((JSObject*) $rbx)).shape()).getObjectClass()
$8 = (const JSClass *) 0x5555590a60f0 
(gdb) 

(To be sure that $rcx and $rbx were the registers allocated to shape and arrayShape, I had to set a breakpoint in my createUninitializedRecord() method again and look at the values of these variables.)

I hope this might be marginally useful to somebody trying to debug SpiderMonkey baseline code generation, or if nothing else, useful to me if I ever have to set aside this work for a few months (let’s be honest, weeks) and come back to it!

The emotional roller coaster that is programming

Friday, November 19th, 2021

I skipped a few days’ worth of updates; turns out it’s a bit difficult to fit in time to write an entire post when your work schedule isn’t very consistent.

In the meantime, I finished implementing all the record and tuple opcodes in the JIT. Having done some manual testing, it was time to start running the existing test suite with the compiler enabled. Fortunately, I figured out the flag to pass in so that I wouldn’t have to add it to each test file by hand (which would have been bad practice anyway):

mach jstests Record --args=--baseline-eager

This runs all the tests with Record in the name and adds the --baseline-eager flag in to the JavaScript shell.

At this stage, failures are good — it means there’s still something interesting left to work on. Yay, a failure!

Hit MOZ_CRASH(Unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745
REGRESSION - non262/Record/equality.js
[7|1|0|0] 100% ======================================================>|   1.1s
REGRESSIONS
    non262/Record/equality.js
FAIL
 

Narrowing down the code that caused the failure, I got:

js> Object.is(#{x: +0}, #{x: -0})
Object.is(withPosZ, withNegZ)
Hit MOZ_CRASH(Unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745

Thread 1 "js" received signal SIGSEGV, Segmentation fault.
0x00005555583763f8 in js::jit::CallIRGenerator::tryAttachObjectIs (this=0x7fffffffcd10, callee=...)
    at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745
7745            MOZ_CRASH("Unexpected type");
(gdb) 

So this told me that I hadn’t yet implemented the cases for comparing records/tuples/boxes to each other in Object.is() in the JIT.

Fixing the problem seemed straightforward. I found the CallIRGenerator::tryAttachObjectIs() method in CacheIR.cpp. The CallIRGenerator stub takes care of generating code for built-in methods as they’re called; each time a known method is called on a particular combination of operand types that’s implemented in the baseline compiler, code gets generated that will be called next time instead of either interpreting the code or re-generating it from scratch.

For example, this code snippet from tryAttachObjectIs() shows that the first time Object.is() is called with two int32 operands, the compiler will generate a version of Object.is() that’s specialized to this case and saves the need to call a more generic method and do more type checks. Of course, the generated code has to include a check that the operand types actually are int32, and either call a different generated method or generate a new stub (specialized version of the method) if not.

    MOZ_ASSERT(lhs.type() == rhs.type());
    MOZ_ASSERT(lhs.type() != JS::ValueType::Double);

    switch (lhs.type()) {
      case JS::ValueType::Int32: {
        Int32OperandId lhsIntId = writer.guardToInt32(lhsId);
        Int32OperandId rhsIntId = writer.guardToInt32(rhsId);
        writer.compareInt32Result(JSOp::StrictEq, lhsIntId, rhsIntId);
        break;
      }

The existing code handles cases where both arguments have type Int32, String, Symbol, Object, et al. So it was easy to follow that structure and add a case where both operands have a box, record, or tuple type. After a fun adventure through the MacroAssembler, I had all the pieces implemented and the test passed; I was able to apply Object.is() to records (etc.) with the baseline compiler enabled.

After that, all the tests for records passed, which isn’t too surprising since there aren’t many methods for records. Next, I tried running the tests for what’s currently called Box in the Records and Tuples proposal (subject to change), and got more failures; still a good thing.

mach-with record-tuple-with-jit jstests Box --args=--baseline-eager
[1|0|0|0]  20% ==========>                                            |   1.2s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/unbox.js
[1|1|0|0]  40% =====================>                                 |   1.2s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/json.js
[1|2|0|0]  60% ================================>                      |   1.3s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/constructor.js
[2|3|0|0] 100% ======================================================>|   1.3s
REGRESSIONS
    non262/Box/unbox.js
    non262/Box/json.js
    non262/Box/constructor.js
FAIL

The common cause: generating code for any method calls on Boxes invokes GetPropIRGenerator::tryAttachPrimitive() (also in CacheIR.cpp as above), which didn’t have a case for records/tuples/boxes. (In JavaScript, a method is just another property on an object; so the GetProp bytecode operation extracts the property, and calling it is a separate instruction.) Similarly to the above, I added a case, and the code worked; I was able to successfully call (Box({}).unbox()) with the compiler enabled.

The next test failure, in json.js, was harder. I minimized the test case to one line, but wasn’t able to get it any simpler than this:

JSON.stringify(Box({}), (key, value) => (typeof value === "box" ? {x: value.unbox() } : value))

This code calls the JSON.stringify() standard library method on the value Box({}) (a box wrapped around an empty object); the second argument is a function that’s applied to the value of each property in the structure before converting it to a string. The fix I made that fixed unbox.js got rid of the MOZ_CRASH(unexpected type) failure, but replaced it with a segfault.

It took me too many hours to figure out that I had made the mistake of copying/pasting code without fully understanding it. The cached method stubs rely on “guards”, which is to say, runtime type checks, to ensure that we only call a previously-generated method in the future if the types of the operands match the ones from the past (when we generated the code for this particular specialization of the method). When making the change for Object.is(), I had looked at CacheIRCompiler.cpp and noticed that the CacheIRCompiler::emitGuardToObject() method generates code that tests whether an operand is an object or not:

bool CacheIRCompiler::emitGuardToObject(ValOperandId inputId) {
  JitSpew(JitSpew_Codegen, "%s", __FUNCTION__);
  if (allocator.knownType(inputId) == JSVAL_TYPE_OBJECT) {
    return true;
  }

  ValueOperand input = allocator.useValueRegister(masm, inputId);
  FailurePath* failure;
  if (!addFailurePath(&failure)) {
    return false;
  }
  masm.branchTestObject(Assembler::NotEqual, input, failure->label());
  return true;
}

The generated code contains a “failure” label that this code branches to when the operand inputId is not an object. (It’s up to the caller to put appropriate code under the “failure” label so that this result will be handled however the caller wants.) I copied and pasted this code to create an emitGuardToExtendedPrimitive() method (“extended primitives” are what we’re calling records/tuples/boxes for now), and changed JSVAL_TYPE_OBJECT to JSVAL_TYPE_EXTENDED_PRIMITIVE so that the code would check for the “extended primitive” runtime type tag instead of the “object” type tag. The problem is that I also needed to use something else instead of branchTestObject. As it was, whenever a stub that expects a record/tuple/box as an argument was generated, it would be re-used for operands that are objects. This is obviously unsound and, looking at the failing test case again, we can see why this code exposed the bug:

JSON.stringify(Box({}), (key, value) => (typeof value === "box" ? {x: value.unbox() } : value))

The first time the (key, value) anonymous function is called, the name value is bound to Box({}). So a stub gets generated that’s a version of the typeof operation, specialized to Box things (actually anything that’s a record, tuple, or box, for implementation-specific reasons). The stub checks that the operand is a record/tuple/box, and if so, returns the appropriate type tag string (such as “box”). Except because of the bug that I introduced, this stub got re-used for any object operands. The way that the JSON stringify code works (JSON.cpp), it calls the “replacer” (i.e. the anonymous (key, value) function) on the value of each property — but then, it calls the replacer again on the replaced value. So my generated stub that worked perfectly well for Box({}) was subsequently called on {x: {}}, which has an entirely different type; hence the segfault.

Finding this bug took a long time (partly because I couldn’t figure out how to enable the “CacheIR spew” code that prints out generated CacheIR code, so I was trying to debug generated code without being able to read it…), but I experimented with commenting out various bits of code and eventually deduced that typeof was probably the problem; once I read over the code related to typeof, I spotted that my emitGuardToExtendedPrimitive() method was calling branchTestObject(). Adding a branchTestExtendedPrimitive() method to the macro assembler was easy, but tedious, since the code is architecture-specific. It would be nice if dynamic type-testing code was automatically generated, since the code that tests for tags denoting different runtime types is all basically the same. But rather than trying to automate that, I decided it was better to bite the bullet, since I already had enough cognitive load with trying to understand the compiler as it is.

It turned out that the json.js test case, despite being designed to test something else, was perfect for catching this bug, since it involved applying the same method first to a Box and then to an object. Once I’d fixed the problem with guards, this test passed. The constructor.js test still fails, but that just means I’ll have something interesting to work on tomorrow.

Perhaps the swings from despair to elation are why programming can be so habit-forming. While trying to track down the bug, I felt like I was the dullest person in the world and had hit my limit of understanding, and would never make any further progress. When I found the bug, for a moment I felt like I was on top of the world. That’s what keeps us doing it, right? (Besides the money, anyway.)

By the way, I still don’t fully understand inline caching in SpiderMonkey, so other resources, such as “An Inline Cache Isn’t Just A Cache” by Matthew Gaudet, are better sources than my posts. I mean for this blog to be more of a journal of experimentation than a definitive source of facts about anything.

Adventures in gdb

Wednesday, November 10th, 2021

I picked up from yesterday wanting to see what code was being generated for record initialization. A colleague pointed me to a page of SpiderMonkey debugging tips. This was helpful, but required being able to run the JS interpreter inside GDB and type some code into the REPL. The problem is that before it got to that point, the interpreter was trying to compile all the self-hosted code; I knew that this wasn’t going to succeed since I’ve only implemented one of the record/tuple opcodes. I wanted to be able to just do:

> x = #{}

(binding the variable x to an empty record literal) and see the generated code. But because the much-more-complicated self-hosted code has to get compiled first, I never get to that point.

Another colleague suggested looking at the IONFLAGS environment variable. This, in turn, seems to only have an effect if you build the compiler with the --enable-jitspew option. Once I did that, I was able to find out more:

$ IONFLAGS=zzzz mach run
obj-x64-debug/dist/bin/js
found tag: zzzz
Unknown flag.

usage: IONFLAGS=option,option,option,... where options can be:

  aborts        Compilation abort messages
  scripts       Compiled scripts
  mir           MIR information
    ...
    

And so on.

I found that IONFLAGS=codegen mach run would cause the interpreter to print out all the generated assembly code, including all the code for self-hosted methods. This wasn’t entirely helpful, since it was hard to see where the boundaries were between different methods.

I decided to try a different strategy and see what I could do inside gdb. I’ve avoided using debuggers as much as possible throughout my programming career. I’m a fan of printf-style debugging. So much so that I created the printf-style debugging page on Facebook. (This made more sense back when Facebook pages were “fan pages”, so you could be a “fan of” printf-style debugging.) I’ve always had the feeling that any more sophisticated debugging technology wasn’t worth the difficulty of use. Working on a compiler implemented in C++, though, it seems I’m finally having to suck it up and learn.

The first question was how to set a breakpoint on a templated function. I found the rbreak command in gdb, which takes a regular expression. I realized I could also just do:

(gdb) info functions .*emit_InitR.*
All functions matching regular expression ".*emit_InitR.*":

File js/src/jit/BaselineCodeGen.cpp:
2590:   bool js::jit::BaselineCodeGen::emit_InitRecord();
2590:   bool js::jit::BaselineCodeGen::emit_InitRecord();

File js/src/jit/BaselineIC.cpp:
2454:   bool js::jit::FallbackICCodeCompiler::emit_InitRecord();
(gdb)

So I set a breakpoint on the method I wrote to generate code for the InitRecord opcode:

(gdb) b js::jit::BaselineCodeGen::emit_InitRecord
Breakpoint 1 at 0x555558093884: file /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp, line 2591.
(gdb) b js::jit::FallbackICCodeCompiler::emit_InitRecord
Breakpoint 2 at 0x5555580807b1: file /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp, line 2455.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/tjc/gecko-fork/obj-x64-debug/dist/bin/js 
[snip]

Thread 1 "js" hit Breakpoint 2, js::jit::FallbackICCodeCompiler::emit_InitRecord (this=0x7fffffffd1b0)
    at /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp:2455
2455      EmitRestoreTailCallReg(masm);
(gdb) 

Finally! At this point, I was hoping to be able to view the code that was being generated for the empty record literal. Stepping through the code from here gave me what I was looking for:

(gdb) s
js::jit::FallbackICCodeCompiler::tailCallVMInternal (
    this=0x7fffffffd1b0, masm=..., 
    id=js::jit::TailCallVMFunctionId::DoInitRecordFallback)
    at /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp:510
510   TrampolinePtr code = cx->runtime()->jitRuntime()->getVMWrapper(id);
(gdb) n
511   const VMFunctionData& fun = GetVMFunction(id);
(gdb) n
512   MOZ_ASSERT(fun.expectTailCall == TailCall);
(gdb) n
513   uint32_t argSize = fun.explicitStackSlots() * sizeof(void*);
(gdb) n
514   EmitBaselineTailCallVM(code, masm, argSize);
(gdb) n
515   return true;
(gdb) p code
$18 = {value = 0x1e4412b875e0 "H\277"}
(gdb) p code.value
$19 = (uint8_t *) 0x1e4412b875e0 "H\277"
(gdb) x/64i code.value
   0x1e4412b875e0:  movabs $0x7ffff4219000,%rdi
   0x1e4412b875ea:  mov    0x1c0(%rdi),%rax
   0x1e4412b875f1:  mov    %rsp,0x70(%rax)
   0x1e4412b875f5:  movabs $0x55555903de60,%r11
   0x1e4412b875ff:  push   %r11
   0x1e4412b87601:  lea    0x18(%rsp),%r10
   0x1e4412b87606:  movabs $0xfff9800000000000,%r11
     

So that’s the generated code for DoInitRecordFallback (the fallback method implemented in the inline cache module of the baseline compiler), but I realized this wasn’t really what I was hoping to find. I wanted to see the intermediate representation first.

From there, I realized I was barking up the wrong tree, since the baseline compiler just goes straight from JS to assembly; only the more sophisticated compilers (which weren’t being invoked at this point) use MIR and LIR. (A blog post from Matthew Gaudet, “A Beginners Guide To SpiderMonkey’s MacroAssembler”], explains some of the pipeline.)

So at least I knew one way to get to the generated assembly code for one opcode, but it wasn’t particularly helpful. My co-worker suggested putting in no-op implementations for the other opcodes so that it would be able to compile all the self-hosted code (even if the generated code wouldn’t work). This seemed like the fastest way to get to a functioning REPL so I could experiment with simpler code snippets, and it worked. After just adding a no-op emit_ method in BaselineCodeGen.cpp for each opcode, the interpreter was able to start up.

When I typed code into the REPL, I could tell it was only being interpreted, not compiled, since everything still worked, and I would expect anything that used records/tuples except for an empty record literal to fail. I found the --baseline-eager flag with a little bit of digging, and:

obj-x64-debug/dist/bin/js --baseline-eager
js> function f() { return #{}; }
function f() { return #{}; }
js> f()
f()
Assertion failure: !BytecodeOpHasIC(op) (Missing entry in OpToFallbackKindTable for JOF_IC op), at js/src/jit/BaselineIC.cpp:353
Segmentation fault
$

Excellent! This pointed to something I didn’t change yesterday (since the compiler didn’t make me) — I had to update the OpToFallbackKindTable in BaselineIC.cpp.

Once I did that, I realized that I couldn’t get very far with just InitRecord, since I wouldn’t expect even the empty record to compile without being able to compile the FinishRecord opcode. (Since records are immutable, Nicolò’s implementation adds three opcodes for creating records: one to initialize the empty record, one to add a new record field, and one to finish initialization, the last of which marks the record as immutable so that no more fields can be added.)

So I implemented FinishRecord, similarly to the work from yesterday. Now what? I was able to type in an empty record literal without errors:

> x = #{}
#{}

But how do I know that x is bound to a well-formed record that satisfies its interface? There’s not too much you can do with an empty record. I decided to check that typeof(x) worked (it should return “record”), and got an assertion failure in the emitGuardNonDoubleType() method in CacheIRCompiler.cpp). It took me some time to make sense of various calls through generated code, but the issue was the TypeOfIRGenerator::tryAttachStub() method in CacheIR.cpp:

AttachDecision TypeOfIRGenerator::tryAttachStub() {
[...snip...]
  TRY_ATTACH(tryAttachPrimitive(valId));
  TRY_ATTACH(tryAttachObject(valId));

  MOZ_ASSERT_UNREACHABLE("Failed to attach TypeOf");
  return AttachDecision::NoAction;
    }
    

This code decides, based on the type of the operand (valId) whether to use the typeOf code for primitives or for objects. The record/tuple implementation adds “object primitives”, which share some qualities with objects but aren’t objects (since, among other things, objects are mutable). The tryAttachPrimitive() call was successfully selecting the typeOf code for primitives, since the isPrimitive() method on the Value type returns true for object primitives. Because there was no explicit case in the code for records, the code for double values was getting called as a fallback and that’s where the assertion failure was coming from. Tracking this down took much more time than actually implementing typeOf for records, which I proceeded to do. And now I can get the type of a record-valued variable in compiled code:

js> x = #{}
    #{}
js> typeof(x)
"record"

This provides at least some evidence that the code I’m generating is laying out records properly. Next up, I’ll try implementing the opcode that adds record properties, so that I can test out non-empty records!

Adding record and tuple support to the JIT

Tuesday, November 9th, 2021

Today I started working on implementing the Record and Tuples proposal for JavaScript in the JIT in SpiderMonkey. All of this work is building on code written by Nicolò Ribaudo, which isn’t merged into SpiderMonkey yet but can be seen in patches linked from the Bugzilla bug.

Up until now, SpiderMonkey would automatically disable the JIT if you built it with the compile-time flag that enables records and tuples. Currently, the interpreter implements records and tuples, but not the compiler. I started by searching through the code to figure out how to re-enable the JIT, but realized it would be faster to look through the commit history, and found it in js/moz.configure. (If you try to follow along, you won’t be able to see some of the code I’m referring to since it’s in unapplied patches, but I’m including some links anyway to give context.)

I saw that if I just pass in the --enable-jit build flag explicitly, it should override what the config file said, and it indeed did. I decided to operate on the assumption that the compiler error messages would tell me what I needed to implement, which isn’t always a safe assumption when working in C/C++, but seems to have served me okay in my SpiderMonkey work so far.

The first set of compiler errors I got had to do with adding the IsTuple() built-in method to the LIR. (The MIR and LIR, two of the intermediate languages used in SpiderMonkey, are explained briefly on the SpiderMonkey documentation page.) This involved implementing EmitObjectIsTuple() and visitIsTuple methods in CodeGenerator.cpp, part of the baseline compiler (the documentation also explains the various compilers that make up the JIT). That was straightforward, since IsTuple() is just a predicate that returns true for tuple arguments and false for arguments of any other type. When I implemented this method before, I chose to implement it as a JS_INLINABLE_FN, not knowing what I was getting myself into. With JIT disabled at compile time, the compiler made me implement it down to the MIR level, but now I had to implement it in LIR.

Once that was done, I ran the interpreter and got an assertion failure: "Hit MOZ_CRASH(Record and Tuple are not supported by jit) at gecko-fork/js/src/jit/BaselineCodeGen.cpp:2589". This was excellent, since it told me exactly where to start. When I looked at BaselineCodeGen.cpp, I saw that the seven opcodes for records and tuples were all defined with the UNSUPPORTED_OPCODE macro, so I planned to proceed by removing each of the UNSUPPORTED_OPCODE calls one-by-one and seeing what that forced me to implement.

I started with the InitRecord opcode, which as you might guess, creates a new record with a specified number of fields. As a strategy, I followed the pattern for the existing NewArray and NewObject opcodes, since creating new arrays and objects is similar to creating new records.

By following the error messages, I found the files that I needed to change; I’m putting this list in logical order rather than in the order that the compile errors came up, which was quite different.

  • VMFunctionList-inl.h — added the RecordType::createUninitialized C++ function to the list of functions that can be called from the JIT
  • VMFunctions.h — added a TypeToDataType case for the RecordType C++ type
  • BaselineCodeGen.cpp, where I added an emit method for InitRecord
  • BaselineIC.cpp, and CacheIR.cpp, where I added code to support inline caching (explained here) for InitRecord.
  • MIROps.yaml, the file that defines all MIR opcodes; a lot of other code is automatically generated from this file. I had to add a new InitRecord opcode.
  • MIR.cpp — MInitRecord methods
  • MIR.h, where I had to define a new MInitRecord class, and MIR.cpp, where I had to implement the class.
  • Lowering.cpp, where I added code to translate the MIR representation for an InitRecord call to LIR.
  • LIROps.yaml, similarly to MIROps.yaml.
  • CodeGenerator.cpp, where I added the visitInitRecord method that translates the LIR code to assembly.
  • Recover.cpp — while I don’t understand this code very well, I think it’s what implements the “bailout” mechanism described in the docs. Similarly to the other modules, I had to add methods for InitRecord and a new class to the accompanying header file.

I love compiler errors! Without static typechecking, I wouldn’t have any information about what parts of the code I needed to change to add a new feature. As a functional programmer, I normally don’t give C++ a lot of credit for static typechecking, but whether it’s about modern language features or the coding style used in SpiderMonkey (or both), I actually find that I get a lot of helpful information from type error messages when working on SpiderMonkey. Without static type errors, I would have had to understand the JIT from the top down to know what parts I needed to change, maybe by reading through the code (slow and tedious) or maybe by reading through documentation (likely to be out of date). Types are documentation that can’t fall out of date, since the compiler won’t generate code for you if you give it something that doesn’t typecheck.

Once everything compiled and I started the interpreter again, I got a different assertion failure:

"Assertion failure: BytecodeOpHasIC(op), at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:649"

This pointed to the final change, in BytecodeLocation.h. I had added the code for inline caching, but hadn’t updated the opcode table defined in this file to indicate that the InitRecord opcode had an inline cache. Since the relationship between this table and the code itself exists only in the programmers’ heads, there’s no way for the compiler to check this for us.

Once I fixed this and started the interpreter again, I got a new error:

Hit MOZ_CRASH(Record and Tuple are not supported by jit) at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:2604
Thread 1 "js" received signal SIGSEGV, Segmentation fault. 0x000055555809ce62 in js::jit::BaselineCodeGen::emit_AddRecordProperty (this=0x7fffffffd080) at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:2604 2604 UNSUPPORTED_OPCODE(AddRecordProperty)

This is just saying that AddRecordProperty is an unsupported opcode, which is what I would expect since I only implemented one of the record/tuple opcodes. So that means that after my changes, SpiderMonkey was able to generate code for the InitRecord opcode. (The reason why these errors showed up as soon as I launched the interpreter, without having to execute any code, is that at startup time with JIT enabled, the interpreter compiles all the self-hosted libraries, which are implemented in JavaScript. Since on my working branch, there is library code that uses the Record and Tuple types, that means that the code path leading to those UNSUPPORTED_OPCODES was guaranteed to be reached.)

So what do I know now? The JIT seems to be able to generate code for the InitRecord opcode, at least for the first occurrence of it in the self-hosted libraries. Whether that code works (that is, implements the semantics in the spec) is a separate question. To know the answer, I would have to look at the generated code — I won’t be able to actually test any code in the interpreter until I implement all the opcodes, since each one will subsequently fail with the same error message as above. But that’s for another day.