Archive for November, 2021

The emotional roller coaster that is programming

Friday, November 19th, 2021

I skipped a few days’ worth of updates; turns out it’s a bit difficult to fit in time to write an entire post when your work schedule isn’t very consistent.

In the meantime, I finished implementing all the record and tuple opcodes in the JIT. Having done some manual testing, it was time to start running the existing test suite with the compiler enabled. Fortunately, I figured out the flag to pass in so that I wouldn’t have to add it to each test file by hand (which would have been bad practice anyway):

mach jstests Record --args=--baseline-eager

This runs all the tests with Record in the name and adds the --baseline-eager flag in to the JavaScript shell.

At this stage, failures are good — it means there’s still something interesting left to work on. Yay, a failure!

Hit MOZ_CRASH(Unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745
REGRESSION - non262/Record/equality.js
[7|1|0|0] 100% ======================================================>|   1.1s
REGRESSIONS
    non262/Record/equality.js
FAIL
 

Narrowing down the code that caused the failure, I got:

js> Object.is(#{x: +0}, #{x: -0})
Object.is(withPosZ, withNegZ)
Hit MOZ_CRASH(Unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745

Thread 1 "js" received signal SIGSEGV, Segmentation fault.
0x00005555583763f8 in js::jit::CallIRGenerator::tryAttachObjectIs (this=0x7fffffffcd10, callee=...)
    at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745
7745            MOZ_CRASH("Unexpected type");
(gdb) 

So this told me that I hadn’t yet implemented the cases for comparing records/tuples/boxes to each other in Object.is() in the JIT.

Fixing the problem seemed straightforward. I found the CallIRGenerator::tryAttachObjectIs() method in CacheIR.cpp. The CallIRGenerator stub takes care of generating code for built-in methods as they’re called; each time a known method is called on a particular combination of operand types that’s implemented in the baseline compiler, code gets generated that will be called next time instead of either interpreting the code or re-generating it from scratch.

For example, this code snippet from tryAttachObjectIs() shows that the first time Object.is() is called with two int32 operands, the compiler will generate a version of Object.is() that’s specialized to this case and saves the need to call a more generic method and do more type checks. Of course, the generated code has to include a check that the operand types actually are int32, and either call a different generated method or generate a new stub (specialized version of the method) if not.

    MOZ_ASSERT(lhs.type() == rhs.type());
    MOZ_ASSERT(lhs.type() != JS::ValueType::Double);

    switch (lhs.type()) {
      case JS::ValueType::Int32: {
        Int32OperandId lhsIntId = writer.guardToInt32(lhsId);
        Int32OperandId rhsIntId = writer.guardToInt32(rhsId);
        writer.compareInt32Result(JSOp::StrictEq, lhsIntId, rhsIntId);
        break;
      }

The existing code handles cases where both arguments have type Int32, String, Symbol, Object, et al. So it was easy to follow that structure and add a case where both operands have a box, record, or tuple type. After a fun adventure through the MacroAssembler, I had all the pieces implemented and the test passed; I was able to apply Object.is() to records (etc.) with the baseline compiler enabled.

After that, all the tests for records passed, which isn’t too surprising since there aren’t many methods for records. Next, I tried running the tests for what’s currently called Box in the Records and Tuples proposal (subject to change), and got more failures; still a good thing.

mach-with record-tuple-with-jit jstests Box --args=--baseline-eager
[1|0|0|0]  20% ==========>                                            |   1.2s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/unbox.js
[1|1|0|0]  40% =====================>                                 |   1.2s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/json.js
[1|2|0|0]  60% ================================>                      |   1.3s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/constructor.js
[2|3|0|0] 100% ======================================================>|   1.3s
REGRESSIONS
    non262/Box/unbox.js
    non262/Box/json.js
    non262/Box/constructor.js
FAIL

The common cause: generating code for any method calls on Boxes invokes GetPropIRGenerator::tryAttachPrimitive() (also in CacheIR.cpp as above), which didn’t have a case for records/tuples/boxes. (In JavaScript, a method is just another property on an object; so the GetProp bytecode operation extracts the property, and calling it is a separate instruction.) Similarly to the above, I added a case, and the code worked; I was able to successfully call (Box({}).unbox()) with the compiler enabled.

The next test failure, in json.js, was harder. I minimized the test case to one line, but wasn’t able to get it any simpler than this:

JSON.stringify(Box({}), (key, value) => (typeof value === "box" ? {x: value.unbox() } : value))

This code calls the JSON.stringify() standard library method on the value Box({}) (a box wrapped around an empty object); the second argument is a function that’s applied to the value of each property in the structure before converting it to a string. The fix I made that fixed unbox.js got rid of the MOZ_CRASH(unexpected type) failure, but replaced it with a segfault.

It took me too many hours to figure out that I had made the mistake of copying/pasting code without fully understanding it. The cached method stubs rely on “guards”, which is to say, runtime type checks, to ensure that we only call a previously-generated method in the future if the types of the operands match the ones from the past (when we generated the code for this particular specialization of the method). When making the change for Object.is(), I had looked at CacheIRCompiler.cpp and noticed that the CacheIRCompiler::emitGuardToObject() method generates code that tests whether an operand is an object or not:

bool CacheIRCompiler::emitGuardToObject(ValOperandId inputId) {
  JitSpew(JitSpew_Codegen, "%s", __FUNCTION__);
  if (allocator.knownType(inputId) == JSVAL_TYPE_OBJECT) {
    return true;
  }

  ValueOperand input = allocator.useValueRegister(masm, inputId);
  FailurePath* failure;
  if (!addFailurePath(&failure)) {
    return false;
  }
  masm.branchTestObject(Assembler::NotEqual, input, failure->label());
  return true;
}

The generated code contains a “failure” label that this code branches to when the operand inputId is not an object. (It’s up to the caller to put appropriate code under the “failure” label so that this result will be handled however the caller wants.) I copied and pasted this code to create an emitGuardToExtendedPrimitive() method (“extended primitives” are what we’re calling records/tuples/boxes for now), and changed JSVAL_TYPE_OBJECT to JSVAL_TYPE_EXTENDED_PRIMITIVE so that the code would check for the “extended primitive” runtime type tag instead of the “object” type tag. The problem is that I also needed to use something else instead of branchTestObject. As it was, whenever a stub that expects a record/tuple/box as an argument was generated, it would be re-used for operands that are objects. This is obviously unsound and, looking at the failing test case again, we can see why this code exposed the bug:

JSON.stringify(Box({}), (key, value) => (typeof value === "box" ? {x: value.unbox() } : value))

The first time the (key, value) anonymous function is called, the name value is bound to Box({}). So a stub gets generated that’s a version of the typeof operation, specialized to Box things (actually anything that’s a record, tuple, or box, for implementation-specific reasons). The stub checks that the operand is a record/tuple/box, and if so, returns the appropriate type tag string (such as “box”). Except because of the bug that I introduced, this stub got re-used for any object operands. The way that the JSON stringify code works (JSON.cpp), it calls the “replacer” (i.e. the anonymous (key, value) function) on the value of each property — but then, it calls the replacer again on the replaced value. So my generated stub that worked perfectly well for Box({}) was subsequently called on {x: {}}, which has an entirely different type; hence the segfault.

Finding this bug took a long time (partly because I couldn’t figure out how to enable the “CacheIR spew” code that prints out generated CacheIR code, so I was trying to debug generated code without being able to read it…), but I experimented with commenting out various bits of code and eventually deduced that typeof was probably the problem; once I read over the code related to typeof, I spotted that my emitGuardToExtendedPrimitive() method was calling branchTestObject(). Adding a branchTestExtendedPrimitive() method to the macro assembler was easy, but tedious, since the code is architecture-specific. It would be nice if dynamic type-testing code was automatically generated, since the code that tests for tags denoting different runtime types is all basically the same. But rather than trying to automate that, I decided it was better to bite the bullet, since I already had enough cognitive load with trying to understand the compiler as it is.

It turned out that the json.js test case, despite being designed to test something else, was perfect for catching this bug, since it involved applying the same method first to a Box and then to an object. Once I’d fixed the problem with guards, this test passed. The constructor.js test still fails, but that just means I’ll have something interesting to work on tomorrow.

Perhaps the swings from despair to elation are why programming can be so habit-forming. While trying to track down the bug, I felt like I was the dullest person in the world and had hit my limit of understanding, and would never make any further progress. When I found the bug, for a moment I felt like I was on top of the world. That’s what keeps us doing it, right? (Besides the money, anyway.)

By the way, I still don’t fully understand inline caching in SpiderMonkey, so other resources, such as “An Inline Cache Isn’t Just A Cache” by Matthew Gaudet, are better sources than my posts. I mean for this blog to be more of a journal of experimentation than a definitive source of facts about anything.

Adventures in gdb

Wednesday, November 10th, 2021

I picked up from yesterday wanting to see what code was being generated for record initialization. A colleague pointed me to a page of SpiderMonkey debugging tips. This was helpful, but required being able to run the JS interpreter inside GDB and type some code into the REPL. The problem is that before it got to that point, the interpreter was trying to compile all the self-hosted code; I knew that this wasn’t going to succeed since I’ve only implemented one of the record/tuple opcodes. I wanted to be able to just do:

> x = #{}

(binding the variable x to an empty record literal) and see the generated code. But because the much-more-complicated self-hosted code has to get compiled first, I never get to that point.

Another colleague suggested looking at the IONFLAGS environment variable. This, in turn, seems to only have an effect if you build the compiler with the --enable-jitspew option. Once I did that, I was able to find out more:

$ IONFLAGS=zzzz mach run
obj-x64-debug/dist/bin/js
found tag: zzzz
Unknown flag.

usage: IONFLAGS=option,option,option,... where options can be:

  aborts        Compilation abort messages
  scripts       Compiled scripts
  mir           MIR information
    ...
    

And so on.

I found that IONFLAGS=codegen mach run would cause the interpreter to print out all the generated assembly code, including all the code for self-hosted methods. This wasn’t entirely helpful, since it was hard to see where the boundaries were between different methods.

I decided to try a different strategy and see what I could do inside gdb. I’ve avoided using debuggers as much as possible throughout my programming career. I’m a fan of printf-style debugging. So much so that I created the printf-style debugging page on Facebook. (This made more sense back when Facebook pages were “fan pages”, so you could be a “fan of” printf-style debugging.) I’ve always had the feeling that any more sophisticated debugging technology wasn’t worth the difficulty of use. Working on a compiler implemented in C++, though, it seems I’m finally having to suck it up and learn.

The first question was how to set a breakpoint on a templated function. I found the rbreak command in gdb, which takes a regular expression. I realized I could also just do:

(gdb) info functions .*emit_InitR.*
All functions matching regular expression ".*emit_InitR.*":

File js/src/jit/BaselineCodeGen.cpp:
2590:   bool js::jit::BaselineCodeGen::emit_InitRecord();
2590:   bool js::jit::BaselineCodeGen::emit_InitRecord();

File js/src/jit/BaselineIC.cpp:
2454:   bool js::jit::FallbackICCodeCompiler::emit_InitRecord();
(gdb)

So I set a breakpoint on the method I wrote to generate code for the InitRecord opcode:

(gdb) b js::jit::BaselineCodeGen::emit_InitRecord
Breakpoint 1 at 0x555558093884: file /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp, line 2591.
(gdb) b js::jit::FallbackICCodeCompiler::emit_InitRecord
Breakpoint 2 at 0x5555580807b1: file /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp, line 2455.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/tjc/gecko-fork/obj-x64-debug/dist/bin/js 
[snip]

Thread 1 "js" hit Breakpoint 2, js::jit::FallbackICCodeCompiler::emit_InitRecord (this=0x7fffffffd1b0)
    at /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp:2455
2455      EmitRestoreTailCallReg(masm);
(gdb) 

Finally! At this point, I was hoping to be able to view the code that was being generated for the empty record literal. Stepping through the code from here gave me what I was looking for:

(gdb) s
js::jit::FallbackICCodeCompiler::tailCallVMInternal (
    this=0x7fffffffd1b0, masm=..., 
    id=js::jit::TailCallVMFunctionId::DoInitRecordFallback)
    at /home/tjc/gecko-fork/js/src/jit/BaselineIC.cpp:510
510   TrampolinePtr code = cx->runtime()->jitRuntime()->getVMWrapper(id);
(gdb) n
511   const VMFunctionData& fun = GetVMFunction(id);
(gdb) n
512   MOZ_ASSERT(fun.expectTailCall == TailCall);
(gdb) n
513   uint32_t argSize = fun.explicitStackSlots() * sizeof(void*);
(gdb) n
514   EmitBaselineTailCallVM(code, masm, argSize);
(gdb) n
515   return true;
(gdb) p code
$18 = {value = 0x1e4412b875e0 "H\277"}
(gdb) p code.value
$19 = (uint8_t *) 0x1e4412b875e0 "H\277"
(gdb) x/64i code.value
   0x1e4412b875e0:  movabs $0x7ffff4219000,%rdi
   0x1e4412b875ea:  mov    0x1c0(%rdi),%rax
   0x1e4412b875f1:  mov    %rsp,0x70(%rax)
   0x1e4412b875f5:  movabs $0x55555903de60,%r11
   0x1e4412b875ff:  push   %r11
   0x1e4412b87601:  lea    0x18(%rsp),%r10
   0x1e4412b87606:  movabs $0xfff9800000000000,%r11
     

So that’s the generated code for DoInitRecordFallback (the fallback method implemented in the inline cache module of the baseline compiler), but I realized this wasn’t really what I was hoping to find. I wanted to see the intermediate representation first.

From there, I realized I was barking up the wrong tree, since the baseline compiler just goes straight from JS to assembly; only the more sophisticated compilers (which weren’t being invoked at this point) use MIR and LIR. (A blog post from Matthew Gaudet, “A Beginners Guide To SpiderMonkey’s MacroAssembler”], explains some of the pipeline.)

So at least I knew one way to get to the generated assembly code for one opcode, but it wasn’t particularly helpful. My co-worker suggested putting in no-op implementations for the other opcodes so that it would be able to compile all the self-hosted code (even if the generated code wouldn’t work). This seemed like the fastest way to get to a functioning REPL so I could experiment with simpler code snippets, and it worked. After just adding a no-op emit_ method in BaselineCodeGen.cpp for each opcode, the interpreter was able to start up.

When I typed code into the REPL, I could tell it was only being interpreted, not compiled, since everything still worked, and I would expect anything that used records/tuples except for an empty record literal to fail. I found the --baseline-eager flag with a little bit of digging, and:

obj-x64-debug/dist/bin/js --baseline-eager
js> function f() { return #{}; }
function f() { return #{}; }
js> f()
f()
Assertion failure: !BytecodeOpHasIC(op) (Missing entry in OpToFallbackKindTable for JOF_IC op), at js/src/jit/BaselineIC.cpp:353
Segmentation fault
$

Excellent! This pointed to something I didn’t change yesterday (since the compiler didn’t make me) — I had to update the OpToFallbackKindTable in BaselineIC.cpp.

Once I did that, I realized that I couldn’t get very far with just InitRecord, since I wouldn’t expect even the empty record to compile without being able to compile the FinishRecord opcode. (Since records are immutable, Nicolò’s implementation adds three opcodes for creating records: one to initialize the empty record, one to add a new record field, and one to finish initialization, the last of which marks the record as immutable so that no more fields can be added.)

So I implemented FinishRecord, similarly to the work from yesterday. Now what? I was able to type in an empty record literal without errors:

> x = #{}
#{}

But how do I know that x is bound to a well-formed record that satisfies its interface? There’s not too much you can do with an empty record. I decided to check that typeof(x) worked (it should return “record”), and got an assertion failure in the emitGuardNonDoubleType() method in CacheIRCompiler.cpp). It took me some time to make sense of various calls through generated code, but the issue was the TypeOfIRGenerator::tryAttachStub() method in CacheIR.cpp:

AttachDecision TypeOfIRGenerator::tryAttachStub() {
[...snip...]
  TRY_ATTACH(tryAttachPrimitive(valId));
  TRY_ATTACH(tryAttachObject(valId));

  MOZ_ASSERT_UNREACHABLE("Failed to attach TypeOf");
  return AttachDecision::NoAction;
    }
    

This code decides, based on the type of the operand (valId) whether to use the typeOf code for primitives or for objects. The record/tuple implementation adds “object primitives”, which share some qualities with objects but aren’t objects (since, among other things, objects are mutable). The tryAttachPrimitive() call was successfully selecting the typeOf code for primitives, since the isPrimitive() method on the Value type returns true for object primitives. Because there was no explicit case in the code for records, the code for double values was getting called as a fallback and that’s where the assertion failure was coming from. Tracking this down took much more time than actually implementing typeOf for records, which I proceeded to do. And now I can get the type of a record-valued variable in compiled code:

js> x = #{}
    #{}
js> typeof(x)
"record"

This provides at least some evidence that the code I’m generating is laying out records properly. Next up, I’ll try implementing the opcode that adds record properties, so that I can test out non-empty records!

Adding record and tuple support to the JIT

Tuesday, November 9th, 2021

Today I started working on implementing the Record and Tuples proposal for JavaScript in the JIT in SpiderMonkey. All of this work is building on code written by Nicolò Ribaudo, which isn’t merged into SpiderMonkey yet but can be seen in patches linked from the Bugzilla bug.

Up until now, SpiderMonkey would automatically disable the JIT if you built it with the compile-time flag that enables records and tuples. Currently, the interpreter implements records and tuples, but not the compiler. I started by searching through the code to figure out how to re-enable the JIT, but realized it would be faster to look through the commit history, and found it in js/moz.configure. (If you try to follow along, you won’t be able to see some of the code I’m referring to since it’s in unapplied patches, but I’m including some links anyway to give context.)

I saw that if I just pass in the --enable-jit build flag explicitly, it should override what the config file said, and it indeed did. I decided to operate on the assumption that the compiler error messages would tell me what I needed to implement, which isn’t always a safe assumption when working in C/C++, but seems to have served me okay in my SpiderMonkey work so far.

The first set of compiler errors I got had to do with adding the IsTuple() built-in method to the LIR. (The MIR and LIR, two of the intermediate languages used in SpiderMonkey, are explained briefly on the SpiderMonkey documentation page.) This involved implementing EmitObjectIsTuple() and visitIsTuple methods in CodeGenerator.cpp, part of the baseline compiler (the documentation also explains the various compilers that make up the JIT). That was straightforward, since IsTuple() is just a predicate that returns true for tuple arguments and false for arguments of any other type. When I implemented this method before, I chose to implement it as a JS_INLINABLE_FN, not knowing what I was getting myself into. With JIT disabled at compile time, the compiler made me implement it down to the MIR level, but now I had to implement it in LIR.

Once that was done, I ran the interpreter and got an assertion failure: "Hit MOZ_CRASH(Record and Tuple are not supported by jit) at gecko-fork/js/src/jit/BaselineCodeGen.cpp:2589". This was excellent, since it told me exactly where to start. When I looked at BaselineCodeGen.cpp, I saw that the seven opcodes for records and tuples were all defined with the UNSUPPORTED_OPCODE macro, so I planned to proceed by removing each of the UNSUPPORTED_OPCODE calls one-by-one and seeing what that forced me to implement.

I started with the InitRecord opcode, which as you might guess, creates a new record with a specified number of fields. As a strategy, I followed the pattern for the existing NewArray and NewObject opcodes, since creating new arrays and objects is similar to creating new records.

By following the error messages, I found the files that I needed to change; I’m putting this list in logical order rather than in the order that the compile errors came up, which was quite different.

  • VMFunctionList-inl.h — added the RecordType::createUninitialized C++ function to the list of functions that can be called from the JIT
  • VMFunctions.h — added a TypeToDataType case for the RecordType C++ type
  • BaselineCodeGen.cpp, where I added an emit method for InitRecord
  • BaselineIC.cpp, and CacheIR.cpp, where I added code to support inline caching (explained here) for InitRecord.
  • MIROps.yaml, the file that defines all MIR opcodes; a lot of other code is automatically generated from this file. I had to add a new InitRecord opcode.
  • MIR.cpp — MInitRecord methods
  • MIR.h, where I had to define a new MInitRecord class, and MIR.cpp, where I had to implement the class.
  • Lowering.cpp, where I added code to translate the MIR representation for an InitRecord call to LIR.
  • LIROps.yaml, similarly to MIROps.yaml.
  • CodeGenerator.cpp, where I added the visitInitRecord method that translates the LIR code to assembly.
  • Recover.cpp — while I don’t understand this code very well, I think it’s what implements the “bailout” mechanism described in the docs. Similarly to the other modules, I had to add methods for InitRecord and a new class to the accompanying header file.

I love compiler errors! Without static typechecking, I wouldn’t have any information about what parts of the code I needed to change to add a new feature. As a functional programmer, I normally don’t give C++ a lot of credit for static typechecking, but whether it’s about modern language features or the coding style used in SpiderMonkey (or both), I actually find that I get a lot of helpful information from type error messages when working on SpiderMonkey. Without static type errors, I would have had to understand the JIT from the top down to know what parts I needed to change, maybe by reading through the code (slow and tedious) or maybe by reading through documentation (likely to be out of date). Types are documentation that can’t fall out of date, since the compiler won’t generate code for you if you give it something that doesn’t typecheck.

Once everything compiled and I started the interpreter again, I got a different assertion failure:

"Assertion failure: BytecodeOpHasIC(op), at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:649"

This pointed to the final change, in BytecodeLocation.h. I had added the code for inline caching, but hadn’t updated the opcode table defined in this file to indicate that the InitRecord opcode had an inline cache. Since the relationship between this table and the code itself exists only in the programmers’ heads, there’s no way for the compiler to check this for us.

Once I fixed this and started the interpreter again, I got a new error:

Hit MOZ_CRASH(Record and Tuple are not supported by jit) at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:2604
Thread 1 "js" received signal SIGSEGV, Segmentation fault. 0x000055555809ce62 in js::jit::BaselineCodeGen::emit_AddRecordProperty (this=0x7fffffffd080) at /home/tjc/gecko-fork/js/src/jit/BaselineCodeGen.cpp:2604 2604 UNSUPPORTED_OPCODE(AddRecordProperty)

This is just saying that AddRecordProperty is an unsupported opcode, which is what I would expect since I only implemented one of the record/tuple opcodes. So that means that after my changes, SpiderMonkey was able to generate code for the InitRecord opcode. (The reason why these errors showed up as soon as I launched the interpreter, without having to execute any code, is that at startup time with JIT enabled, the interpreter compiles all the self-hosted libraries, which are implemented in JavaScript. Since on my working branch, there is library code that uses the Record and Tuple types, that means that the code path leading to those UNSUPPORTED_OPCODES was guaranteed to be reached.)

So what do I know now? The JIT seems to be able to generate code for the InitRecord opcode, at least for the first occurrence of it in the self-hosted libraries. Whether that code works (that is, implements the semantics in the spec) is a separate question. To know the answer, I would have to look at the generated code — I won’t be able to actually test any code in the interpreter until I implement all the opcodes, since each one will subsequently fail with the same error message as above. But that’s for another day.

Hello, world!

Monday, November 8th, 2021
It’s been a long time since I’ve blogged regularly, especially about software. When I worked on the Rust team, I wrote an update post at the end of every single day about what I’d worked on that day, every day I possibly could. I’m going to try to do that again I joined the Compilers team at Igalia this past September and am currently working on implementing new JavaScript features in the Spidermonkey JavaScript engine; at the moment, the Records and Tuples proposal, which would add immutable data types to JavaScript. As much as possible, I’m going to document how I spend each work day and what problems arise. This is mostly for me (so that I don’t look back and wonder what I did all month), but if anyone else happens to find it interesting, that’s an added bonus.