The emotional roller coaster that is programming

I skipped a few days’ worth of updates; turns out it’s a bit difficult to fit in time to write an entire post when your work schedule isn’t very consistent.

In the meantime, I finished implementing all the record and tuple opcodes in the JIT. Having done some manual testing, it was time to start running the existing test suite with the compiler enabled. Fortunately, I figured out the flag to pass in so that I wouldn’t have to add it to each test file by hand (which would have been bad practice anyway):

mach jstests Record --args=--baseline-eager

This runs all the tests with Record in the name and adds the --baseline-eager flag in to the JavaScript shell.

At this stage, failures are good — it means there’s still something interesting left to work on. Yay, a failure!

Hit MOZ_CRASH(Unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745
REGRESSION - non262/Record/equality.js
[7|1|0|0] 100% ======================================================>|   1.1s
REGRESSIONS
    non262/Record/equality.js
FAIL
 

Narrowing down the code that caused the failure, I got:

js> Object.is(#{x: +0}, #{x: -0})
Object.is(withPosZ, withNegZ)
Hit MOZ_CRASH(Unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745

Thread 1 "js" received signal SIGSEGV, Segmentation fault.
0x00005555583763f8 in js::jit::CallIRGenerator::tryAttachObjectIs (this=0x7fffffffcd10, callee=...)
    at /home/tjc/gecko-fork/js/src/jit/CacheIR.cpp:7745
7745            MOZ_CRASH("Unexpected type");
(gdb) 

So this told me that I hadn’t yet implemented the cases for comparing records/tuples/boxes to each other in Object.is() in the JIT.

Fixing the problem seemed straightforward. I found the CallIRGenerator::tryAttachObjectIs() method in CacheIR.cpp. The CallIRGenerator stub takes care of generating code for built-in methods as they’re called; each time a known method is called on a particular combination of operand types that’s implemented in the baseline compiler, code gets generated that will be called next time instead of either interpreting the code or re-generating it from scratch.

For example, this code snippet from tryAttachObjectIs() shows that the first time Object.is() is called with two int32 operands, the compiler will generate a version of Object.is() that’s specialized to this case and saves the need to call a more generic method and do more type checks. Of course, the generated code has to include a check that the operand types actually are int32, and either call a different generated method or generate a new stub (specialized version of the method) if not.

    MOZ_ASSERT(lhs.type() == rhs.type());
    MOZ_ASSERT(lhs.type() != JS::ValueType::Double);

    switch (lhs.type()) {
      case JS::ValueType::Int32: {
        Int32OperandId lhsIntId = writer.guardToInt32(lhsId);
        Int32OperandId rhsIntId = writer.guardToInt32(rhsId);
        writer.compareInt32Result(JSOp::StrictEq, lhsIntId, rhsIntId);
        break;
      }

The existing code handles cases where both arguments have type Int32, String, Symbol, Object, et al. So it was easy to follow that structure and add a case where both operands have a box, record, or tuple type. After a fun adventure through the MacroAssembler, I had all the pieces implemented and the test passed; I was able to apply Object.is() to records (etc.) with the baseline compiler enabled.

After that, all the tests for records passed, which isn’t too surprising since there aren’t many methods for records. Next, I tried running the tests for what’s currently called Box in the Records and Tuples proposal (subject to change), and got more failures; still a good thing.

mach-with record-tuple-with-jit jstests Box --args=--baseline-eager
[1|0|0|0]  20% ==========>                                            |   1.2s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/unbox.js
[1|1|0|0]  40% =====================>                                 |   1.2s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/json.js
[1|2|0|0]  60% ================================>                      |   1.3s

Hit MOZ_CRASH(unexpected type) at /home/tjc/gecko-fork/js/src/jit/CacheIRCompiler.cpp:1930
REGRESSION - non262/Box/constructor.js
[2|3|0|0] 100% ======================================================>|   1.3s
REGRESSIONS
    non262/Box/unbox.js
    non262/Box/json.js
    non262/Box/constructor.js
FAIL

The common cause: generating code for any method calls on Boxes invokes GetPropIRGenerator::tryAttachPrimitive() (also in CacheIR.cpp as above), which didn’t have a case for records/tuples/boxes. (In JavaScript, a method is just another property on an object; so the GetProp bytecode operation extracts the property, and calling it is a separate instruction.) Similarly to the above, I added a case, and the code worked; I was able to successfully call (Box({}).unbox()) with the compiler enabled.

The next test failure, in json.js, was harder. I minimized the test case to one line, but wasn’t able to get it any simpler than this:

JSON.stringify(Box({}), (key, value) => (typeof value === "box" ? {x: value.unbox() } : value))

This code calls the JSON.stringify() standard library method on the value Box({}) (a box wrapped around an empty object); the second argument is a function that’s applied to the value of each property in the structure before converting it to a string. The fix I made that fixed unbox.js got rid of the MOZ_CRASH(unexpected type) failure, but replaced it with a segfault.

It took me too many hours to figure out that I had made the mistake of copying/pasting code without fully understanding it. The cached method stubs rely on “guards”, which is to say, runtime type checks, to ensure that we only call a previously-generated method in the future if the types of the operands match the ones from the past (when we generated the code for this particular specialization of the method). When making the change for Object.is(), I had looked at CacheIRCompiler.cpp and noticed that the CacheIRCompiler::emitGuardToObject() method generates code that tests whether an operand is an object or not:

bool CacheIRCompiler::emitGuardToObject(ValOperandId inputId) {
  JitSpew(JitSpew_Codegen, "%s", __FUNCTION__);
  if (allocator.knownType(inputId) == JSVAL_TYPE_OBJECT) {
    return true;
  }

  ValueOperand input = allocator.useValueRegister(masm, inputId);
  FailurePath* failure;
  if (!addFailurePath(&failure)) {
    return false;
  }
  masm.branchTestObject(Assembler::NotEqual, input, failure->label());
  return true;
}

The generated code contains a “failure” label that this code branches to when the operand inputId is not an object. (It’s up to the caller to put appropriate code under the “failure” label so that this result will be handled however the caller wants.) I copied and pasted this code to create an emitGuardToExtendedPrimitive() method (“extended primitives” are what we’re calling records/tuples/boxes for now), and changed JSVAL_TYPE_OBJECT to JSVAL_TYPE_EXTENDED_PRIMITIVE so that the code would check for the “extended primitive” runtime type tag instead of the “object” type tag. The problem is that I also needed to use something else instead of branchTestObject. As it was, whenever a stub that expects a record/tuple/box as an argument was generated, it would be re-used for operands that are objects. This is obviously unsound and, looking at the failing test case again, we can see why this code exposed the bug:

JSON.stringify(Box({}), (key, value) => (typeof value === "box" ? {x: value.unbox() } : value))

The first time the (key, value) anonymous function is called, the name value is bound to Box({}). So a stub gets generated that’s a version of the typeof operation, specialized to Box things (actually anything that’s a record, tuple, or box, for implementation-specific reasons). The stub checks that the operand is a record/tuple/box, and if so, returns the appropriate type tag string (such as “box”). Except because of the bug that I introduced, this stub got re-used for any object operands. The way that the JSON stringify code works (JSON.cpp), it calls the “replacer” (i.e. the anonymous (key, value) function) on the value of each property — but then, it calls the replacer again on the replaced value. So my generated stub that worked perfectly well for Box({}) was subsequently called on {x: {}}, which has an entirely different type; hence the segfault.

Finding this bug took a long time (partly because I couldn’t figure out how to enable the “CacheIR spew” code that prints out generated CacheIR code, so I was trying to debug generated code without being able to read it…), but I experimented with commenting out various bits of code and eventually deduced that typeof was probably the problem; once I read over the code related to typeof, I spotted that my emitGuardToExtendedPrimitive() method was calling branchTestObject(). Adding a branchTestExtendedPrimitive() method to the macro assembler was easy, but tedious, since the code is architecture-specific. It would be nice if dynamic type-testing code was automatically generated, since the code that tests for tags denoting different runtime types is all basically the same. But rather than trying to automate that, I decided it was better to bite the bullet, since I already had enough cognitive load with trying to understand the compiler as it is.

It turned out that the json.js test case, despite being designed to test something else, was perfect for catching this bug, since it involved applying the same method first to a Box and then to an object. Once I’d fixed the problem with guards, this test passed. The constructor.js test still fails, but that just means I’ll have something interesting to work on tomorrow.

Perhaps the swings from despair to elation are why programming can be so habit-forming. While trying to track down the bug, I felt like I was the dullest person in the world and had hit my limit of understanding, and would never make any further progress. When I found the bug, for a moment I felt like I was on top of the world. That’s what keeps us doing it, right? (Besides the money, anyway.)

By the way, I still don’t fully understand inline caching in SpiderMonkey, so other resources, such as “An Inline Cache Isn’t Just A Cache” by Matthew Gaudet, are better sources than my posts. I mean for this blog to be more of a journal of experimentation than a definitive source of facts about anything.

Tags: ,

Leave a Reply