Archive for December, 2021

Fun with pointer arithmetic

Thursday, December 9th, 2021

a diagram showing the layout of the NativeObject and RecordType typesA picture is worth a thousand words, they say.

Where I left off yesterday, I was trying to figure out why my generated AddRecordProperty code was crashing. I was still using the simplest possible test case, a record with one field:

function f() { x = #{"a": 1}; }

Fixed slots

My code was writing a literal zero into the record’s “initialized length” slot:

store32(Imm32(0), Address(result, NativeObject::getFixedSlotOffset(RecordType::INITIALIZED_LENGTH_SLOT)));

But this should have been:

  storeValue(Int32Value(0), Address(result, NativeObject::getFixedSlotOffset(RecordType::INITIALIZED_LENGTH_SLOT)));

In the drawing, I indicated that offset 24 of a RecordType is a Value (JS::Value) denoting the number of elements that have been initialized so far. While it’s an invariant that this will actually be an integer value, as far as the compiler is concerned, the representation is of a Value, which has a different bit pattern from the integer 0.

Some existing code in RecordType::createUninitializedRecord() (this is code that isn’t upstream yet) should have been a clue:

  uint32_t length = getFixedSlot(INITIALIZED_LENGTH_SLOT).toPrivateUint32();

To get an unsigned int32, we call the Value method toPrivateUint32(), which returns an integer when called on an integer value.

Moreover, the getFixedSlot() method of NativeObject also returns a Value, which should have been a pretty good hint to me that fixed slots are Values:

const Value& getFixedSlot(uint32_t slot) const;

(NativeObject.h)

Observing the length field

Supposing that the register %rcx points to a record, I would like to be able to execute:

call js::DumpValue(((RecordType*) $rcx)->getFixedSlot(INITIALIZED_LENGTH_SLOT))

in gdb. (Where INITIALIZED_LENGTH_SLOT is defined as 0, since it happens to be the first fixed slot in this object.) Casting the value in %rcx to RecordType is necessary to tell gdb where the struct fields begin and end, but from there, I would have thought there would be enough debug information for it to know that RecordType inherits from NativeObject, which has a getFixedSlot() method.

Since I can’t do that, the next best thing is:

(gdb) call js::DumpValue( (JS::Value::fromRawBits (*($rcx + 24)) ))
0

And that works — it prints 0, which is what I would expect for a record with no initialized fields. Effectively, I inlined getFixedSlot(), which accesses offset 24 from the object. Then, JS::Value::fromRawBits decodes the tagged pointer that represents a Value, and DumpValue() pretty-prints it.

Observing the sortedKeys field

Looking at the picture again, records have a second fixed slot that’s a Value that is guaranteed (assuming the compiler works) to correspond to an ArrayObject, which just contains the record keys, in sorted order. I knew that my code was temporarily storing the value of this slot in register %rbx (as before, I figured this out by putting breakpoints in the code generation methods and looking at the values of various variables), so if I do:

call js::DumpValue ((JS::Value::fromRawBits($rbx)))

in gdb, I get output that’s something like <Array object at 232d468007f8>

But for more detail, I can do:

(gdb) call js::DumpObject (& ((JS::Value::fromRawBits($rbx)).toObject()))
object 29574ae007f0
  global 21bd66c40030 [global]
  class 5555590a8770 Array
  shape 21bd66c66320
  flags:
  proto 
  properties:
    [Latin 1]"length" (map 21bd66c62670/0 writable )

which interprets the Value as an Object and uses DumpObject to print out more details about its representation.

Observing the record itself

Having seen that the individual fixed slots of the record seemed to be correct, I wanted to debug my generated code for creating uninitialized records to see what the entire record object looked like. Knowing that the record was stored in %rcx, I figured out that I could do:

(gdb) call js::DumpObject (& ((JS::Value::fromRawBits($rcx)).toExtendedPrimitive()))
object 29574ae007b0
  global 21bd66c40030 [global]
  class 5555590b7d10 record
  shape 21bd66c663c0
  flags:
  proto null
  reserved slots:
      0 : 0
      1 : 
      2 : false
  properties:
  elements:
      0: false
      1: Assertion failure: (asBits_ & js::gc::CellAlignMask) == 0 (GC pointer is not aligned. Is this memory corruption?), at /home/tjc/gecko-fork/obj-x64-debug/dist/include/js/Value.h:622

“Extended primitive” is a provisional name for records and tuples, which are not objects, but are (in our prototype implementation) represented internally in the compiler as objects; that’s why I’m able to use DumpObject to print out the fields. Under “reserved slots”, it’s showing me the values of the three reserved slots shown at offsets, 24, 32, and 40 in the picture above.

Obviously, it’s a warning sign that trying to print out the elements array causes an assertion failure. I would like to be able to print it out using:

call js::DumpValue(((RecordType*) $rcx)->getElements()[0])

since getElements() is a NativeObject method that represents an array of Values. But this doesn’t work in gdb, so knowing that the offset of the elements_ field is 16, I did the following:

(gdb) p *(ObjectElements::fromElements((HeapSlot*) ($rcx + 16)))
$10 = { flags = 3952501696,  initializedLength = 13525, capacity = 1437286072, length = 21845}
    (gdb) 

I deleted some of the output so as to show only the dynamic fields. The picture above makes this much clearer, but once we access the HeapSlot array stored in the elements_ field, we can call the ObjectElements::fromElements method on it to access the elements header, which is stored physically before the array itself. This header consists of four int32 fields: flags, initializedLength, capacity, and length. This output makes it seem like garbage got written into the object, since the initialized length shouldn’t be 13525.

Looking at the slots_ array of the record (which is at offset 8), I observed:

p ((HeapSlot*) (*($rcx + 8)))
$22 = (js::HeapSlot *) 0x55ab3eb8
(gdb) 

This coresponds to this MacroAssembler call:

  storePtr(ImmPtr(emptyObjectSlots), Address(result, NativeObject::offsetOfSlots()));

result here represents the register that contains the record, and offsetOfSlots() is 8. I had copied/pasted this code from existing code that initializes arrays, without looking at it too carefully, but when I read it again, I noticed that emptyObjectSlots is a special value. A comment in NativeObject.h mentions that special singleton values are used when the elements_ and slots_ arrays are empty.

Initializing the elements

And that’s how I realized that some other code that I had copied and pasted out of the existing array initialization code was misplaced:

  // Initialize elements pointer for fixed (inline) elements.
  computeEffectiveAddress(
      Address(result, NativeObject::offsetOfFixedElements()), temp);
  storePtr(temp, Address(result, NativeObject::offsetOfElements()));

This works fine for what it does, but I needed to do the equivalent of the NativeObject::setEmptyElements() method:

Should do exactly what NativeObject::setEmptyElements() does:

  void setEmptyElements() { elements_ = emptyObjectElements; }

(code)

So what I really wanted was:

  // Initialize elements pointer
  storePtr(ImmPtr(emptyObjectElements),
           Address(result, NativeObject::offsetOfElements()));
                     

And after making that change, the elements header looked a lot better:

(gdb) p (*((ObjectElements*) ($rcx)))
$8 = {flags = 0, initializedLength = 0, capacity = 1, length = 1, static VALUES_PER_HEADER = 2}
(gdb)  call ((JSObject*) $rcx)->dump()
object 17d67b3007b0
  global 2689ff340030 [global]
  class 5555590b7b80 record
  shape 2689ff3663c0
  flags:
  proto null
  reserved slots:
      0 : 0
      1 : 
      2 : false
  properties:
    (gdb) 

There’s more to recount, but I don’t have any more time today.

Observability

Wednesday, December 8th, 2021

I’m realizing that this blog is mostly about debugging, since figuring out how to debug SpiderMonkey is taking up most of my time. Right now, the actual compilation algorithms I’m implementing are straightforward; if I was working on performance tuning an existing implementation or something, I’m sure it would be different. What’s actually hard is finding ways to make generated code more observable. I’m sure that more experienced SpiderMonkey hackers have their own techniques, but since there isn’t a lot of documentation to go on, I’m making it up mostly from scratch.

Having gotten empty records to work, I started working on code generation for the AddRecordProperty opcode, so we can have non-empty records. When I initially tested my code with input x = #{"a": 1}, I got an assertion failure:

Assertion failure: !this->has(reg), at /home/tjc/gecko-fork/js/src/jit/RegisterSets.h:680

Thread 1 "js" received signal SIGSEGV, Segmentation fault.
0x00005555580486c9 in js::jit::SpecializedRegSet<js::jit::LiveSetAccessors<js::jit::TypedRegisterSet >, js::jit::TypedRegisterSet >::add (
    this=0x7fffffffc768, reg=...) at /home/tjc/gecko-fork/js/src/jit/RegisterSets.h:680
680     MOZ_ASSERT(!this->has(reg));

AddRecordProperty takes three operands (the record, the field name, and the value for the field). When I dug into it, it turned out that two of the three operands had been allocated to the same register, which wasn’t right. The CacheIRCompiler module handles register allocation, spilling registers to the stack when necessary, so you can implement new operations without having to worry about those details. But in this case, I had to worry about that detail. Internally, the register allocator keeps a list, operandLocations_ that maps integers (operands are numbered from 0 onward) to physical registers. It turned out that operands 0 and 2 were both allocated to %rax, which I figured out in gdb (irrelevant details snipped):

(gdb) p operandLocations_[0]
$13 = (js::jit::OperandLocation &) @0x7fffffffc680: {
  kind_ = js::jit::OperandLocation::PayloadReg, data_ = {payloadReg = {reg = {
        reg_ = js::jit::X86Encoding::rax, 
                ...
(gdb) p operandLocations_[1]
$14 = (js::jit::OperandLocation &) @0x7fffffffc690: {
  kind_ = js::jit::OperandLocation::ValueReg, data_ = {payloadReg = {reg = {
        reg_ = js::jit::X86Encoding::rbx, 
                ...
(gdb) p operandLocations_[2]
$15 = (js::jit::OperandLocation &) @0x7fffffffc6a0: {
  kind_ = js::jit::OperandLocation::ValueReg, data_ = {payloadReg = {reg = {
        reg_ = js::jit::X86Encoding::rax,  

The BaselineCacheIRCompiler::init() method (code) is called each time a new opcode is compiled, and it assigns a virtual register for each operand. I compared the code I’d added for AddRecordProperty with the code for the other opcodes that also have 3 operands (such as SetElem), I noticed that all of them passed the third operand on the stack. When I changed AddRecordProperty to work the same way, the problem was fixed. I don’t know how I would have figured this out except for a lucky guess! I’m guessing it’s this way because of the small number of registers on x86, but the internal error-checking here seems lacking; it doesn’t seem good to silently alias distinct operands to each othre.

Once I changed the init() method and modified my emitAddRecordPropertyResult() method to expect the third operand on the stack, I found that for the compiled code to be run and not just emitted, I had to embed the code in a function once again:
function f() { x = #{"a":1}; }

(I still don’t understand the details as to when the JIT decides to generate native code.)

The result was a segfault, which was okay for the time being; it means that my code was at least able to generate code for setting a record property, without internal errors.

I realized that I wasn’t using the protocol for calling from the JIT into C++ functions (the callVM() method in the baseline compiler) correctly, but took a break before fixing it.

After taking a break, I fixed that problem only to run into another one: callVM() doesn’t automatically save registers. The macro assembler provides another method, callWithABI() (code), that makes it easier to save and restore registers. Evidently, ABI calls are less efficient than VM calls, but I’m not worried about performance at the moment. I switched it around to use callWithABI() and had to do some debugging by setting breakpoints both in the C++ code for the callee function and in the generated assembly code to make sure all the data types matched up. That worked reasonably well, but alas, I had to switch back to callVM() because I couldn’t figure out how to pass a HandleObject (see the comment at the beginning of RootingAPI.h for a fairly good explanation) into an ABI function. I added code to manually save and restore registers.

That worked okay enough, and my generated code got almost all the way through without a segfault. At the very end, my code calls the existing emitStoreDenseElement method. Internally (in the current implementation; this is all subject to change), records contain a sorted array of keys. Separately from storing the actual value at the correct offset in the record, the key has to be added to the array. (Since records are immutable, the actual sorting only has to happen once, when the FinishRecord opcode executes.)

I was getting a segfault because there was something wrong about the array that I was passing into the generated code for storing dense elements. I haven’t found the problem yet, but (going back to the beginning of this post), I’m trying to find ways to make the generated code more transparent. In this case, I knew that the emitStoreDenseElement code was getting its array input in the %rsi register, so I put a breakpoint in the right place and looked at it in gdb:


Thread 1 "js" received signal SIGTRAP, Trace/breakpoint trap.
0x00001dec1e58ab70 in ?? ()

(gdb) p *((JSObject*) $rsi)
$3 = {<js::gc::CellWithTenuredGCPointer> = { = {
      header_ = {<mozilla::detail::AtomicBaseIncDec> = {<mozilla::detail::AtomicBase> = {
            mValue = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 6046507808}, 
              static is_always_lock_free = true}}, }, }, static RESERVED_MASK = 7, static FORWARD_BIT = 1}, }, 
  static TraceKind = JS::TraceKind::Object, static MAX_BYTE_SIZE = 152}
(gdb) p *((JSObject*) $rsi).getHeader()
Attempt to take address of value not located in memory.
(gdb) p ((JSObject*) $rsi)->header_
$4 = {<mozilla::detail::AtomicBaseIncDec> = {<mozilla::detail::AtomicBase> = {
      mValue = {<std::__atomic_base> = {static _S_alignment = 8, _M_i = 6046507808}, 
        static is_always_lock_free = true}}, }, }
(gdb) p ((JSObject*) $rsi)->slots_
There is no member or method named slots_.
(gdb) p ((js::HeapSlot*) ($rsi + 8))
$5 = (js::HeapSlot *) 0x3a6c5de007f8
(gdb) p ((js::HeapSlot*) ($rsi + 8))[0]
$6 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 93824997702264}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 8))[1]
$7 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 64237105842200}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 8))[2]
$8 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 0}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 16))[0]
$9 = {<js::WriteBarriered> = {<js::BarrieredBase> = {value = {
        asBits_ = 64237105842200}}, <js::WrappedPtrOperations<JS::Value, js::WriteBarriered, void>> = {}, }, }
(gdb) p ((js::HeapSlot*) ($rsi + 16))[0].value
$10 = {asBits_ = 64237105842200}
Slot*) ($rsi + 16))))
No symbol "ObjectElements" in current context.
(gdb) p (js::ObjectElements::fromElements (((js::HeapSlot*) ($rsi + 16))))
$11 = (js::ObjectElements *) 0x3a6c5de007f0
(gdb) p (js::ObjectElements::fromElements (((js::HeapSlot*) ($rsi + 16))))->initializedLength
$12 = 1
(gdb) 

I’m posting this debugging transcript to show some of the things I’ve discovered so far. In gdb, you can cast a pointer to an arbitrary type and use that to print out the contents according to the physical layout of that type; hence, printing ((JSObject*) $rsi. Weirdly, the NativeObject class (code) seems to be inaccessible in gdb; I’m not sure if this is because NativeObject inherits from JSObject (in this case, I know that the object pointed to by %rsi is a NativeObject, or because of something different about that class. (None of NativeObject, js::NativeObject, or JS::NativeObject seem to be names that gdb recognizes.) That makes it harder to print some of the other fields, but the debugger does know about the header_ field, since it’s defined in the Cell class (code) that JSObject inherits from. The getHeader() method is just supposed to return the header_ field, so I’m not sure why gdb reported the “Attempt to take address of value not located in memory” error.

I knew that NativeObjects have a slots_ field and an elements_ field, physically laid out right after the header_ field, but due to the problem I mentioned with the NativeObject class, gdb won’t recognize these field names. I knew that slots_ would be at offset 8 and elements would be at offset 16, so using some pointer math and casting the pointers to the type that both slots_ and elements_ have (js::HeapSlot*; the HeapSlot class is defined here), I was able to make gdb treat these addresses as HeapSlot arrays. HeapSlot is basically an alias for the Value type, which represents all runtime values in SpiderMonkey (see Value.h).

The slots_ array contains “fixed slots” (attributes, basically); this object’s slots array has two elements that are initialized (the one at index 2 is printed as if its asBits_ field is 0, meaning it’s just a 0 and is probably uninitialized). For array-like objects (which this one is — if you’ve lost track, it’s the array of sorted record keys that’s part of a record), the elements_ array is an array of, well, elements. NativeObject has a getElements method, which casts the HeapSlot array into an ObjectElements object (which has some additional methods on it), but I can’t call any NativeObject methods from gdb, so instead I called the ObjectElements::fromElements method directly to cast it. Since the initializedLength() method on the resulting object returns 1, which is correct (the previous generated code sets the initialized length to 1, since we’re adding a property to a previously empty record), it seems like the layout is vaguely correct.

That’s as far as I got. Debugging tools for generated code that don’t require all these manual casts and have more access to type information that would make pretty-printing easier would be great! Maybe that exists and I haven’t found it, or maybe it’s just that compiler implementors are a small enough group that making tools more usable isn’t too compelling for anyone. In any case, the hard part about all of this is finding good ways to observe data at runtime and relate it to types that exist in the compiler.

Further adventures in gdb

Tuesday, December 7th, 2021

So, that updating-every-day plan hasn’t exactly worked out. Working on the code is so much fun that it’s hard to stop, and then I have to choose between going to bed and writing a blog post.

I’m currently working on generating code for the three opcodes related to records: AddRecord, FinishRecord, and AddRecordProperty. The work I wrote about in previous posts wasn’t exactly complete, because it allowed code generation in the presence of records, but I just plugged in no-op implementations for the tryAttachStub() methods in CacheIR.cpp that actually do the work. I still had to change some code generation methods so that existing code could work with records and tuples, but the really fun stuff is what I’m doing now!

Rather than try to cover everything, I’ll summarize my notes from the end of last week. I had added two methods to BaselineCacheIRCompiler.cpp, which is the part of the baseline compiler that generates baseline code from CacheIR code. As an initial test case, I was using a function that just returns an empty record literal:

function f() { x = #{}; }

since I hadn’t yet implemented code generation for adding record properties. The JavaScript bytecode for this function looks like:

loc     op
-----   --
main:
00000:  BindGName "x"                   # GLOBAL
00005:  InitRecord 0                    # GLOBAL 
00010:  FinishRecord                    # GLOBAL 
00011:  SetGName "x"                    # 
00016:  Pop                             # 
00017:  RetRval                         # 

The InitRecord 0 instruction creates an uninitialized record of length 0, and the FinishRecord instruction sets internal flags to make the record read-only. In a non-empty record, one or more AddRecordProperty instructions would appear in between; it would be a dynamic error to invoke another AddRecordProperty instruction again after the FinishRecord.

The bytecode above is from executing dis(f) in the interpreter; I also discovered that you can do disnative(f) after CacheIR and then native code are generated for f. This prints out the generated assembly code; it’s actually much easier to read if you execute print(disnative(f)). Invoking the interpreter with the environment variable IONFLAGS=codegen also prints out the code as it’s generated, which is sometimes more useful since it can be interleaved with other debug messages that show which methods are generating which bits of code.

I was getting a segfault, meaning that the code being generated by my emitNewRecordResult() method didn’t match the expectations that the code generated by my emitFinishRecordResult() method had about the data. So I fired up gdb, as I had done before. Trying to zero in on the problem, I was looking at some generated code that took its input in the %rax register (gdb uses the $rax syntax instead.) With careful use of casts and calling accessor methods, it’s possible to look at the C++ representations of the objects that generated code creates. I found it helpful to look at the value of this expression:

(gdb) p (*(*((JSObject*) $rax)).shape()).getObjectClass()
$9 = (const JSClass *) 0x5555590a6130 

From context, I was able to figure out that 0x5555590a6130 was the address of the JSClass structure for the ArrayObject type. This wasn’t what I expected, since the input for this code was supposed to be a record (with type RecordType in C++).

Since this was a few days ago, I’ve lost track of the logical steps in between, but eventually, I put a breakpoint in the createUninitializedRecord() method that I’d added to MacroAssembler.cpp; shape and arrayShape are variable names referring to two registers I allocated within that method.

Thread 1 "js" hit Breakpoint 1, js::jit::MacroAssembler::createUninitializedRecord (
    this=0x7fffffffae00, result=..., shape=..., arrayShape=..., temp=..., temp2=..., 
    recordLength=0, allocKind=js::gc::AllocKind::OBJECT4_BACKGROUND, 
    initialHeap=js::gc::DefaultHeap, fail=0x7fffffffaa38, allocSite=...)
    at /home/tjc/gecko-fork/js/src/jit/MacroAssembler.cpp:493
warning: Source file is more recent than executable.
493   allocateObject(result, temp, allocKind, 0, initialHeap, fail, allocSite);
(gdb) p shape
$10 = {reg_ = js::jit::X86Encoding::rcx, static DefaultType = js::jit::RegTypeName::GPR}
(gdb) p arrayShape
$11 = {reg_ = js::jit::X86Encoding::rcx, static DefaultType = js::jit::RegTypeName::GPR}
(gdb) quit

This debug output shows that the register allocator assigned both shape and arrayShape to the same register, %rcx. As a result, the code I generated to initialize the shapes was using the same shape for both the record object itself, and the internal sortedKeys array pointed to by the record’s sortedKeys field — causing createUninitializedRecord() to return something that looked like an array rather than a record.

The reason this was happening was the following code that I wrote:

 
AutoScratchRegisterMaybeOutput shape(allocator, masm, output);
AutoScratchRegisterMaybeOutput arrayShape(allocator, masm, output);

The AutoScratchRegisterMaybeOutput class (defined in CacheIRCompiler.h) re-uses the output register if possible, so calling it twice allocates both symbolic names to the same register. That wasn’t what I wanted.

The solution was to use the AutoScratchRegister class instead, telling the register allocator that shape and arrayShape should live in distinct registers. After making that change, repeating the original experiment gives:

(gdb) p (*(*((JSObject*) $rcx)).shape()).getObjectClass()
$5 = (const JSClass *) 0x5555590b5690 
(gdb) c
Continuing....
(gdb) p (*(*((JSObject*) $rbx)).shape()).getObjectClass()
$8 = (const JSClass *) 0x5555590a60f0 
(gdb) 

(To be sure that $rcx and $rbx were the registers allocated to shape and arrayShape, I had to set a breakpoint in my createUninitializedRecord() method again and look at the values of these variables.)

I hope this might be marginally useful to somebody trying to debug SpiderMonkey baseline code generation, or if nothing else, useful to me if I ever have to set aside this work for a few months (let’s be honest, weeks) and come back to it!