Records and objects: both the same and different

I wrote in my most recent “daily” post about the choices involved in representing records in MIR and LIR, Warp’s intermediate languages. Since then, more questions occurred to me.

MIR has a simple type system and some internal typechecking, which is helpful. LIR erases most type details but does have an even simpler type system that distinguishes JavaScript Values from raw integers/pointers/etc. I already added a Record type to MIR; an alternative would have been to give the NewRecord operation the result type Object, reflecting that records are (currently) represented as objects, but I thought that more internal type checks are always a good thing. Without a separate Record type, it’s theoretically possible that compiler bugs could result in record fields being mutable (of course, there are still ways such bugs could arise even with MIR typechecking, but it’s one more thing that makes it less likely).

I realized there’s a problem with this approach: if I want to take option 2 in the previous post — translating FinishRecord into MIR by breaking it down into smaller operations like Elements, LoadFixedSlot, etc. — it’s not possible to generate type-correct MIR. These operations expect an Object as an operand, but I want them to work on Records.

The point of representing records as objects was (at least in part) to reuse existing code for objects, but that goal exists in tension with distinguishing records from objects and keeping the MIR type system simple. Of course, we could add some form of subtyping to MIR, but that’s probably a bad idea.

So orthogonally to the three options I mentioned in my previous post, there’s another set of three options:

  1. Add Record as an MIR type, like I started out doing; don’t re-use MIR instructions like Elements for records, instead making those operations explicit in CodeGenerator, the final translation from LIR to assembly.
  2. Don’t add any new MIR types; give NewRecord the result type Object and FinishRecord the argument and result types Object
  3. Compile away records when translating JS to MIR, adding whatever new MIR instructions are necessary to make this possible; in this approach, records would desugar to Objects and any operations needed to enforce record invariants would be made explicit,

Option 1 rules out some code re-use and loses some internal checking, since more of the heavy lifting happens in the untyped world. Option 2 allows more code re-use, but also loses some internal checking, since it would be unsound to allow every Object operation to work on records. Option 3 has the advantage that only the JS-to-MIR translation (WarpBuilder / WarpCacheIRTranspiler) would need to be modified, but seems riskier than the first two options with respect to type-correctness bugs — it seems like debugging could potentially be hard if records were collapsed into objects, too.

A colleague pointed out that MIR already has some types that are similar to other types, but where certain additional invariants need to be upheld, such as RefOrNull (used in implementing WebAssembly); this type is effectively equivalent to “pointer”, but keeping it a separate type allows for the representation to change later if the WebAssembly implementors want to. Record seems very similar; we want to avoid committing to the Object representation more than necessary.

My colleague also suggested adding an explicit RecordToObject conversion, which would allow all the Object operations to be re-used for records without breaking MIR typechecking. As long as there’s no need to add an ObjectToRecord conversion (which I don’t think there is), this seems like the most appealing choice, combined with option 1 above. There’s still potential for bugs, e.g. if a compiler bug generates code that applies RecordToObject to a record and then does arbitrary object operations on it, such as mutating fields. Ideally, RecordToObject would produce something that can only be consumed by reads, not writes, but this isn’t practical to do within MIR’s existing type system.

(All of the same considerations come up for tuples as well; I just started with records, for no particular reason.)

I don’t think there’s any way to know which approach is best without having a working prototype, so I’m just going to try implementing this and seeing what happens.

Tags: ,

Leave a Reply