Igalia Compilers Team: Blog of the Igalia Compilers Team (https://blogs.igalia.com/compilers)
Porting BOLT to RISC-V, 2023-06-30, https://blogs.igalia.com/compilers/2023/06/30/porting-bolt-to-risc-v/
<p>Recently, initial support for RISC-V has <a href="https://reviews.llvm.org/D145687">landed</a> in LLVM's BOLT
subproject. Even though the current functionality is limited, it was an
interesting experience of open source development to get to this point. In this
post, I will talk about what BOLT is, what it takes to teach BOLT how to process
RISC-V binaries, and the interesting detours I sometimes had to make to get this
work upstream.</p>
<h1 id="bolt-overview" tabindex="-1">BOLT overview <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/30/porting-bolt-to-risc-v/">#</a></h1>
<p><a href="https://github.com/llvm/llvm-project/tree/main/bolt">BOLT</a> (Binary Optimization and Layout Tool) is a post-link optimizer
whose primary goal is to improve the layout of binaries. It uses sample-based
profiling to improve the performance of already fully-optimized binaries. That
is, the goal is to be complementary to existing optimization techniques like
<a href="https://en.wikipedia.org/wiki/Profile-guided_optimization">PGO</a> and <a href="https://en.wikipedia.org/wiki/Interprocedural_optimization">LTO</a>, <em>not</em> to replace them.</p>
<p>Sample-based profiling is used in order to make it viable to obtain profiles
from production systems as its overhead is usually negligible compared to
profiling techniques based on instrumentation. Another advantage is that no
special build configuration is needed and production binaries can directly be
profiled. The choice for binary optimization (as opposed to, say, optimizing at
the IR level) comes from the accuracy of the profile data: since the profile is
gathered at the binary level, mapping it back to a higher level representation
of the code can be a challenging problem. Since code layout optimizations can
quite easily be applied at the binary level, and the accuracy of the profile is
highest there, the choice for performing post-link optimization seems to be a
logical one.</p>
<p>To do its work, BOLT needs access to a binary and a corresponding profile. As mentioned
before, the goal is to optimize production binaries so no special build steps
are required. The only hard requirement is that the binary contains a symbol
table (so stripped binaries are not supported). In order for BOLT to be able to
rearrange functions (in addition to the code within functions), it needs access
to relocations. Linkers usually remove relocations from the final binary but can
be instructed to keep them using the <code>--emit-relocs</code> flag. For best results, it
is recommended to link your binaries with this flag.</p>
<p>Gathering a profile on Linux systems can be done in the usual way using <code>perf</code>.
BOLT provides the necessary tools to convert <code>perf</code> output to an appropriate
format, and to combine multiple profiles. On systems where <code>perf</code> is not
available, BOLT can also instrument binaries to create profiles. For more
information on how to use BOLT, see the <a href="https://github.com/llvm/llvm-project/tree/main/bolt#usage">documentation</a>.</p>
<p>For more details on BOLT, including design decisions and evaluation, see the
<a href="https://research.fb.com/publications/bolt-a-practical-binary-optimizer-for-data-centers-and-beyond/">CGO'19 paper</a>. Let's move on to discuss some of BOLT's internals to
understand what is needed to support RISC-V.</p>
<h1 id="bolt-internals" tabindex="-1">BOLT internals <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/30/porting-bolt-to-risc-v/">#</a></h1>
<p>Optimizing the layout of a binary involves shuffling code around. The biggest
challenge in doing this is making sure that all code references remain
correct. Indeed, moving a function or basic block to a different location
changes its address, so all jumps, calls, and other references to it need to be
updated accordingly.</p>
<p>To do this correctly, BOLT's <em>rewriting pipeline</em> transforms binaries in the
following (slightly simplified) way:</p>
<ol>
<li><em>Function discovery</em>: using (mostly) the ELF symbol table, the boundaries of
functions are recorded;</li>
<li><em>Disassembly</em>: using LLVM's <a href="https://www.llvm.org/docs/CodeGenerator.html#the-mc-layer">MC-layer</a>, function bodies are
disassembled into lists of <code>MCInst</code> objects;</li>
<li><em>CFG construction</em>: basic blocks are discovered in the instruction lists and
references between them resolved, resulting in a control-flow graph for each
function;</li>
<li><em>Optimizations</em>: using the CFG, basic block and function layout is optimized
based on the profile;</li>
<li><em>Assembly</em>: the new layout is emitted, using LLVM's <code>MCStreamer</code> API, to an
ELF object file in memory;</li>
<li><em>Link</em>: since this object file might still contain external references, it is
linked to produce the final binary.</li>
</ol>
<p>Some of these steps are completely architecture independent. For example,
function discovery only needs the ELF symbol table. Others do need architecture
specific information. Fortunately, BOLT has supported multiple architectures
from the beginning (X86-64 and AArch64) so an abstraction layer exists that
makes it relatively straightforward to add a new target. Let's talk about what
is needed to teach BOLT to transform RISC-V binaries.</p>
<h1 id="teaching-bolt-risc-v" tabindex="-1">Teaching BOLT RISC-V <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/30/porting-bolt-to-risc-v/">#</a></h1>
<p>Thanks to BOLT's architecture abstraction layer, adding support for a new target
turned out to be mostly straightforward. I will go over the parts of BOLT's
rewriting pipeline that need architecture-specific information while focusing on
the aspects of RISC-V that made this slightly tricky sometimes.</p>
<h2 id="dis-assembly" tabindex="-1">(Dis)assembly <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/30/porting-bolt-to-risc-v/">#</a></h2>
<p>Assembly and disassembly of binaries is obviously architecture-dependent. BOLT
uses various MC-layer LLVM APIs to perform these tasks. More specifically,
<a href="https://llvm.org/doxygen/classllvm_1_1MCDisassembler.html"><code>MCDisassembler</code></a> is used for disassembly while
<a href="https://llvm.org/doxygen/classllvm_1_1MCAssembler.html"><code>MCAssembler</code></a> is used (indirectly via
<a href="https://llvm.org/doxygen/classllvm_1_1MCObjectStreamer.html"><code>MCObjectStreamer</code></a>) for assembly. The good news is that
there is excellent RISC-V support in the MC-layer so this can readily be used by
BOLT.</p>
<h2 id="cfg-construction" tabindex="-1">CFG construction <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/30/porting-bolt-to-risc-v/">#</a></h2>
<p>The result of disassembly is a linear list of instructions in the order they
appear in the binary. In the MC-layer, instructions are represented by
<a href="https://llvm.org/doxygen/classllvm_1_1MCInst.html"><code>MCInst</code></a> objects. In this representation, instructions essentially
consist of an opcode and a list of operands, where operands could be registers,
immediates, or more high-level expressions (<a href="https://llvm.org/doxygen/classllvm_1_1MCExpr.html"><code>MCExpr</code></a>). Expressions can
be used, for example, to refer to symbolic program locations (i.e., labels)
instead of using constant immediates.</p>
<p>Right after disassembly, however, all operands will be registers or immediates.
For example, an instruction like</p>
<pre class="language-nasm" tabindex="0"><code class="language-nasm">jal ra, f</code></pre>
<p>will be disassembled into (heavy pseudo-code here)</p>
<pre class="language-cpp" tabindex="0"><code class="language-cpp"><span class="token function">MCInst</span><span class="token punctuation">(</span>RISCV<span class="token double-colon punctuation">::</span>JAL<span class="token punctuation">,</span> <span class="token punctuation">[</span>RISCV<span class="token double-colon punctuation">::</span>X1<span class="token punctuation">,</span> ImmOffset<span class="token punctuation">]</span><span class="token punctuation">)</span></code></pre>
<p>where <code>ImmOffset</code> is the offset from the <code>jal</code> instruction to <code>f</code>. This is not
convenient to handle in BOLT as nothing indicates that this <code>MCInst</code> actually
refers to <code>f</code>.</p>
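<p>To make <code>ImmOffset</code> a bit more concrete, here is a minimal sketch of how the offset can be recovered from the raw 32-bit encoding of a <code>jal</code>, following the J-type format of the RISC-V ISA. The helper names (<code>decodeJalOffset</code>, <code>decodeRd</code>, <code>jalTarget</code>) are made up for this illustration; they are not LLVM or BOLT APIs, and the real disassembler does this work inside the MC-layer.</p>

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): extract the signed byte offset
// encoded in a 32-bit RISC-V `jal` instruction (J-type format).
// The immediate is stored scrambled as imm[20|10:1|11|19:12] in bits 31..12.
std::int64_t decodeJalOffset(std::uint32_t insn) {
  std::uint32_t imm = (((insn >> 31) & 0x1) << 20) |  // imm[20], the sign bit
                      (((insn >> 12) & 0xff) << 12) | // imm[19:12]
                      (((insn >> 20) & 0x1) << 11) |  // imm[11]
                      (((insn >> 21) & 0x3ff) << 1);  // imm[10:1]; imm[0] is 0
  // Sign-extend from 21 bits.
  return (imm & 0x100000) ? static_cast<std::int64_t>(imm) - 0x200000
                          : static_cast<std::int64_t>(imm);
}

// The destination register (rd) lives in bits 11..7.
unsigned decodeRd(std::uint32_t insn) { return (insn >> 7) & 0x1f; }

// The absolute target is the address of the jal itself plus the offset.
std::uint64_t jalTarget(std::uint64_t pc, std::uint32_t insn) {
  return pc + static_cast<std::uint64_t>(decodeJalOffset(insn));
}
```

<p>Note that the result is just a number: nothing in it says that the target happens to be the start of <code>f</code>, which is exactly why the symbolization step described next is needed.</p>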
<p>Therefore, BOLT post-processes instructions after disassembly and replaces
immediates with symbolic references where appropriate. Two different mechanisms
are used to figure out the address an instruction refers to:</p>
<ul>
<li>For control-transfer instructions (e.g., calls and branches),
<a href="https://llvm.org/doxygen/classllvm_1_1MCInstrAnalysis.html"><code>MCInstrAnalysis</code></a> is used to evaluate the target. LLVM's
RISC-V backend already contained an appropriate implementation for this.</li>
<li>For other instructions (e.g., <code>auipc</code>/<code>addi</code> pairs to load an address in
RISC-V), relocations are used. For this, BOLT's <code>Relocation</code> class had to be
extended to support RISC-V ELF relocations.</li>
</ul>
<p>Once the target of an instruction has been determined, BOLT creates an
<code>MCSymbol</code> at that location and updates the <code>MCInst</code> to point to that symbol
instead of an immediate offset.</p>
<p>One question remains: how does BOLT detect control-transfer instructions? Let's
first discuss how BOLT creates the control-flow graph now that all instructions
symbolically refer to their targets.</p>
<p>A <a href="https://en.wikipedia.org/wiki/Control-flow_graph">CFG</a> is a directed graph where the nodes are <a href="https://en.wikipedia.org/wiki/Basic_block">basic blocks</a> and the
edges are control-flow transfers between those basic blocks. Without going into
details, BOLT has a target-independent algorithm to create a CFG from a list of
instructions (for those interested, you can find it <a href="https://github.com/llvm/llvm-project/blob/243f0566dc414e8bb6e15c7a6ae490d0e3cd0656/bolt/lib/Core/BinaryFunction.cpp#L1917-L2182">here</a>). It
needs some target-specific information about instructions though. For example:</p>
<ul>
<li><em>Terminators</em> are instructions that end a basic block (e.g., branches and
returns but <em>not</em> calls).</li>
<li><em>Branches and jumps</em> are the instructions that create edges in the CFG.</li>
</ul>
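<p>To illustrate how these two properties are enough to build a CFG, here is a toy version of the classic "leaders" algorithm. This is only a sketch under simplifying assumptions, not BOLT's implementation: the <code>Inst</code> and <code>Block</code> types, the <code>fallsThrough</code> flag, and the use of list indices instead of addresses are all invented for this example.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <vector>

// Toy instruction model; all names are hypothetical, not BOLT's API.
struct Inst {
  bool isTerminator = false; // ends a basic block (branches, returns; not calls)
  bool fallsThrough = true;  // control may continue to the next instruction
  int target = -1;           // index of the branch target, or -1 if none
};

struct Block {
  std::size_t begin = 0, end = 0;      // half-open instruction range [begin, end)
  std::vector<std::size_t> successors; // indices into the returned block list
};

std::vector<Block> buildCFG(const std::vector<Inst> &insts) {
  std::vector<Block> blocks;
  if (insts.empty())
    return blocks;

  // Step 1: collect "leaders", i.e. the first instruction of each block.
  std::set<std::size_t> leaders{0};
  for (std::size_t i = 0; i < insts.size(); ++i) {
    if (insts[i].target >= 0) {
      assert(static_cast<std::size_t>(insts[i].target) < insts.size());
      leaders.insert(static_cast<std::size_t>(insts[i].target));
    }
    if (insts[i].isTerminator && i + 1 < insts.size())
      leaders.insert(i + 1);
  }

  // Step 2: each block runs from one leader up to the next one.
  std::map<std::size_t, std::size_t> blockAt; // leader index -> block index
  for (auto it = leaders.begin(); it != leaders.end(); ++it) {
    auto next = std::next(it);
    Block b;
    b.begin = *it;
    b.end = (next == leaders.end()) ? insts.size() : *next;
    blockAt[b.begin] = blocks.size();
    blocks.push_back(b);
  }

  // Step 3: the last instruction of a block determines its outgoing edges.
  for (std::size_t bi = 0; bi < blocks.size(); ++bi) {
    const Inst &last = insts[blocks[bi].end - 1];
    if (last.target >= 0)
      blocks[bi].successors.push_back(
          blockAt.at(static_cast<std::size_t>(last.target)));
    if (last.fallsThrough && bi + 1 < blocks.size())
      blocks[bi].successors.push_back(bi + 1);
  }
  return blocks;
}
```

<p>In this model a conditional branch is a terminator that has a target <em>and</em> falls through, so it produces two edges, while a return is a terminator with neither.</p>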
<p>To get this information, BOLT relies again on <code>MCInstrAnalysis</code> which provides
methods such as <code>isTerminator</code> and <code>isCall</code>. These methods can be specialized by
specific LLVM backends but the default implementation relies on the
<a href="https://llvm.org/doxygen/classllvm_1_1MCInstrDesc.html"><code>MCInstrDesc</code></a> class. Objects of this class are generated by
various TableGen files in the backends (e.g., <a href="https://github.com/llvm/llvm-project/blob/b3b54131d0a025c74082b7cb843d83fbd8814865/llvm/lib/Target/RISCV/RISCVInstrInfo.td">this one</a> for
RISC-V). An important property of <code>MCInstrDesc</code> for the next discussion is that
its information is based <em>only</em> on opcodes; operands are <em>not</em> taken into
account.</p>
<p>LLVM's RISC-V backend did not specialize <code>MCInstrAnalysis</code> so BOLT was relying
on <code>MCInstrDesc</code> to get information about terminators and branches. For many
targets (e.g., X86) this might actually be fine but for RISC-V, this causes
problems. For example, take a <code>jal</code> instruction: is this a terminator, a branch,
a call? Based solely on the opcode, we cannot actually answer these questions
because <code>jal</code> is used both for direct jumps (terminator) and function calls
(non-terminator).</p>
<p>The solution to this problem was to specialize <code>MCInstrAnalysis</code> for RISC-V
taking the <a href="https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc">calling convention</a> into account:</p>
<ul>
<li><code>jal zero, ...</code> is an unconditional branch (return address discarded);</li>
<li><code>jal ra, ...</code> is a call (return address stored in <code>ra</code> (<code>x1</code>) which the
calling convention designates as the return address register);</li>
<li>Some more rules for <code>jalr</code>, compressed instructions, detecting returns, and so on.</li>
</ul>
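<p>The two <code>jal</code> rules above can be sketched as operand-aware predicates. This is a toy model, not the code from the actual patch: <code>ToyInst</code>, <code>Opcode</code>, and <code>Reg</code> are hypothetical stand-ins for <code>MCInst</code> and the generated RISC-V enums, and the rules for <code>jalr</code>, compressed instructions, and returns are deliberately left out.</p>

```cpp
// Hypothetical stand-ins for LLVM's opcode and register enums.
enum class Opcode { JAL, Other };
enum Reg : unsigned { X0 = 0, X1 = 1 }; // x0 is `zero`, x1 is `ra`

struct ToyInst {
  Opcode opcode;
  unsigned rd; // destination register that receives the return address
};

// `jal zero, ...`: the return address is discarded, so this is a plain jump.
bool isUnconditionalBranch(const ToyInst &inst) {
  return inst.opcode == Opcode::JAL && inst.rd == X0;
}

// `jal ra, ...`: the return address is saved in ra, so this is a call.
bool isCall(const ToyInst &inst) {
  return inst.opcode == Opcode::JAL && inst.rd == X1;
}

// A jump ends its basic block; a call does not.
bool isTerminator(const ToyInst &inst) { return isUnconditionalBranch(inst); }
```

<p>The point is that the answer depends on an operand (<code>rd</code>), which is information an opcode-only description like <code>MCInstrDesc</code> cannot express.</p>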
<p>So the <a href="https://reviews.llvm.org/D146438">first patch</a> that landed to pave the way for
RISC-V support in BOLT was not in the BOLT project but in the RISC-V MC-layer.</p>
<hr>
<p>With this in place, the <a href="https://reviews.llvm.org/D145687">patch</a> to add a RISC-V target to BOLT
consisted mainly of implementing the necessary relocations and implementing the
architecture abstraction layer. The latter consisted mainly of instruction
manipulation (e.g., updating branch targets), detecting some types of
instructions not supported by <code>MCInstrAnalysis</code> (e.g., nops), and analyzing
RISC-V-specific Procedure Linkage Table (PLT) entries (so BOLT knows which
function they refer to). Once I started to understand the internals of BOLT,
this was relatively straightforward. After iterating over the patch with the
BOLT maintainers (who were very helpful and responsive during this process), it
got accepted in less than a month.</p>
<p>There was just one minor issue to resolve.</p>
<h2 id="linking" tabindex="-1">Linking <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/30/porting-bolt-to-risc-v/">#</a></h2>
<p>The final step in the rewriting pipeline is linking the generated object file.
BOLT is able to rely on LLVM again by using the RuntimeDyld JIT linker which is
part of the <a href="https://www.llvm.org/docs/MCJITDesignAndImplementation.html">MCJIT</a> project. Unfortunately, there was no RISC-V support
yet in RuntimeDyld. Looking at the supported targets, it seemed easy enough to
implement RISC-V support: I just needed to implement the few relocations that
BOLT emits. So I submitted a <a href="https://reviews.llvm.org/D145686">patch</a>.</p>
<p>Alas, it <a href="https://reviews.llvm.org/D145686#4222642">seemed</a> that things might not be as easy as I hoped:</p>
<blockquote>
<p>Is there something preventing Bolt from moving to ORC / JITLink? If Bolt is
able to move over then the aim should be to do that. If Bolt is unable to move
over then we need to know why so that we can address the issue. RuntimeDyld is
very much in maintenance mode at the moment, and we're working hard to reach
parity in backend coverage so that we can officially deprecate it.</p>
</blockquote>
<p>Even though this comment was followed up by this:</p>
<blockquote>
<p>None of that is a total blocker to landing this, but the bar is high, and it
should be understood that Bolt will <em>need</em> to migrate in the future.</p>
</blockquote>
<p>trying to push the patch through didn't feel like the right approach. For one,
I anticipate needing some more advanced linker features for RISC-V in the
future (e.g., linker relaxation) and I wouldn't want to implement those in a
deprecated linker. Moreover, the recommended linker, <a href="https://llvm.org/docs/JITLink.html">JITLink</a>, has
mostly complete RISC-V support and, importantly, more users and reviewers,
making its implementation most certainly of higher quality than what I would
implement by myself in RuntimeDyld.</p>
<p>So the way forward for bringing RISC-V support to BOLT seemed to be to first
port BOLT from using RuntimeDyld to JITLink. Since it looked like this
wasn't going to be a priority for the BOLT maintainers, I decided I might as
well give it a shot myself. Even though this would surely mean a significant
delay in finishing my ultimate goal of RISC-V support in BOLT, it felt like a
great opportunity to me: it allowed me to learn more about linkers and BOLT's
internals, as well as to invest in a project that I am hoping to use in the
foreseeable future.</p>
<p>Porting BOLT to JITLink was hard, at least for me. It had a far-reaching impact
on many parts of BOLT that I had never touched before. This meant it took quite
some time to try and understand these parts, but also that I learned a lot in
the process. Besides changes to BOLT, I <a href="https://reviews.llvm.org/D149138">submitted</a>
<a href="https://reviews.llvm.org/D150778">a</a> <a href="https://reviews.llvm.org/D151305">few</a> JITLink patches to
implement some missing AArch64 relocations that BOLT needed. In the end, I
managed to pass all BOLT tests and submit a <a href="https://reviews.llvm.org/D147544">patch</a>.</p>
<p>This patch took about a month and a half to get accepted. The BOLT maintainers
were <em>very</em> helpful and responsive in the process. They were also very strict,
though. Rightfully so, of course, as BOLT is being used in production systems.
The main requirement for the patch to get accepted was that BOLT's output would
be a 100% binary match with the RuntimeDyld version. This was necessary to
ease the verification of the correctness of the patch. With the help of the BOLT
maintainers, we managed to get the patch in an acceptable state to land it.</p>
<h1 id="looking-forward" tabindex="-1">Looking forward <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/30/porting-bolt-to-risc-v/">#</a></h1>
<p>With BOLT being ported to JITLink, the <a href="https://reviews.llvm.org/D145687">patch</a> to add initial
RISC-V support to BOLT could finally land. This doesn't mean that BOLT is
currently very usable for RISC-V binaries, though: most binaries can pass
through BOLT fine but many of BOLT's transformations are not supported yet.</p>
<p>Since the initial support was added, I landed a few more patches to improve
usability. For example, support for an obscure ELF feature called <a href="https://www.sco.com/developers/gabi/latest/ch4.reloc.html">composed
relocations</a> was <a href="https://reviews.llvm.org/D146546">added</a>, something
RISC-V uses for <code>R_RISCV_ADD32/SUB32</code> relocations (which BOLT
<a href="https://reviews.llvm.org/D146554">supports</a> now). Other patches deal with
<a href="https://reviews.llvm.org/D153342">creation</a> and <a href="https://reviews.llvm.org/D153344">reversal</a>
of branches, something BOLT needs to fix up basic blocks after their layout has
changed.</p>
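<p>As a rough illustration of why composed relocations matter here: the RISC-V psABI defines <code>R_RISCV_ADD32</code> as <code>V + S + A</code> and <code>R_RISCV_SUB32</code> as <code>V - S - A</code>, where <code>V</code> is the value already at the relocated location, <code>S</code> the symbol value, and <code>A</code> the addend. Pairing the two at the same offset yields the difference between two labels. The sketch below models that with made-up types (<code>RelocType</code>, <code>Reloc</code>, <code>applyComposed</code>); it is not BOLT's <code>Relocation</code> class.</p>

```cpp
#include <cstdint>
#include <vector>

// Stand-ins for R_RISCV_ADD32 / R_RISCV_SUB32; names are hypothetical.
enum class RelocType { Add32, Sub32 };

struct Reloc {
  RelocType type;
  std::uint64_t symbolValue; // S: address of the referenced symbol
  std::int64_t addend;       // A: constant addend
};

// Apply, in order, all relocations that target the same 32-bit location.
// Each one updates the value left behind by the previous one.
std::uint32_t applyComposed(std::uint32_t word,
                            const std::vector<Reloc> &relocs) {
  for (const Reloc &r : relocs) {
    std::uint32_t value = static_cast<std::uint32_t>(
        r.symbolValue + static_cast<std::uint64_t>(r.addend));
    if (r.type == RelocType::Add32)
      word += value; // V + S + A
    else
      word -= value; // V - S - A
  }
  return word;
}
```

<p>For example, an <code>Add32</code> against a label at <code>0x1040</code> followed by a <code>Sub32</code> against a label at <code>0x1000</code> leaves <code>0x40</code> in the word, the distance between the two labels. After BOLT moves code, that distance can change, so both relocations have to be tracked and reapplied together.</p>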
<p>I'm currently working on handling binaries that have been relaxed during
linking. The issue is that, after BOLT has moved code around, relaxed
instructions might not fit the new addresses anymore. I plan to handle this as
follows: during disassembly, BOLT will "unrelax" instructions (e.g., translating
a <code>jal</code> back to an <code>auipc</code>/<code>jalr</code> pair) to make sure new addresses will always
fit. The linker will then undo this, when possible, by performing relaxation
again. The first step for this, adding linker relaxation support to JITLink,
has <a href="https://reviews.llvm.org/D149526">landed</a>. More on this in a future post.</p>
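<p>The "does it still fit" question can be made concrete with a small sketch. A <code>jal</code> encodes a signed, 2-byte-aligned 21-bit offset, so it reaches roughly ±1 MiB from its own address; anything further needs the <code>auipc</code>/<code>jalr</code> pair. The names below (<code>fitsInJal</code>, <code>CallForm</code>, <code>pickForm</code>) are hypothetical and only illustrate the decision a relaxing linker has to make; they are not JITLink's API.</p>

```cpp
#include <cstdint>

// A jal offset is a signed 21-bit value whose lowest bit is always zero,
// so the valid range is [-2^20, 2^20 - 2] in steps of 2 bytes.
bool fitsInJal(std::int64_t offset) {
  const std::int64_t limit = std::int64_t{1} << 20;
  return offset >= -limit && offset < limit && (offset & 1) == 0;
}

enum class CallForm { Jal, AuipcJalr };

// Pick the short form when the target is in range, the long form otherwise.
CallForm pickForm(std::uint64_t from, std::uint64_t to) {
  std::int64_t offset = static_cast<std::int64_t>(to - from);
  return fitsInJal(offset) ? CallForm::Jal : CallForm::AuipcJalr;
}
```

<p>In the plan described above, BOLT would always start from the long form and leave it to the linker to apply a check like this once the final addresses are known.</p>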
<h1 id="wrapping-up" tabindex="-1">Wrapping up <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/30/porting-bolt-to-risc-v/">#</a></h1>
<p>Bringing initial RISC-V support to BOLT has been a very interesting and
educational journey for me, both from a technical as well as a social
perspective. Having to work on multiple projects (LLVM MC, JITLink, BOLT) has
taught me new technologies and put me in contact with great communities. I
certainly hope to be able to continue this work in the future.</p>
<p>I'll close this post with a reference to the graph at the top, showing what it
took, over a series of ~25 patches, to get RISC-V support in BOLT. I think this
demonstrates the kind of detours that are sometimes needed to get work upstream,
in this case benefiting both the RISC-V community (RISC-V support in BOLT) and
BOLT as a whole (moving away from a deprecated linker and fixing bugs
encountered along the way).</p>
QuickJS: An Overview and Guide to Adding a New Feature, 2023-06-12, https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/
<p>In a previous blog post, I briefly mentioned QuickJS (QJS) as an alternative
implementation of JavaScript (JS) that does not run in a web browser. This
time, I'd like to delve deeper into QJS and explain how it works.</p>
<p>First, some remarks on QJS's history and overall architecture. QJS
was written by <a href="https://www.bellard.org/">Fabrice Bellard</a>, who you may know as the
original author of Qemu and FFmpeg, and was first released in
2019. QJS is primarily a bytecode interpreter (with no <a href="https://en.wikipedia.org/wiki/Just-in-time_compilation">JIT compiler</a> tiers) that
can execute JS relatively <a href="https://bellard.org/quickjs/bench.html">quickly</a>.</p>
<p>You can invoke QJS from the command-line like NodeJS and similar systems:</p>
<pre class="language-shell-session" tabindex="0"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash"><span class="token builtin class-name">echo</span> <span class="token string">"console.log('hello world');"</span> <span class="token operator">></span> hello.js</span></span><br><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">qjs hello.js <span class="token comment"># qjs is the main executable for quickjs</span></span></span><br><span class="token output">hello world</span></code></pre>
<p>QJS comes with another tool called <code>qjsc</code> that can produce small executable
binaries from JS source code. It does so by embedding QJS bytecode in C code
that links with the QJS runtime, which avoids the need to parse JS to bytecode
at runtime.</p>
<p>The following example demonstrates this (note: feel free to skip over the
details of this C code output; they are not crucial for the rest of the post):</p>
<pre class="language-shell-session" tabindex="0"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">qjsc hello.js <span class="token parameter variable">-e</span> <span class="token parameter variable">-o</span> hello.c <span class="token comment"># qjsc compiles the JS instead of running directly</span></span></span><br><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash"><span class="token function">cat</span> hello.c</span></span><br><span class="token output">/* File generated automatically by the QuickJS compiler. */<br><br>#include "quickjs-libc.h"<br><br>const uint32_t qjsc_hello_size = 78;<br><br>const uint8_t qjsc_hello[78] = {<br> 0x02, 0x04, 0x0e, 0x63, 0x6f, 0x6e, 0x73, 0x6f,<br> 0x6c, 0x65, 0x06, 0x6c, 0x6f, 0x67, 0x16, 0x68,<br> 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, 0x72,<br> 0x6c, 0x64, 0x10, 0x68, 0x65, 0x6c, 0x6c, 0x6f,<br> 0x2e, 0x6a, 0x73, 0x0e, 0x00, 0x06, 0x00, 0xa0,<br> 0x01, 0x00, 0x01, 0x00, 0x03, 0x00, 0x00, 0x14,<br> 0x01, 0xa2, 0x01, 0x00, 0x00, 0x00, 0x38, 0xe1,<br> 0x00, 0x00, 0x00, 0x42, 0xe2, 0x00, 0x00, 0x00,<br> 0x04, 0xe3, 0x00, 0x00, 0x00, 0x24, 0x01, 0x00,<br> 0xcd, 0x28, 0xc8, 0x03, 0x01, 0x00,<br>};<br><br>static JSContext *JS_NewCustomContext(JSRuntime *rt)<br>{<br> JSContext *ctx = JS_NewContextRaw(rt);<br> if (!ctx)<br> return NULL;<br> JS_AddIntrinsicBaseObjects(ctx);<br> JS_AddIntrinsicDate(ctx);<br> JS_AddIntrinsicEval(ctx);<br> JS_AddIntrinsicStringNormalize(ctx);<br> JS_AddIntrinsicRegExp(ctx);<br> JS_AddIntrinsicJSON(ctx);<br> JS_AddIntrinsicProxy(ctx);<br> JS_AddIntrinsicMapSet(ctx);<br> JS_AddIntrinsicTypedArrays(ctx);<br> JS_AddIntrinsicPromise(ctx);<br> JS_AddIntrinsicBigInt(ctx);<br> return ctx;<br>}<br><br>int main(int argc, char **argv)<br>{<br> JSRuntime *rt;<br> JSContext *ctx;<br> rt = JS_NewRuntime();<br> 
js_std_set_worker_new_context_func(JS_NewCustomContext);<br> js_std_init_handlers(rt);<br> JS_SetModuleLoaderFunc(rt, NULL, js_module_loader, NULL);<br> ctx = JS_NewCustomContext(rt);<br> js_std_add_helpers(ctx, argc, argv);<br> js_std_eval_binary(ctx, qjsc_hello, qjsc_hello_size, 0);<br> js_std_loop(ctx);<br> JS_FreeContext(ctx);<br> JS_FreeRuntime(rt);<br> return 0;<br>}</span></code></pre>
<p>It's possible to embed parts of this C output into a larger program, for example to add the
ability to script a system in JS. You can also compile it, along with the QJS runtime, to WebAssembly (as
is done in tools such as the Bytecode Alliance's <a href="https://github.com/bytecodealliance/javy">Javy</a>).</p>
<p>QJS as it exists today supports many features in the JS standard, but not all of them.
What if you need to extend it to support modern JS features? Where would you start?</p>
<p>To address these questions, the rest of this post explains some of the internals
of QJS by walking through the implementation of a new feature. The feature
that we will explore is the <a href="https://github.com/tc39/proposal-private-fields-in-in">ergonomic brand checks for private fields proposal</a>,
which I picked because it is a relatively simple and straightforward feature to implement.
This proposal reached stage 4 in the <a href="https://tc39.es/process-document/">TC39 process</a> in 2021, and is currently
part of the official <a href="https://262.ecma-international.org/13.0/">ECMAScript 2022 standard</a>.</p>
<p>Before getting into the details of adding the new feature, we'll first start with an explanation
of what the proposal we are exploring actually does. After that, I'll explain how QJS processes
JS code at a high level before diving into the details of how to implement this proposal.</p>
<h1 id="explaining-ergonomic-brand-checks-for-private-fields" tabindex="-1">Explaining "ergonomic brand checks for private fields" <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/">#</a></h1>
<p>The proposal we'll be exploring is titled <a href="https://github.com/tc39/proposal-private-fields-in-in">"Ergonomic brand checks for private fields"</a>,
which for the rest of this post I'll shorten to "private brand checks". Since ES2022, JS has supported
<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes/Private_class_fields">private fields</a>
in classes. For example, you can declare a private field as follows:</p>
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token keyword">class</span> <span class="token class-name">Foo</span> <span class="token punctuation">{</span><br> #priv <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> <span class="token comment">// private field declaration (needed for #priv to be in scope)</span><br> <span class="token function">get</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token keyword">return</span> <span class="token keyword">this</span><span class="token punctuation">.</span>#priv<span class="token punctuation">;</span> <span class="token punctuation">}</span><br><span class="token punctuation">}</span><br><br><span class="token keyword">new</span> <span class="token class-name">Foo</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// returns 0</span><br><span class="token keyword">new</span> <span class="token class-name">Foo</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span>#priv<span class="token punctuation">;</span> <span class="token comment">// error, it's private</span></code></pre>
<p>Note that the <code>#</code> syntax is special and only allowed for private field names.
Ordinary identifiers cannot be used to define a private field.</p>
<p>Private brand checks, also added in ES2022, are just a way to check if a given object has a given
private field with a convenient syntax. For example, the <code>isFoo</code> static method in the following
snippet uses a private brand check:</p>
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token keyword">class</span> <span class="token class-name">Foo</span> <span class="token punctuation">{</span><br> #priv<span class="token punctuation">;</span> <span class="token comment">// necessary declaration</span><br> <span class="token keyword">static</span> <span class="token function">isFoo</span><span class="token punctuation">(</span><span class="token parameter">obj</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token keyword">return</span> #priv <span class="token keyword">in</span> obj<span class="token punctuation">;</span> <span class="token punctuation">}</span> <span class="token comment">// brand check for #priv</span><br><span class="token punctuation">}</span><br><br><span class="token keyword">class</span> <span class="token class-name">Bar</span> <span class="token punctuation">{</span><br> #priv<span class="token punctuation">;</span> <span class="token comment">// a different #priv than above!</span><br><span class="token punctuation">}</span><br><br>Foo<span class="token punctuation">.</span><span class="token function">isFoo</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Foo</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// returns true</span><br>Foo<span class="token punctuation">.</span><span class="token function">isFoo</span><span class="token punctuation">(</span><span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// returns false</span><br>Foo<span class="token punctuation">.</span><span class="token function">isFoo</span><span class="token 
punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">Bar</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// returns false</span></code></pre>
<p>The example shows that the proposal overloads the behavior of
<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/in"><code>in</code></a>
so that if the left-hand side is a private field name, it checks for the
presence of that private field. Note that since private names are scoped
to the class, private names that look superficially identical in different
classes may not pass the same brand checks (as the example above showed).</p>
<p>Now that we know what this proposal does, let's talk about what it takes to implement it.
Before explaining the nitty-gritty details, we'll first talk about the architecture of
QJS at a high level.</p>
<h1 id="architecture-overview" tabindex="-1">Architecture overview <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/">#</a></h1>
<p>Most people probably run JS code in a
web browser or via a runtime like NodeJS, Deno, or Bun that uses those browsers' JS
engines. These engines typically use a tiered implementation strategy in which
code often starts running in an interpreter and then tiers up to a compiler,
perhaps multiple compilers, to produce faster code (see <a href="https://hacks.mozilla.org/2017/02/a-crash-course-in-just-in-time-jit-compilers/">this blog post</a>
by Lin Clark for a high-level overview).</p>
<p>These engines typically also compile the JS source program into bytecode,
an intermediate form that can be interpreted and compiled more easily than the source code or
its parsed abstract syntax tree (AST).</p>
<p>QJS shares some of these steps, in that it also compiles JS to bytecode and
then interprets the bytecode. However, it has no additional execution tiers.</p>
<p>While web browsers generally have to fetch JS source code and compile it to
bytecode while running (though there is <a href="https://v8.dev/blog/code-caching-for-devs">bytecode caching</a>
to optimize this), when QJS emits an executable (e.g., via <code>qjsc</code> as shown earlier)
it avoids the runtime parsing step by compiling the bytecode into the executable.</p>
<p>The QJS bytecode is designed for a stack machine (unlike, say, V8's
<a href="https://v8.dev/blog/ignition-interpreter">Ignition</a> interpreter which uses a
register machine). That is, the operations in the bytecode fetch data from
the runtime system's stack. WebAssembly (Wasm) made a similar choice,
which reflects a goal shared by both Wasm and QJS to produce small binaries. A
stack machine can save overhead in instruction encoding because the
instructions do not specify register names to fetch operands from. Instead,
instructions just fetch their operands from the stack.</p>
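<p>To make the stack machine idea concrete, here's a minimal sketch of a toy interpreter. The opcode names and encoding are invented for this example and don't correspond to real QJS opcodes. Note that only the push instruction carries an operand; the arithmetic instructions are a single opcode each because their inputs are implicit:</p>

```javascript
// Toy stack-machine bytecode. The opcodes are made up for this sketch.
const OP_PUSH = 0; // followed by one operand: the constant to push
const OP_ADD = 1;  // pops two values, pushes their sum
const OP_MUL = 2;  // pops two values, pushes their product

function run(code) {
  const stack = [];
  let pc = 0;
  while (pc !== code.length) {
    const op = code[pc++];
    switch (op) {
      case OP_PUSH:
        stack.push(code[pc++]); // the only instruction with an inline operand
        break;
      case OP_ADD: {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a + b);
        break;
      }
      case OP_MUL: {
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a * b);
        break;
      }
      default:
        throw new Error(`unknown opcode ${op}`);
    }
  }
  return stack.pop();
}

// (1 + 2) * 3 written in postfix order: 1 2 + 3 *
const program = [OP_PUSH, 1, OP_PUSH, 2, OP_ADD, OP_PUSH, 3, OP_MUL];
const result = run(program); // 9
```

<p>A register machine would instead encode source and destination registers in each arithmetic instruction, which is where the extra encoding overhead comes from.</p>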
<p>Thus, the overall operation of QJS is that it parses a JS file and creates a
representation of the module or script, which contains some functions. Each
function is compiled to bytecode. Then QJS interprets that bytecode to
execute the program.</p>
<p><picture><source type="image/avif" srcset="https://blogs.igalia.com/compilers/img/2AeF9BADg9-930.avif 930w"><source type="image/webp" srcset="https://blogs.igalia.com/compilers/img/2AeF9BADg9-930.webp 930w"><img alt="Diagram illustrating the steps in the execution pipeline for QuickJS" loading="lazy" decoding="async" src="https://blogs.igalia.com/compilers/img/2AeF9BADg9-930.png" width="930" height="559"></picture></p>
<p>Adding support for a new proposal will affect several parts of this pipeline.
In the case of private brand checks, we will need to modify the parser to
accept the new syntax, add a new bytecode to represent the new operation, and
add a new case in the core interpreter loop to implement that operation.</p>
<p>With that high-level overview in mind, we'll dive into specific parts of QJS in the
following sections. Since QJS is written in C
(in fact, the bulk of the system is contained in a <a href="https://raw.githubusercontent.com/bellard/quickjs/2788d71e823b522b178db3b3660ce93689534e6d/quickjs.c">single 10k+ line C file</a>),
I'll use snippets of C code to show what needs to change to implement private brand checks.</p>
<h1 id="parser" tabindex="-1">Parser <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/">#</a></h1>
<p>The typical parsing pass in JS engines
translates the JS source code to an internal AST representation. There is a
separate bytecode generation pass that walks the AST and linearizes its
structure into bytecodes.</p>
<p>QJS fuses these two passes and directly generates bytecode while parsing the
source code. While this saves execution time, it does add its own kind of
complexity.</p>
<p>To understand parsing, it's useful to know where QJS kicks off the process.
<a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs.c#L33576"><code>JS_EvalInternal</code></a> is the entry
point for evaluating JS code. This can either evaluate and construct the runtime
representation of a script or module in order to execute it, or just compile it to bytecode
to emit to a file.</p>
<p>In turn, this will first run the lexer to create a tokenized version of the
source code. Afterwards, it calls <a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs.c#L33458"><code>js_parse_program</code></a> to
parse the tokenized source code. The parser has its own state
(<code>JSParseState</code>) which contains information on where the parser is in the token
stream, the bytecodes emitted so far, and so on.</p>
<p>The parser broadly follows the structure of the JS specification's <a href="https://262.ecma-international.org/12.0/#sec-ecmascript-language-statements-and-declarations">grammar</a>,
in which statements and expressions are organized in a particular nesting structure to avoid ambiguity. For
modifying how the <code>in</code> operator gets parsed, we'll be interested in how <a href="https://262.ecma-international.org/12.0/#sec-relational-operators">relational expressions</a>
in particular are parsed. As relational expressions are a kind of binary operator expression, they're handled in QJS by
the <a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs.c#L25175"><code>js_parse_expr_binary</code></a> function.
That function handles binary operators by "level", corresponding to how they nest in the formal grammar. The levels run from
multiplicative expressions at the bottom up to the bitwise logical operators. The <code>in</code> operator is handled at <a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs.c#L25253">level 5</a>, along with other relational operators like <code>&lt;</code>.</p>
<p>Since QJS will output the stack bytecode instructions in a single pass,
it's necessary in a binary expression like <code>expr_1 in expr_2</code> to first parse
<code>expr_1</code> and emit its bytecode, then parse <code>expr_2</code> and emit that, then finally
emit the bytecode for <code>OP_in</code> (i.e., it's a post-order traversal of the AST, since
stack instructions are essentially postfix).</p>
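<p>The emission order described above can be sketched with a toy single-pass compiler that emits postfix "bytecode" while parsing and never builds an AST. The token handling and the opcode strings here are invented for illustration and are much simpler than what QJS does:</p>

```javascript
// Toy single-pass compiler for expressions of the form `a in b in c ...`.
// Token handling and opcode names are made up for this sketch.
function compile(source) {
  const tokens = source.trim().split(/\s+/);
  let pos = 0;
  const code = [];

  // primary := identifier
  function parsePrimary() {
    const tok = tokens[pos++];
    if (!tok || tok === "in") {
      throw new SyntaxError("expected an identifier");
    }
    code.push(`get_var ${tok}`); // emit as soon as the operand is parsed
  }

  // relational := primary ("in" primary)*
  function parseRelational() {
    parsePrimary();              // left operand is emitted first
    while (tokens[pos] === "in") {
      pos++;                     // consume `in`
      parsePrimary();            // then the right operand
      code.push("in");           // the operator comes last (postfix)
    }
  }

  parseRelational();
  if (pos !== tokens.length) throw new SyntaxError("unexpected token");
  return code;
}

const emitted = compile("key in obj");
// emitted: ["get_var key", "get_var obj", "in"]
```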
<p>We won't need to change <code>js_parse_expr_binary</code> for private brand checks,
as the main difference from normal <code>in</code> operators is how the left-hand side is parsed.
For that, we'll be interested in <a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs.c#L24330"><code>js_parse_postfix_expr</code></a>,
which parses references to variable names (and is eventually called by
<code>js_parse_expr_binary</code>). The <code>js_parse_postfix_expr</code> function,
like most other parsing functions, has a switch statement that dispatches on different
token types.</p>
<p>For example, there are tokens such as <code>TOK_IDENT</code> for ordinary identifiers for
variables (e.g., <code>foo</code>) and <code>TOK_PRIVATE_NAME</code> for private field names (e.g., <code>#foo</code>).
We will need to add a new case for private field tokens in the switch for <code>js_parse_postfix_expr</code>:</p>
<pre class="language-c" tabindex="0"><code class="language-c"> <span class="token keyword">case</span> TOK_PRIVATE_NAME<span class="token operator">:</span><br> <span class="token punctuation">{</span><br> JSAtom name<span class="token punctuation">;</span><br> <span class="token comment">// Only allow this syntax if the next token is `in`.</span><br> <span class="token comment">// The left-hand side of a private brand check can't be a nested expression, it</span><br> <span class="token comment">// has to specifically be a private name.</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">peek_token</span><span class="token punctuation">(</span>s<span class="token punctuation">,</span> FALSE<span class="token punctuation">)</span> <span class="token operator">!=</span> TOK_IN<span class="token punctuation">)</span><br> <span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">;</span><br> <span class="token comment">// I'll explain a bit about atoms later. 
This code extracts</span><br> <span class="token comment">// a handle for the string content of the private name.</span><br> name <span class="token operator">=</span> <span class="token function">JS_DupAtom</span><span class="token punctuation">(</span>s<span class="token operator">-></span>ctx<span class="token punctuation">,</span> s<span class="token operator">-></span>token<span class="token punctuation">.</span>u<span class="token punctuation">.</span>ident<span class="token punctuation">.</span>atom<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">next_token</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">)</span><br> <span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">;</span><br> <span class="token comment">// This is a new bytecode that we'll add that looks up that the private</span><br> <span class="token comment">// field is valid and produces data for the `in` operator.</span><br> <span class="token function">emit_op</span><span class="token punctuation">(</span>s<span class="token punctuation">,</span> OP_scope_ref_private_field<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token comment">// These are the arguments for the above op code in the instruction stream.</span><br> <span class="token function">emit_u32</span><span class="token punctuation">(</span>s<span class="token punctuation">,</span> name<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token function">emit_u16</span><span class="token punctuation">(</span>s<span class="token punctuation">,</span> s<span class="token operator">-></span>cur_func<span class="token 
operator">-></span>scope_level<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">break</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span></code></pre>
<p>This case accepts a private name only when the next token in the stream is
<code>in</code>. We need the restriction because a bare private name is invalid
in any other expression (private names should
otherwise only appear in declarations in classes or in expressions like <code>this.#priv</code>).</p>
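<p>The lookahead restriction can be sketched in isolation as follows. This is a simplified model with made-up token kinds and helper names, not the QJS code itself; it only shows the "peek, then accept or reject" shape of the logic:</p>

```javascript
// Token kinds are invented for this sketch; QJS itself uses C constants such as
// TOK_IDENT, TOK_PRIVATE_NAME and TOK_IN.
function classify(word) {
  if (word === "in") return { kind: "in" };
  if (/^#[A-Za-z_$][\w$]*$/.test(word)) return { kind: "private_name", name: word };
  if (/^[A-Za-z_$][\w$]*$/.test(word)) return { kind: "ident", name: word };
  throw new SyntaxError(`unexpected input '${word}'`);
}

// Parses one operand starting at tokens[pos] and returns the instruction for it.
function parseOperand(tokens, pos) {
  const tok = tokens[pos];
  if (!tok) throw new SyntaxError("unexpected end of input");
  if (tok.kind === "ident") return { op: "get_var", name: tok.name };
  if (tok.kind === "private_name") {
    const next = tokens[pos + 1]; // peek without consuming
    if (!next || next.kind !== "in") {
      throw new SyntaxError(`'${tok.name}' is only valid on the left of 'in'`);
    }
    return { op: "scope_ref_private_field", name: tok.name };
  }
  throw new SyntaxError(`unexpected token '${tok.kind}'`);
}

function parseFirstOperand(source) {
  return parseOperand(source.trim().split(/\s+/).map(classify), 0);
}

const accepted = parseFirstOperand("#priv in obj");
// accepted: { op: "scope_ref_private_field", name: "#priv" }

let rejected = false;
try {
  parseFirstOperand("#priv"); // no `in` follows, so this is rejected
} catch (e) {
  rejected = e instanceof SyntaxError;
}
```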
<p>The new case also emits the bytecode for this expression, which uses a
new <code>scope_ref_private_field</code> operator that we add. When new opcodes get added, they're
defined in <a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs-opcode.h"><code>quickjs-opcode.h</code></a>.
The <code>scope_ref_private_field</code> opcode is a new variant on existing opcodes
like <a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs-opcode.h#L280"><code>scope_get_private_field</code></a>
that are already defined in that header.</p>
<p>The <code>scope_ref_private_field</code> operator never actually appears in
executable bytecode; it only appears temporarily as input to another pass. When I said earlier that
the parser emits bytecode in a single pass, that was a slight simplification.
After the initial parse, the bytecode goes through a scope resolution phase (see
<a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs.c#L30782"><code>resolve_variables</code></a>) where certain
kinds of scope violations are ruled out. For example, the phase would signal an error on the following code:</p>
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token comment">// Invalid example</span><br><span class="token keyword">class</span> <span class="token class-name">Foo</span> <span class="token punctuation">{</span><br> <span class="token comment">// missing declaration of #priv</span><br> <span class="token function">foo</span><span class="token punctuation">(</span><span class="token parameter">obj</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token keyword">return</span> #priv <span class="token keyword">in</span> obj<span class="token punctuation">;</span> <span class="token punctuation">}</span> <span class="token comment">// #priv is unbound</span><br><span class="token punctuation">}</span></code></pre>
<p>There's also an optimization pass on the bytecode to obtain some speedups in interpretation later.</p>
<p>In the scope resolution phase, <code>scope_ref_private_field</code> is translated to a <code>get_var_ref</code> operation, which
looks up a variable in the runtime environment. This will resolve a variable to an index
that the runtime can use to look up the private field in an object's property table. The reason
we add this new operation is that existing operations like <code>scope_get_private_field</code> are
translated into code that performs the field lookup in the object immediately, whereas we want to wait until the
<code>in</code> operator is executed in order to do that.</p>
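<p>As a rough model of what such a resolution pass does, the sketch below rewrites a temporary instruction into a resolved one and rejects unbound private names. The instruction objects and the <code>declaredPrivateNames</code> list are invented for illustration; the real <code>resolve_variables</code> works on raw bytecode and handles many more cases:</p>

```javascript
// Instruction shapes and the `declaredPrivateNames` list are assumptions of
// this sketch, not QJS data structures.
function resolvePrivateRefs(code, declaredPrivateNames) {
  return code.map((insn) => {
    if (insn.op !== "scope_ref_private_field") return insn; // leave other ops alone
    const index = declaredPrivateNames.indexOf(insn.name);
    if (index === -1) {
      throw new SyntaxError(`undefined private field '${insn.name}'`);
    }
    // The temporary op is replaced, so it never reaches the interpreter.
    return { op: "get_var_ref", index };
  });
}

const parsed = [
  { op: "scope_ref_private_field", name: "#priv" },
  { op: "get_var", name: "obj" },
  { op: "in" },
];

const resolved = resolvePrivateRefs(parsed, ["#priv"]);
// resolved[0] is now { op: "get_var_ref", index: 0 }

let unboundRejected = false;
try {
  resolvePrivateRefs(parsed, []); // the class body never declared #priv
} catch (e) {
  unboundRejected = e instanceof SyntaxError;
}
```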
<h1 id="interpreter-and-runtime" tabindex="-1">Interpreter and runtime <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/">#</a></h1>
<p>Once the bytecode compilation process is finished, the interpreter can start
executing the program. QJS treats everything uniformly by considering all
execution to take place in a function, so for example the code that runs
in a module or script top-level is also in a special kind of function.</p>
<p>Therefore, all execution in QJS takes place in a core interpreter loop which
runs a function body. It loads the bytecode for that function body and
repeatedly runs the operations specified by the bytecode until it reaches the
end. When executing the bytecode, the interpreter also maintains a
runtime stack that stores temporary values produced by the operators. The
interpreter allocates exactly enough stack space to run a particular function; the
compiler pre-computes the max stack size for each function and
encodes it in the bytecode format.</p>
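<p>Here's a minimal sketch of how a maximum stack size can be pre-computed from per-opcode stack effects. The opcode table is invented for this example and only straight-line code is handled; a real compiler also has to account for control flow:</p>

```javascript
// Stack effects for a few made-up opcodes: how many values each one pops and pushes.
const STACK_EFFECT = {
  push_const: { pop: 0, push: 1 },
  get_var: { pop: 0, push: 1 },
  add: { pop: 2, push: 1 },
  in: { pop: 2, push: 1 },
  drop: { pop: 1, push: 0 },
};

// Walks straight-line code and returns the deepest the stack ever gets.
function maxStackSize(code) {
  let depth = 0;
  let max = 0;
  for (const op of code) {
    const effect = STACK_EFFECT[op];
    if (!effect) throw new Error(`unknown opcode ${op}`);
    if (effect.pop > depth) throw new Error(`stack underflow at ${op}`);
    depth += effect.push - effect.pop;
    max = Math.max(max, depth);
  }
  return max;
}

const sizeForIn = maxStackSize(["get_var", "get_var", "in"]);                  // 2
const sizeForSum = maxStackSize(["get_var", "get_var", "get_var", "add", "add"]); // 3
```

<p>With this number known ahead of time, an interpreter can reserve the whole operand stack for a call up front instead of growing it while the function runs.</p>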
<p>To add a new instruction, usually you add a new case to the big switch statement
in the main interpreter loop in <a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs.c#L16198"><code>JS_CallInternal</code></a>.
Since we're just extending an existing operator, this case <a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs.c#L18413">already exists</a>.
So instead, we need to extend the helper function <a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/quickjs.c#L14747"><code>js_operator_in</code></a>.
An annotated version of that function looks like this:</p>
<pre class="language-c" tabindex="0"><code class="language-c"><span class="token comment">// Note: __exception is a QJS convention to warn if the result is unused</span><br><span class="token keyword">static</span> __exception <span class="token keyword">int</span> <span class="token function">js_operator_in</span><span class="token punctuation">(</span>JSContext <span class="token operator">*</span>ctx<span class="token punctuation">,</span> JSValue <span class="token operator">*</span>sp<span class="token punctuation">)</span><br><span class="token punctuation">{</span><br> JSValue op1<span class="token punctuation">,</span> op2<span class="token punctuation">;</span><br> JSAtom atom<span class="token punctuation">;</span><br> <span class="token keyword">int</span> ret<span class="token punctuation">;</span><br><br> <span class="token comment">// Reference the values in the top two stack slots</span><br> <span class="token comment">// op1 is the result of executing the left-hand side of the `in`</span><br> <span class="token comment">// op2 is the result of executing the right-hand side of the `in`</span><br> op1 <span class="token operator">=</span> sp<span class="token punctuation">[</span><span class="token operator">-</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">;</span><br> op2 <span class="token operator">=</span> sp<span class="token punctuation">[</span><span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span><br><br> <span class="token comment">// op2 is the right-hand-side of `in`, which must be a JS object</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">JS_VALUE_GET_TAG</span><span class="token punctuation">(</span>op2<span class="token punctuation">)</span> <span class="token operator">!=</span> JS_TAG_OBJECT<span 
class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token function">JS_ThrowTypeError</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> <span class="token string">"invalid 'in' operand"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span><br><br> <span class="token comment">// Atoms are covered in more detail below</span><br> <span class="token comment">// but generally this just converts a string or symbol to a</span><br> <span class="token comment">// handle to an interned string, or it's a tagged number</span><br> atom <span class="token operator">=</span> <span class="token function">JS_ValueToAtom</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> op1<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">unlikely</span><span class="token punctuation">(</span>atom <span class="token operator">==</span> JS_ATOM_NULL<span class="token punctuation">)</span><span class="token punctuation">)</span><br> <span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">;</span><br><br> <span class="token comment">// Look up if the property corresponding to left-hand-side name exists in the object.</span><br> ret <span class="token operator">=</span> <span class="token function">JS_HasProperty</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> op2<span class="token punctuation">,</span> atom<span class="token punctuation">)</span><span class="token 
punctuation">;</span><br><br> <span class="token comment">// QJS also has a reference-counting garbage collector. We need to appropriately</span><br> <span class="token comment">// free (i.e, decrement refcounts) on values when we stop using them.</span><br> <span class="token function">JS_FreeAtom</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> atom<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span>ret <span class="token operator"><</span> <span class="token number">0</span><span class="token punctuation">)</span><br> <span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">;</span><br> <span class="token function">JS_FreeValue</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> op1<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token function">JS_FreeValue</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> op2<span class="token punctuation">)</span><span class="token punctuation">;</span><br><br> <span class="token comment">// Push a boolean onto the top stack slot</span><br> <span class="token comment">// Note: the stack is shrunk after this by the main loop, so -2 is the top.</span><br> sp<span class="token punctuation">[</span><span class="token operator">-</span><span class="token number">2</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">JS_NewBool</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> ret<span class="token punctuation">)</span><span class="token punctuation">;</span><br><br> <span class="token keyword">return</span> <span class="token number">0</span><span class="token 
punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>At this point in the code, the results of evaluating the left- and right-hand
side expressions of an <code>in</code> are already on the stack. These are JS values, so
now might be a good time to talk about how values are represented in QJS.</p>
<h2 id="object-representation" tabindex="-1">Object representation <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/">#</a></h2>
<p>All JS engines have their own internal representation of JS values, which
include primitive values such as symbols and numbers and also object values.
Since JS is dynamically typed, a given function can be called with all kinds of
values, so the engine's representation needs a way to distinguish the values
to appropriately signal an error, or choose the correct operation.</p>
<p>To do this, values need to come with some kind of tag.
Some engines use a tagging scheme such as NaN-boxing to store all values
inside the bit pattern of a 64-bit floating point number (using the different kinds of NaNs
that exist in the IEEE-754 standard to distinguish cases).
My colleague Andy Wingo wrote a <a href="https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations">blog post</a> on
this topic a while ago, laying out various options that JS engines use.</p>
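<p>For a feel of what NaN-boxing means, here's a toy sketch that hides a 32-bit integer inside the bit pattern of a NaN. The tag value is made up for this example and isn't the layout of any particular engine:</p>

```javascript
// A toy NaN-boxing scheme. TAG_INT32 is an assumption of this sketch.
const TAG_INT32 = 0x7ff80001; // high 32 bits: a quiet-NaN pattern marking "boxed int32"

const view = new DataView(new ArrayBuffer(8));

// Returns the 64-bit pattern of an ordinary double as a BigInt.
function boxDouble(d) {
  view.setFloat64(0, d);
  return view.getBigUint64(0);
}

// Stores the tag in the high word and the integer payload in the low word.
function boxInt32(n) {
  view.setUint32(0, TAG_INT32);
  view.setInt32(4, n);
  return view.getBigUint64(0);
}

function isBoxedInt32(bits) {
  return (bits >> 32n) === BigInt(TAG_INT32);
}

function unboxInt32(bits) {
  return Number(BigInt.asIntN(32, bits)); // low 32 bits, interpreted as signed
}

function unboxDouble(bits) {
  view.setBigUint64(0, bits);
  return view.getFloat64(0);
}

const boxed = boxInt32(-5);
const plain = boxDouble(1.5);
// isBoxedInt32(boxed) is true and unboxInt32(boxed) is -5;
// isBoxedInt32(plain) is false and unboxDouble(plain) is 1.5.
// Read as a double, the boxed integer is just a NaN.
```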
<p>QJS uses a much simpler scheme on 64-bit platforms: it dedicates 128 bits to each JS value. Half of
that is the payload (a 64-bit float, pointer, etc.) and half is the tag value. The
following definitions show how this is represented in C:</p>
<pre class="language-c" tabindex="0"><code class="language-c"><span class="token keyword">typedef</span> <span class="token keyword">union</span> JSValueUnion <span class="token punctuation">{</span><br> <span class="token class-name">int32_t</span> int32<span class="token punctuation">;</span><br> <span class="token keyword">double</span> float64<span class="token punctuation">;</span><br> <span class="token keyword">void</span> <span class="token operator">*</span>ptr<span class="token punctuation">;</span><br><span class="token punctuation">}</span> JSValueUnion<span class="token punctuation">;</span><br><br><span class="token keyword">typedef</span> <span class="token keyword">struct</span> <span class="token class-name">JSValue</span> <span class="token punctuation">{</span><br> JSValueUnion u<span class="token punctuation">;</span><br> <span class="token class-name">int64_t</span> tag<span class="token punctuation">;</span><br><span class="token punctuation">}</span> JSValue<span class="token punctuation">;</span></code></pre>
<p>On 32-bit platforms there is a different tagging scheme that I won't detail other
than to note that it uses NaN-boxing with a 64-bit representation.</p>
<p>For the most part, the representation details are abstracted by various macros like <code>JS_VALUE_GET_TAG</code>
used in the example code above, so there won't be much need to directly interact
with the value representation in this post.</p>
<h3 id="reference-counting-and-objects" tabindex="-1">Reference counting and objects <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/">#</a></h3>
<p>Compound data, such as objects and strings, are tracked by a relatively
simple reference counting garbage collector in QJS. This is in contrast to the much more complex
collectors in web engines, such as WebKit's <a href="https://webkit.org/blog/7122/introducing-riptide-webkits-retreating-wavefront-concurrent-garbage-collector/">Riptide</a>,
that have different design tradeoffs and requirements such as the need for concurrency.
There's a lot more to say about how reference counting and compound data work in QJS,
but I'll save most of those details for a future post.</p>
<h2 id="atoms-and-strings" tabindex="-1">Atoms and strings <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/">#</a></h2>
<p>Certain kinds of data have a special representation because they are so common and
are used repeatedly in the program: small integers and strings, which
serve as property names, symbols, private names, and so on.
QJS uses a datatype called an Atom for these cases (which has
already appeared in code examples above).</p>
<p>An atom is a handle that is either tagged as an integer, or is an index that
refers to an <a href="https://en.wikipedia.org/wiki/String_interning">interned string</a>, i.e.,
a unique string that is only allocated once and stored in a hash table.
Atoms that appear in the program's bytecode are also serialized in the bytecode
format itself, and are loaded into the runtime table on initialization.</p>
<p>The data type <code>JSAtom</code> is defined as a <code>uint32_t</code>, so it's just a 32-bit integer.
Properties of objects, for example, are always accessed with atoms as the
property key. This means that property tables in objects just need to map
atoms to the stored values.</p>
<p>You can see this in action with the <code>JS_HasProperty</code> lookup above, which
looks like <code>JS_HasProperty(ctx, op2, atom)</code>. This code looks up a key <code>atom</code> in the
object <code>op2</code>'s property table. In turn, <code>atom</code> comes
from the line <code>atom = JS_ValueToAtom(ctx, op1)</code>, which converts the
property name value <code>op1</code> into either an integer or a handle to an interned string.</p>
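<p>The idea behind atoms can be modeled with a small interning table. This is only a sketch: the tag bit, the class, and the method names are assumptions made for illustration rather than a description of the QJS internals:</p>

```javascript
// A toy atom table. The tag bit and the method names are assumptions of this sketch.
const ATOM_TAG_INT = 0x80000000; // top bit set: the atom encodes a small integer

class AtomTable {
  constructor() {
    this.indexByString = new Map(); // string -> atom (the table of interned strings)
    this.strings = [];              // atom -> string
  }

  fromString(str) {
    let atom = this.indexByString.get(str);
    if (atom === undefined) {
      atom = this.strings.length;   // first use: allocate a new entry
      this.strings.push(str);
      this.indexByString.set(str, atom);
    }
    return atom;
  }

  fromInt(n) {
    // n is assumed to be a non-negative integer that fits in 31 bits
    return (ATOM_TAG_INT | n) >>> 0;
  }

  isInt(atom) {
    return (atom & ATOM_TAG_INT) !== 0;
  }
}

const atoms = new AtomTable();
const lengthAtom = atoms.fromString("length");
const sameAtom = atoms.fromString("length"); // interned: same handle as before
const indexAtom = atoms.fromInt(3);

// A property table then only needs to map atoms to values.
const props = new Map([[lengthAtom, 1], [indexAtom, "d"]]);
const hasLength = props.has(atoms.fromString("length")); // true
const hasName = props.has(atoms.fromString("name"));     // false
```

<p>Because equal strings always map to the same handle, comparing property keys comes down to comparing two 32-bit integers.</p>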
<h2 id="changing-the-operation-to-support-private-fields" tabindex="-1">Changing the operation to support private fields <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/">#</a></h2>
<p>The actual change to <code>js_operator_in</code> to support private brand checks is
very simple. In the case that the private field is a non-method field, the
resolved private name lookup via <code>get_var_ref</code> pushes a symbol value
onto the stack. This case doesn't require any changes.</p>
<p>In the case that the private field refers to a method, the name lookup pushes a function
object onto the stack. We then need to run a private brand check with the
target object and this private function, to ensure the private function really
is part of the object.</p>
<p>At a high level, you can see the similarity between this operation and
the runtime semantics described in the <a href="https://tc39.es/proposal-private-fields-in-in/">formal spec</a> for the private brand
check proposal.</p>
<p>The modified code looks like the following:</p>
<pre class="language-c" tabindex="0"><code class="language-c"><span class="token keyword">static</span> __exception <span class="token keyword">int</span> <span class="token function">js_operator_in</span><span class="token punctuation">(</span>JSContext <span class="token operator">*</span>ctx<span class="token punctuation">,</span> JSValue <span class="token operator">*</span>sp<span class="token punctuation">)</span><br><span class="token punctuation">{</span><br> JSValue op1<span class="token punctuation">,</span> op2<span class="token punctuation">;</span><br> JSAtom atom<span class="token punctuation">;</span><br> <span class="token keyword">int</span> ret<span class="token punctuation">;</span><br><br> op1 <span class="token operator">=</span> sp<span class="token punctuation">[</span><span class="token operator">-</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">;</span><br> op2 <span class="token operator">=</span> sp<span class="token punctuation">[</span><span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span><br><br> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">JS_VALUE_GET_TAG</span><span class="token punctuation">(</span>op2<span class="token punctuation">)</span> <span class="token operator">!=</span> JS_TAG_OBJECT<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token function">JS_ThrowTypeError</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> <span class="token string">"invalid 'in' operand"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">;</span><br> <span 
class="token punctuation">}</span><br><br> <span class="token comment">// --- New code here ---</span><br> <span class="token comment">// This is the same as the previous code, but now under a conditional.</span><br> <span class="token comment">// It doesn't need to change, because after resolving the private field</span><br> <span class="token comment">// name to a symbol via `get_var_ref` the normal `JS_HasProperty` lookup</span><br> <span class="token comment">// works.</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">JS_VALUE_GET_TAG</span><span class="token punctuation">(</span>op1<span class="token punctuation">)</span> <span class="token operator">!=</span> JS_TAG_OBJECT<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> atom <span class="token operator">=</span> <span class="token function">JS_ValueToAtom</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> op1<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">unlikely</span><span class="token punctuation">(</span>atom <span class="token operator">==</span> JS_ATOM_NULL<span class="token punctuation">)</span><span class="token punctuation">)</span><br> <span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">;</span><br> ret <span class="token operator">=</span> <span class="token function">JS_HasProperty</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> op2<span class="token punctuation">,</span> atom<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token function">JS_FreeAtom</span><span class="token punctuation">(</span>ctx<span class="token 
punctuation">,</span> atom<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token comment">// New conditional branch, in case the field operand is an object.</span><br> <span class="token comment">// When a private method is referenced via `get_var_ref`, it actually</span><br> <span class="token comment">// produces the function object for that method. We then can call</span><br> <span class="token comment">// the `JS_CheckBrand` operation that is already defined to check the</span><br> <span class="token comment">// validity of a private method call.</span><br> <span class="token punctuation">}</span> <span class="token keyword">else</span> <span class="token punctuation">{</span><br> <span class="token comment">// JS_CheckBrand is modified to take a boolean (last arg) that</span><br> <span class="token comment">// determines whether to throw on failure or just indicate the</span><br> <span class="token comment">// success/fail state. This is needed as `in` doesn't throw when</span><br> <span class="token comment">// the check fails, it just returns false.</span><br> ret <span class="token operator">=</span> <span class="token function">JS_CheckBrand</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> op2<span class="token punctuation">,</span> op1<span class="token punctuation">,</span> FALSE<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span><br> <span class="token comment">// --- New code end ---</span><br><br> <span class="token keyword">if</span> <span class="token punctuation">(</span>ret <span class="token operator"><</span> <span class="token number">0</span><span class="token punctuation">)</span><br> <span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">;</span><br> <span class="token 
function">JS_FreeValue</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> op1<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token function">JS_FreeValue</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> op2<span class="token punctuation">)</span><span class="token punctuation">;</span><br><br> sp<span class="token punctuation">[</span><span class="token operator">-</span><span class="token number">2</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">JS_NewBool</span><span class="token punctuation">(</span>ctx<span class="token punctuation">,</span> ret<span class="token punctuation">)</span><span class="token punctuation">;</span><br><br> <span class="token keyword">return</span> <span class="token number">0</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<h1 id="testing" tabindex="-1">Testing <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/">#</a></h1>
<p>We can validate this implementation against the
official <a href="https://github.com/tc39/test262">test262</a> tests. QJS comes with a
test runner that can run against test262 (invoking <code>make test2</code> will run it).
Since we've added a new feature, we must also modify the tested features list
in the <a href="https://github.com/bellard/quickjs/blob/2788d71e823b522b178db3b3660ce93689534e6d/test262.conf">test262 configuration file</a>
to specify that the feature should be tested. For private brand checks, we
change <code>class-fields-private-in=skip</code> in that file to <code>class-fields-private-in</code>.</p>
<p>After changing the configuration file, the test262 tests for the private brand check
feature all succeed, with the exception of some syntax tests that fail due to an existing
bug in how QJS parses <code>in</code> in general (the code <code>function f() { "foo" in {} = 0; }</code> should fail to parse, but in QJS it errors at runtime instead).</p>
<h1 id="wrap-up" tabindex="-1">Wrap-up <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/06/12/quickjs-an-overview-and-guide-to-adding-a-new-feature/">#</a></h1>
<p>With the examples above, I've walked through what it takes to add a relatively
simple JS language feature to QuickJS. The private brand checks proposal just
adds a new use of an existing syntax, so implementing it mostly just touches
the parser and core interpreter loop. A feature that affects more of the
language, such as adding a new datatype or changing how functions are executed,
would obviously require more code and deeper changes.</p>
<p>The full changes required to implement this feature (other than test changes)
can be reviewed in <a href="https://blogs.igalia.com/compilers/code/private-brand-check.txt">this patch</a>.</p>
<p>In future posts, I'm planning to explain other parts of the QJS codebase and
potentially explore how it's being used in the WebAssembly ecosystem.</p>
<hr>
<p>Header image credit: <a href="https://www.pexels.com/photo/selective-focus-photography-of-train-610683/">https://www.pexels.com/photo/selective-focus-photography-of-train-610683/</a></p>
Compiling Bigloo Scheme to WebAssembly2023-05-10T00:00:00Zhttps://blogs.igalia.com/compilers/2023/05/10/compiling-bigloo-scheme-to-webassembly/<p>In the JavaScript world, browser implementations have focused on <a href="https://en.wikipedia.org/wiki/Just-in-time_compilation">JIT compilation</a> as a high-performance implementation technique. Recently, new applications of JS, such as on cloud compute and edge compute platforms, have driven interest in non-JIT implementations of the language. For these kinds of use cases, fast startup and predictable performance can make traditional implementation approaches appealing. An example implementation is <a href="https://bellard.org/quickjs/">QuickJS</a>, which compiles JS to a bytecode format and interprets the bytecodes. Another approach is Manuel Serrano's <a href="https://dl.acm.org/doi/abs/10.1145/3473575">work on Hopc</a>, which is a performant <a href="https://en.wikipedia.org/wiki/Ahead-of-time_compilation">AOT</a> JS compiler that uses Scheme as an intermediate language.</p>
<p>Another direction that is gaining interest is compiling <a href="https://thenewstack.io/will-javascript-become-the-most-popular-webassembly-language/">JavaScript to WebAssembly</a> (Wasm). The motivations for this approach are explained very clearly in Lin Clark's <a href="https://bytecodealliance.org/articles/making-javascript-run-fast-on-webassembly">article on making JS run fast on Wasm</a>, and some of my Igalia colleagues are spearheading this effort with the SpiderMonkey JS engine in collaboration with Bytecode Alliance partners.</p>
<p>It is still an open question whether these AOT compilation techniques can be applied to compile JS to Wasm in a practical way (though the <a href="https://github.com/bytecodealliance/componentize-js">componentize-js</a> effort appears to be building up to this using partial evaluation). One way to test this out would be to apply the previously mentioned Hopc compiler. Hopc compiles to <a href="https://www.scheme.org/">Scheme</a> which, via the <a href="https://www-sop.inria.fr/mimosa/fp/Bigloo/index.html">Bigloo Scheme</a> implementation, compiles to C. Using the standard C toolchain for Wasm (i.e., <a href="https://emscripten.org/">Emscripten</a>), we can compile that C code to Wasm.</p>
<p>To even attempt this, we would have to first make sure Bigloo Scheme's C output can be compiled to Wasm, which is the main topic of this blog post.</p>
<h1 id="bigloo-on-wasm" tabindex="-1">Bigloo on Wasm <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/05/10/compiling-bigloo-scheme-to-webassembly/">#</a></h1>
<p>In theory, it's simple to get Bigloo working with Wasm because it can emit C code, which you can compile with the C compiler of your choice. For example, you could use Emscripten's <code>emcc</code> to generate the final executable. In practice, it's more complicated than that.</p>
<p>For one, if you only compile the user Bigloo code to Wasm, it will fail to execute. The binary relies on several libraries that make up the runtime system, which themselves have to be compiled to Wasm in order to link a final executable.</p>
<p>The diagram below illustrates the compilation pipeline. The purple boxes at the lower right are the runtime libraries that need to be linked in.</p>
<p><picture><source type="image/avif" srcset="https://blogs.igalia.com/compilers/img/CZwKRIXbiR-1116.avif 1116w"><source type="image/webp" srcset="https://blogs.igalia.com/compilers/img/CZwKRIXbiR-1116.webp 1116w"><img alt="Diagram illustrating the steps in the compilation pipeline from Hopc to Wasm" loading="lazy" decoding="async" src="https://blogs.igalia.com/compilers/img/CZwKRIXbiR-1116.png" width="1116" height="506"></picture></p>
<p>As a result, we will need to build Bigloo twice: once natively and once to Wasm. The latter build to Wasm will create the needed runtime libraries. This approach is also suggested in the <a href="https://emscripten.org/docs/compiling/Building-Projects.html#build-system-self-execution">Emscripten documentation</a> for building projects that use self-execution.</p>
<p>I've scripted this approach in a <a href="https://github.com/takikawa/bigloo-wasm-dockerfile/blob/main/Dockerfile">Dockerfile</a> that contains a reproducible setup for reliably compiling Bigloo to Wasm. You can see that starting at <a href="https://github.com/takikawa/bigloo-wasm-dockerfile/blob/5138f818501540f79b384064313d1bb436281387/Dockerfile#L21">line 21</a> an ordinary native Bigloo is built, with a number of features disabled that won't work well in Wasm. Starting at <a href="https://github.com/takikawa/bigloo-wasm-dockerfile/blob/5138f818501540f79b384064313d1bb436281387/Dockerfile#L42">line 42</a> a very similar build is done using the <code>emconfigure</code> wrapper that handles the <code>configure</code> script process for Emscripten. The options passed mirror the native build, but with some extra options needed for Wasm.</p>
<p>As with many projects ported with Emscripten, some modifications are needed to get Bigloo to compile properly: for example, making <a href="https://gist.github.com/takikawa/a6fd03fd351f46af791844711a672cf3">C types more precise</a>, <a href="https://gist.github.com/takikawa/e3bdb81eb987b26d7584d3f4e885d5ed">backporting Emscripten compatibility</a> patches for included libraries, and <a href="https://gist.github.com/takikawa/1316c7dbfe6a7b3b15e92d17521f0781">adjusting autoconf tests</a> to return a desired result with Emscripten.</p>
<h1 id="1-1-4" tabindex="-1">1 + 1 = 4? <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/05/10/compiling-bigloo-scheme-to-webassembly/">#</a></h1>
<p>There are some tricky details that you need to get right to have working Wasm programs in the end. For example, when I first got a working docker environment to run Bigloo-on-Wasm programs, I got the following result:</p>
<pre class="language-shell-session" tabindex="0"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash"><span class="token function">cat</span> num.scm <span class="token comment"># this is a Bigloo scheme module</span></span></span><br><span class="token output">(module num)<br>(display (+ 1 1)) (newline)<br></span><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">/opt/bigloo/bin/bigloo <span class="token parameter variable">-O3</span> num.scm <span class="token parameter variable">-o</span> num.js <span class="token parameter variable">-cc</span> /emsdk/upstream/emscripten/emcc <span class="token comment"># compile to wasm, more arguments are needed in practice, this is a simplified example</span></span></span><br><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">emsdk/node/14.18.2_64bit/bin/node num.js <span class="token comment"># execute the compiled wasm in nodejs</span></span></span><br><span class="token output">4</span></code></pre>
<p>(Side note: if you haven't used a Wasm toolchain before, you may be confused why the output is <code>num.js</code>. Wasm toolchains often produce JS glue code that you use to load the actual Wasm code in a browser/JS engine.)</p>
<p>The Scheme program <code>num.scm</code> is supposed to print the result of "1 + 1". The Wasm binary helpfully prints... 4. Other programs that I tried, like printing "hello world", resulted in the IO system trying to print random parts of Wasm's linear memory.</p>
<p>The proximate reason for this failure was that the value tagging code in the Bigloo runtime was being configured incorrectly. If you look at the Bigloo tagging code, you see these cpp definitions:</p>
<pre class="language-c" tabindex="0"><code class="language-c"><span class="token macro property"><span class="token directive-hash">#</span> <span class="token directive keyword">define</span> <span class="token macro-name">TAG_SHIFT</span> <span class="token expression">PTR_ALIGNMENT</span></span><br><span class="token macro property"><span class="token directive-hash">#</span> <span class="token directive keyword">define</span> <span class="token macro-name">TAG_MASK</span> <span class="token expression"><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token number">1</span> <span class="token operator"><<</span> PTR_ALIGNMENT<span class="token punctuation">)</span> <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">)</span></span></span><br><span class="token macro property"><span class="token directive-hash">#</span> <span class="token directive keyword">define</span> <span class="token macro-name">TAG_MASKOBJECT</span> <span class="token expression">TAG_MASK</span></span><br><span class="token macro property"><span class="token directive-hash">#</span> <span class="token directive keyword">define</span> <span class="token macro-name">TAG_MASKPOINTER</span> <span class="token expression">TAG_MASK</span></span><br><br><span class="token macro property"><span class="token directive-hash">#</span> <span class="token directive keyword">define</span> <span class="token macro-name function">TAG</span><span class="token expression"><span class="token punctuation">(</span>_v<span class="token punctuation">,</span> shift<span class="token punctuation">,</span> tag<span class="token punctuation">)</span> </span><span class="token punctuation">\</span><br> <span class="token expression"><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token keyword">long</span><span class="token punctuation">)</span><span class="token 
punctuation">(</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token keyword">unsigned</span> <span class="token keyword">long</span><span class="token punctuation">)</span><span class="token punctuation">(</span>_v<span class="token punctuation">)</span> <span class="token operator"><<</span> shift<span class="token punctuation">)</span> <span class="token operator">|</span> tag<span class="token punctuation">)</span><span class="token punctuation">)</span></span></span><br><span class="token macro property"><span class="token directive-hash">#</span> <span class="token directive keyword">define</span> <span class="token macro-name function">UNTAG</span><span class="token expression"><span class="token punctuation">(</span>_v<span class="token punctuation">,</span> shift<span class="token punctuation">,</span> tag<span class="token punctuation">)</span> </span><span class="token punctuation">\</span><br> <span class="token expression"><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token keyword">long</span><span class="token punctuation">)</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token keyword">long</span><span class="token punctuation">)</span><span class="token punctuation">(</span>_v<span class="token punctuation">)</span> <span class="token operator">>></span> shift<span class="token punctuation">)</span><span class="token punctuation">)</span></span></span><br><br><span class="token comment">/* ... */</span><br><br><span class="token macro property"><span class="token directive-hash">#</span> <span class="token directive keyword">define</span> <span class="token macro-name">TAG_INT</span> <span class="token expression"><span class="token number">0</span> </span><span class="token comment">/* integer tagging ....00 */</span></span></code></pre>
<p>The <code>TAG</code> operation is used throughout compiled Bigloo code to tag values into the <a href="https://medium.com/@samth/on-typed-untyped-and-uni-typed-languages-8a3b4bedf68c">unityped</a> Scheme value representation. The default tagging scheme (see <code>TAG_SHIFT</code>) is a <a href="https://en.wikipedia.org/w/index.php?title=Tagged_pointer&oldid=1063919013#Folding_tags_into_the_pointer">typical one</a> that depends on the pointer alignment, which depends on the word size (4 bytes on 32-bit, 8 bytes on 64-bit). <code>PTR_ALIGNMENT</code> is defined as the log base 2 of the word size in bytes, so 2 bits of the value are used for a tag on 32-bit platforms and 3 bits on 64-bit platforms.</p>
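<p>As a rough sketch of that scheme, the functions below mirror the <code>TAG</code>, <code>UNTAG</code> and <code>TAG_MASK</code> macros but take the shift as an explicit parameter, so that both word sizes can be tried on any host. The function names, the fixed 64-bit integer type and the <code>SHIFT_*</code> constants are assumptions made for illustration; they are not part of Bigloo.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: log2 of the word size in bytes. */
enum { SHIFT_32BIT = 2, SHIFT_64BIT = 3 };

/* Like TAG_MASK: the low `shift` bits hold the tag. */
static int64_t tag_mask(int shift) {
    assert(shift >= 0 && shift < 62);
    return ((int64_t)1 << shift) - 1;
}

/* Like TAG: shift in the unsigned domain, then OR in the tag bits. */
static int64_t tag_value(int64_t v, int shift, int64_t tag) {
    assert(shift >= 0 && shift < 62);
    return (int64_t)(((uint64_t)v << shift) | (uint64_t)tag);
}

/* Like UNTAG: shift the tag bits back out. */
static int64_t untag_value(int64_t v, int shift) {
    assert(shift >= 0 && shift < 62);
    return v >> shift;
}
```

<p>With these definitions, the integer 2 is represented as 8 under the 32-bit scheme and as 16 under the 64-bit scheme. The value round-trips correctly only if the same shift is used on both sides.</p>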
<p>In the case of numbers, the tag is <code>0</code> (<code>TAG_INT</code> above) so a discrepancy in tagging will produce a mis-shifted number value. That's exactly why the <code>num.js</code> program printed <code>4</code> above. It's the right answer <code>2</code> shifted by one bit.</p>
<p>The reason for that shift is that I was compiling native Bigloo in a 64-bit configuration since that's the architecture of the host machine. Wasm, however, is specified to have a 32-bit address space (unless the <a href="https://github.com/webAssembly/memory64">memory64 proposal</a> is used). This discrepancy caused values to be shifted by 2 bits in some places and by 3 bits in others during tagging/untagging. After figuring this out, the fix was relatively easy: compile the native Bigloo with a 32-bit (i686) toolchain so that both builds agree on the word size.</p>
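<p>To make the arithmetic concrete, here is a small sketch of how a 2-bit/3-bit disagreement can turn "1 + 1" into 4. Which part of the build used which shift is an assumption here, and the helper names are hypothetical; the point is only that tagging with one shift and untagging with the other leaves the result off by a factor of two.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for TAG/UNTAG with TAG_INT == 0. */
static int64_t tag_int(int64_t v, int shift) {
    assert(shift >= 0 && shift < 62);
    return (int64_t)((uint64_t)v << shift);
}

static int64_t untag_int(int64_t v, int shift) {
    assert(shift >= 0 && shift < 62);
    return v >> shift;
}

/* Assumption: the constants were tagged with the 64-bit shift (3 bits),
   while the code reading the result untagged with the 32-bit shift. */
static int64_t mismatched_one_plus_one(void) {
    /* With a zero tag, tagged integers can be added directly. */
    int64_t sum = tag_int(1, 3) + tag_int(1, 3); /* 8 + 8 = 16 */
    return untag_int(sum, 2);                    /* 16 >> 2 = 4 */
}

/* The same computation with both sides agreeing on a 2-bit tag. */
static int64_t consistent_one_plus_one(void) {
    int64_t sum = tag_int(1, 2) + tag_int(1, 2); /* 4 + 4 = 8 */
    return untag_int(sum, 2);                    /* 8 >> 2 = 2 */
}
```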
<h1 id="function-pointers" tabindex="-1">Function pointers <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/05/10/compiling-bigloo-scheme-to-webassembly/">#</a></h1>
<p>After fixing 32-bit/64-bit discrepancies, simple Bigloo programs would run in a Wasm engine. On more complex examples, however, I was running into function pointer cast errors like the following:</p>
<pre class="language-shell-session" tabindex="0"><code class="language-shell-session"><span class="token command"><span class="token shell-symbol important">$</span> <span class="token bash language-bash">emsdk/node/15.14.0_64bit/bin/node bigloo-compiled-program.js</span></span><br><span class="token output">RuntimeError: function signature mismatch<br> at <anonymous>:wasm-function[184]:0x3b430<br> at <anonymous>:wasm-function[1397]:0x3400bb<br> at <anonymous>:wasm-function[505]:0x118046<br> at <anonymous>:wasm-function[325]:0xd693e<br> at <anonymous>:wasm-function[4224]:0x71dd82<br> at Ya (<anonymous>:wasm-function[18267]:0x143d987)<br> at ret.<computed> (/test-output.js:1:112711)<br> at Object.doRewind (/test-output.js:1:114339)<br> at /test-output.js:1:114922<br> at /test-output.js:1:99074</span></code></pre>
<p>This is a <a href="https://emscripten.org/docs/porting/guidelines/function_pointer_issues.html">documented issue</a> that comes up when porting systems to Emscripten. It's not Emscripten's fault, because oftentimes the programs are relying on undefined behavior (UB) in C.</p>
<p>In particular, <a href="https://en.cppreference.com/w/c/language/cast">CPPReference says the following</a> about function pointer casts:</p>
<blockquote>
<p>Any pointer to function can be cast to a pointer to any other function type. If the resulting pointer is converted back to the original type, it compares equal to the original value. If the converted pointer is used to make a function call, the behavior is undefined (unless the function types are compatible)</p>
</blockquote>
<p>which means that calling a function through a cast pointer is generally undefined behavior unless the source and target types are compatible. Bigloo has many cases where function pointers need to be cast. For example, the representation of Scheme procedures contains <a href="https://github.com/manuel-serrano/bigloo/blob/9c66c638b38245538eaa9c092300de4c66f65179/runtime/Include/bigloo.h#L555">a field</a> for a C function pointer:</p>
<pre class="language-c" tabindex="0"><code class="language-c"> <span class="token comment">/* procedure (closures) */</span><br> <span class="token keyword">struct</span> <span class="token class-name">procedure</span> <span class="token punctuation">{</span><br> <span class="token class-name">header_t</span> header<span class="token punctuation">;</span> <br> <span class="token keyword">union</span> scmobj <span class="token operator">*</span><span class="token punctuation">(</span><span class="token operator">*</span>entry<span class="token punctuation">)</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// <-- function pointer for the procedure entrypoint</span><br> <span class="token keyword">union</span> scmobj <span class="token operator">*</span><span class="token punctuation">(</span><span class="token operator">*</span>va_entry<span class="token punctuation">)</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">union</span> scmobj <span class="token operator">*</span>attr<span class="token punctuation">;</span><br> <span class="token keyword">int</span> arity<span class="token punctuation">;</span><br> <span class="token keyword">union</span> scmobj <span class="token operator">*</span>obj0<span class="token punctuation">;</span><br> <span class="token punctuation">}</span> procedure<span class="token punctuation">;</span></code></pre>
<p>The function pointer's C type cannot precisely capture the actual behavior even with a uniform value representation, because the type would also need to encode the arity of the Scheme procedure. C does not prevent you from calling the function with whatever arity you like though, as you can see in the <a href="https://github.com/manuel-serrano/bigloo/blob/9c66c638b38245538eaa9c092300de4c66f65179/api/gstreamer/src/Clib/bglgst.c#L435">GStreamer API code</a>:</p>
<pre class="language-c" tabindex="0"><code class="language-c"><span class="token class-name">obj_t</span> proc <span class="token operator">=</span> cb<span class="token operator">-></span>proc<span class="token punctuation">;</span> <span class="token comment">// A Scheme procedure object</span><br><span class="token keyword">switch</span><span class="token punctuation">(</span> cb<span class="token operator">-></span>arity <span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">case</span> <span class="token number">0</span><span class="token operator">:</span><br> <span class="token function">PROCEDURE_ENTRY</span><span class="token punctuation">(</span> proc <span class="token punctuation">)</span> <span class="token punctuation">(</span> proc<span class="token punctuation">,</span> BEOA <span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// Extract the function entry pointer and call it</span><br> <span class="token keyword">break</span><span class="token punctuation">;</span><br> <br> <span class="token keyword">case</span> <span class="token number">1</span><span class="token operator">:</span><br> <span class="token function">PROCEDURE_ENTRY</span><span class="token punctuation">(</span> proc <span class="token punctuation">)</span> <span class="token punctuation">(</span> proc<span class="token punctuation">,</span> <span class="token function">convert</span><span class="token punctuation">(</span> cb<span class="token operator">-></span>args<span class="token punctuation">[</span> <span class="token number">0</span> <span class="token punctuation">]</span><span class="token punctuation">,</span> BTRUE <span class="token punctuation">)</span><span class="token punctuation">,</span> BEOA <span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">break</span><span class="token 
punctuation">;</span><br><br> <span class="token comment">/* ... */</span><br><span class="token punctuation">}</span></code></pre>
<p>In practice, this kind of UB shouldn't cause problems for a typical C compiler because its output (assembly/machine code) is untyped. What matters is whether the calling convention is followed (it should be fine in Bigloo since these functions uniformly take and return <code>scmobj*</code> pointers).</p>
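<p>For contrast, the pattern that the quoted rule does define is to store the pointer under some generic function type and convert it back to its real type before the call. The sketch below shows that pattern with an arity switch similar to the one above. All names here (<code>obj_t</code> as a stand-in struct, <code>closure_t</code>, <code>call_closure</code>) are hypothetical and are not Bigloo's definitions.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a Scheme value; not Bigloo's obj_t. */
typedef struct obj { long value; } obj_t;

/* A generic function pointer type used only for storage. */
typedef void (*generic_fn)(void);

/* The precise types, one per arity. */
typedef obj_t *(*fn0)(obj_t *self);
typedef obj_t *(*fn1)(obj_t *self, obj_t *a0);

typedef struct closure {
    generic_fn entry; /* never called through this type */
    int arity;
} closure_t;

static obj_t *return_self(obj_t *self) { return self; }
static obj_t *return_arg(obj_t *self, obj_t *a0) { (void)self; return a0; }

static closure_t make_closure0(fn0 f) {
    closure_t c = { (generic_fn)f, 0 };
    return c;
}

static closure_t make_closure1(fn1 f) {
    closure_t c = { (generic_fn)f, 1 };
    return c;
}

/* Convert back to the exact original type before calling, so the
   function type used at the call site matches the definition. */
static obj_t *call_closure(const closure_t *c, obj_t *self, obj_t *a0) {
    switch (c->arity) {
    case 0:
        return ((fn0)c->entry)(self);
    case 1:
        return ((fn1)c->entry)(self, a0);
    default:
        assert(!"unsupported arity in this sketch");
        return NULL;
    }
}
```

<p>This only works when every call site knows the real signature, which is a strong assumption for a large runtime; it is shown here to illustrate the rule, not as a description of how Bigloo is structured.</p>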
<p>Since Wasm has a sound static type system, it doesn't allow such loose typing of functions and will crash with a runtime type check if the types do not match. It's possible to work around this by using the <code>EMULATE_FUNCTION_POINTER_CASTS</code> Emscripten option to generate stubs that emulate the cast, but this adds significant overhead, as the <a href="https://emscripten.org/docs/porting/guidelines/function_pointer_issues.html#working-around-function-pointer-issues">Emscripten docs</a> note (emphasis mine):</p>
<blockquote>
<p>Use EMULATE_FUNCTION_POINTER_CASTS. When you build with -sEMULATE_FUNCTION_POINTER_CASTS, Emscripten emits code to emulate function pointer casts at runtime, adding extra arguments/dropping them/changing their type/adding or dropping a return type/etc. <em>This can add significant runtime overhead, so it is not recommended, but is be worth trying.</em></p>
</blockquote>
<p>The overhead is clear in the generated code, because the option adds dummy function parameters to virtualize calls. Here's an example showing the decompiled Wasm code with emulated casts:</p>
<p><picture><source type="image/avif" srcset="https://blogs.igalia.com/compilers/img/JlLNJQE1pa-1115.avif 1115w"><source type="image/webp" srcset="https://blogs.igalia.com/compilers/img/JlLNJQE1pa-1115.webp 1115w"><img alt="Screenshot of decompiled Wasm showing a large number of function parameters" loading="lazy" decoding="async" src="https://blogs.igalia.com/compilers/img/JlLNJQE1pa-1115.png" width="1115" height="767"></picture></p>
<p>You can see that the function being called with <code>call_indirect</code> has a huge number of arguments (the highlighted <code>(param i64 ...)</code> shows the type of the function being called, and the <code>(i64.const 0) ...</code> above the call are the concrete arguments). There are more than 70 arguments here, and most of them are unused and are present only for the virtualization of the call. This can add up to a huge binary size cost, since there can also be a large number of functions in the Wasm module's indirect function table:</p>
<p><picture><source type="image/avif" srcset="https://blogs.igalia.com/compilers/img/lX1k0kpkhF-670.avif 670w"><source type="image/webp" srcset="https://blogs.igalia.com/compilers/img/lX1k0kpkhF-670.webp 670w"><img alt="Screenshot of the V8 debugger showing a Wasm module with more than 20,000 entries in its function table" loading="lazy" decoding="async" src="https://blogs.igalia.com/compilers/img/lX1k0kpkhF-670.png" width="670" height="719"></picture></p>
<p>The screenshot from the V8 debugger above is showing the contents of the running module. In this case the module's table (highlighted in red) has over 20,000 function entries. Calls to many of these will incur the emulation overhead. It's not clear to me that there is any good way to avoid this cost without significantly changing the representation of values in Bigloo.</p>
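<p>To get a feel for where the extra parameters in the padded call come from, here is a conceptual sketch of the idea in C: every function that can be called indirectly is wrapped in a thunk with one uniform, padded signature, so that any call site matches any target. This is not Emscripten's actual generated code. The names and the four-slot width are assumptions for illustration, and the real output shown earlier uses far more slots.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One uniform signature shared by every entry in the table. */
typedef uint64_t (*uniform_fn)(uint64_t a0, uint64_t a1,
                               uint64_t a2, uint64_t a3);

/* Two "real" functions with different arities. */
static uint64_t add2(uint64_t x, uint64_t y) { return x + y; }
static uint64_t inc1(uint64_t x) { return x + 1; }

/* Thunks adapt the uniform signature by dropping unused slots. */
static uint64_t add2_thunk(uint64_t a0, uint64_t a1,
                           uint64_t a2, uint64_t a3) {
    (void)a2; (void)a3;
    return add2(a0, a1);
}

static uint64_t inc1_thunk(uint64_t a0, uint64_t a1,
                           uint64_t a2, uint64_t a3) {
    (void)a1; (void)a2; (void)a3;
    return inc1(a0);
}

/* Stand-in for the module's indirect function table. */
static const uniform_fn function_table[] = { add2_thunk, inc1_thunk };

/* Every call site pads its arguments with zeros up to the full width. */
static uint64_t call_indirect_emulated(size_t index,
                                       uint64_t a0, uint64_t a1) {
    assert(index < sizeof function_table / sizeof function_table[0]);
    return function_table[index](a0, a1, 0, 0);
}
```

<p>Each call pays for passing and ignoring the padding, and each table entry needs its own thunk, which is consistent with the size and runtime costs described above.</p>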
<h1 id="what-about-hopc" tabindex="-1">What about Hopc? <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/05/10/compiling-bigloo-scheme-to-webassembly/">#</a></h1>
<p>After getting Bigloo to compile to Wasm, I did go back to the initial motivation of this blog post and tried to get Hopc (the JS to Scheme compiler) working in order to have a whole pipeline to compile JS to Wasm. While I was able to get a working build, I had some trouble producing a final Wasm program that could serve as a demo without crashing. At some point, some of the runtime initialization code hits a <code>call_indirect</code> on a null function pointer and crashes.</p>
<p>I suspect that even if I could resolve the crashes, there would be more work needed to make this practical for the use cases I described at the beginning. The best code size I've been able to get for a minimal JS program compiled to Wasm using this pipeline was 29MB, which is rather large. For comparison, Guy Bedford, quoted in the <a href="https://thenewstack.io/will-javascript-become-the-most-popular-webassembly-language/">JS to Wasm article</a> linked earlier, suggested that 5-6MB was a reasonable number for a SpiderMonkey embedding.</p>
<p>There may be opportunities to reduce this overhead. For example, disabling asyncify and function pointer cast emulation reduces the binary to 9.8MB, albeit a non-working one. <a href="https://emscripten.org/docs/porting/asyncify.html">Asyncify</a> appears to be required to use the default <a href="https://github.com/ivmai/bdwgc">BDW garbage collector</a> due to the use of <code>emscripten_scan_registers()</code>. There is <a href="https://github.com/emscripten-core/emscripten/issues/18251">some discussion</a> of possibly making the asyncify use optional (and possibly using <a href="https://github.com/WebAssembly/binaryen">Binaryen</a>'s "spill pointers" pass), but it looks like this will take more time to materialize. To avoid the asyncify overhead at the Bigloo level, it could be interesting to look into alternative GC configurations that don't use BDW at all. For the function pointer issue, maybe future changes that leverage the <a href="https://github.com/WebAssembly/gc">Wasm GC proposal</a> (which has a <code>ref.cast</code> instruction that can cast a function reference to a more precise type) could provide a workaround.</p>
<h1 id="wrap-up" tabindex="-1">Wrap-up <a class="header-anchor" href="https://blogs.igalia.com/compilers/2023/05/10/compiling-bigloo-scheme-to-webassembly/">#</a></h1>
<p>It was fun to explore this possibility for AOT compiling JS to Wasm, and more generally it was a good exercise in porting a programming language to Wasm. While there were some tricky problems, the Emscripten tools were good at handling many parts of the process automatically.</p>
<p>I also had to debug a bunch of crashing Wasm code, and found that the debug support was better than I expected. Passing the debug mode flag <code>-g</code> to <code>emcc</code> helped in getting useful stack traces and in utilizing the Chrome debugger, though I did wish I had access to an <a href="https://github.com/rr-debugger/rr">rr-style</a> time travel debugger to continue backwards from a crash site.</p>
<p>With regard to Hopc, I think it could be worth exploring further if the runtime crashes in Wasm can be resolved and if the binary size can be brought down using some of the approaches I suggested above. For the time being though, if you want to compile Scheme to Wasm, you have an option available now with Bigloo. The Bigloo setup can compile some non-trivial Scheme programs too, such as this demo page that uses Olin Shivers' <a href="https://www.ccs.neu.edu/home/shivers/mazes.html">maze program</a> compiled to Wasm: <a href="https://people.igalia.com/atakikawa/wasm/bigloo/maze.html">https://people.igalia.com/atakikawa/wasm/bigloo/maze.html</a></p>
<p>For another path to use Scheme on Wasm, also check out my colleagues' work to <a href="https://wingolog.org/archives/2023/03/20/a-world-to-win-webassembly-for-the-rest-of-us">compile Guile to Wasm</a>.</p>
<hr>
<p>Header image credit: <a href="https://www.pexels.com/photo/close-up-photo-of-codes-1089440/">https://www.pexels.com/photo/close-up-photo-of-codes-1089440/</a></p>
Igalia's Compilers Team in 2022H12022-08-04T00:00:00Zhttps://blogs.igalia.com/compilers/2022/08/04/igalias-compilers-team-in-2022h1/<p>As we enter the second half of 2022, we’d like to provide a summary (necessarily highly condensed and selective!) of what we’ve been up to recently, providing some insight into the breadth of technical challenges our team of over 20 compiler engineers has been tackling.</p>
<h1 id="low-level-js-jsc-on-32-bit-systems" tabindex="-1">Low-level JS / JSC on 32-bit systems <a class="header-anchor" href="https://blogs.igalia.com/compilers/2022/08/04/igalias-compilers-team-in-2022h1/">#</a></h1>
<p>We have continued to maintain support for 32-bit systems (mainly ARMv7, but also MIPS) in JavaScriptCore (JSC). The work involves continuous tracking of upstream development to prevent regressions as well as the development of new features:</p>
<ul>
<li>A major milestone for this has been the completion of support for WebAssembly in the Low-Level Interpreter (LLInt) for ARMv7. The MIPS support is mostly complete.</li>
<li>Developed an initial prototype of concurrent compilation in the DFG tier for 32-bit systems, and the results are promising. The work continues, and we expect to upstream it in 2022H2.</li>
<li>Code reduction and optimizations: we upstreamed several code reductions and optimizations for 32-bit systems (mainly ARMv7): 25% size reduction in DFGOSRExit blocks, 24% in baseline JIT on JetStream2 and 25% code size reduction from porting EXTRA_CTI_THUNKS.</li>
<li>Improved our hardware testing infrastructure with more MIPS and faster ARMv7 hardware for the buildbots running in the EWS (Early Warning System), which shortens the response time for regressions.</li>
<li>Deployed two fuzzing bots that test JSC 24/7. The bots have already found a few issues upstream that we reported to Apple. The bugs that affect 64-bit systems were fixed by the team at Apple, while we are responsible for fixing the ones affecting 32-bit systems. We expect to work on them in 2022H2.</li>
<li>Added logic to transparently re-run failing JSC tests (on 32-bit platforms) and declare them a pass if they’re simply flaky, as long as the flakiness does not rise above a threshold. This means fewer false alerts for developers submitting patches to the EWS and for the people doing QA work. Naturally, the flakiness information is stored in the WebKit resultsdb and visualized at <a href="https://results.webkit.org/">results.webkit.org</a>.</li>
</ul>
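<p>The re-run logic from the last item above can be sketched roughly as follows. This is only an illustration, not the actual WebKit infrastructure code: the names <code>classifyTest</code> and <code>summarizeRun</code>, the retry count, and the reading of the threshold as a fraction of flaky tests per run are all assumptions.</p>

```javascript
// Hypothetical sketch: retry a failing test and record whether it was flaky.
// `runTest` is assumed to be a function that returns true on a pass.
function classifyTest(runTest, { maxRetries = 3 } = {}) {
  if (runTest()) return { status: 'pass', flaky: false };
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    // A failure followed by a pass is treated as flakiness, not a regression.
    if (runTest()) return { status: 'pass', flaky: true };
  }
  return { status: 'fail', flaky: false };
}

// Hypothetical sketch: a run is OK if every test passed (possibly after
// retries) and the share of flaky tests stays at or below the threshold.
function summarizeRun(tests, { flakyThreshold = 0.1, maxRetries = 3 } = {}) {
  const results = tests.map((t) => ({
    name: t.name,
    ...classifyTest(t.run, { maxRetries }),
  }));
  const flakyCount = results.filter((r) => r.flaky).length;
  const flakyRate = tests.length === 0 ? 0 : flakyCount / tests.length;
  const ok = flakyRate <= flakyThreshold && results.every((r) => r.status === 'pass');
  return { ok, flakyRate, results };
}
```

<p>Recording the <code>flaky</code> flag per test, rather than only the final verdict, is what would let a results database track flakiness over time.</p>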
<h1 id="js-and-standards" tabindex="-1">JS and standards <a class="header-anchor" href="https://blogs.igalia.com/compilers/2022/08/04/igalias-compilers-team-in-2022h1/">#</a></h1>
<p>Another aspect of our work is our contribution to the JavaScript standards effort, through involvement in the TC39 standards body, direct contribution to standards proposals, and implementation of those proposals in the major JS engines.</p>
<ul>
<li>Further coverage of the <a href="https://tc39.es/proposal-temporal/docs/">Temporal</a> spec in the <a href="https://github.com/tc39/test262">Test262</a> conformance suite, as well as various specification updates. See <a href="https://ptomato.wordpress.com/2022/03/03/comparing-apples-and-appleoranges/">this blog post</a> for insight into some of the challenges tackled by Temporal.</li>
<li>Performance improvements for JS class features in V8, such as <a href="https://joyeecheung.github.io/blog/2022/04/14/fixing-snapshot-support-of-class-fields-in-v8/">faster initialisations</a>.</li>
<li>Work towards supporting snapshots in Node.js, including activities such as <a href="https://joyeecheung.github.io/blog/2022/04/14/fixing-snapshot-support-of-class-fields-in-v8/">fixing</a> support for V8 startup snapshots in the presence of class field initializers.</li>
<li>Collaborating with others on the “<a href="https://github.com/tc39/proposal-type-annotations">types as comments</a>” proposal for JS, successfully reaching stage 1 in the <a href="https://tc39.es/process-document/">TC39 process</a>.</li>
<li>Implementing <a href="https://jgriego.net/posts/2022-03-28-shadowrealms-in-webkit.html">ShadowRealm support in WebKit</a>.</li>
</ul>
<h1 id="webassembly" tabindex="-1">WebAssembly <a class="header-anchor" href="https://blogs.igalia.com/compilers/2022/08/04/igalias-compilers-team-in-2022h1/">#</a></h1>
<p>WebAssembly is a low-level compilation target for the web, which we have contributed to in terms of specification proposals, LLVM toolchain modifications, implementation work in the JS engines, and working with customers on use cases both on the server and in web browsers.</p>
<p>Some highlights from the last 6 months include:</p>
<ul>
<li>Creation of a <a href="https://github.com/WebAssembly/stringref">proposal</a> for reference-typed strings in WebAssembly to ensure efficient interoperability with languages like JavaScript. We also landed <a href="https://bugs.chromium.org/p/v8/issues/detail?id=12868">patches to implement this proposal in V8</a>.</li>
<li><a href="https://github.com/wingo/wasm-jit">Prototyping</a> Just-In-Time (JIT) compilation within WebAssembly.</li>
<li>Working to implement support for WebAssembly <a href="https://github.com/WebAssembly/gc">GC types</a> in Clang and LLVM (with one important <a href="https://github.com/Igalia/ref-cpp">use case</a> being efficient and leak-free sharing of object graphs between JS and C++ compiled to Wasm).</li>
<li>Implementation of support for GC types in WebKit’s implementation of WebAssembly.</li>
</ul>
<h1 id="events" tabindex="-1">Events <a class="header-anchor" href="https://blogs.igalia.com/compilers/2022/08/04/igalias-compilers-team-in-2022h1/">#</a></h1>
<p>With in-person meetups becoming possible again, Igalians have been giving talks on a range of topics - Multi-core JavaScript (BeJS), TC39 (JS Nation), RISC-V LLVM (Cambridge RISC-V meetup), and more.</p>
<p>We’ve also had opportunities for much-needed face-to-face time within the team, with many of the compilers team meeting in Brussels in May and at the company-wide summit held in A Coruña in June. These events provided a great opportunity to discuss current technical challenges, strategy, and ideas for the future, to share knowledge, and of course to socialise.</p>
<h1 id="team-growth" tabindex="-1">Team growth <a class="header-anchor" href="https://blogs.igalia.com/compilers/2022/08/04/igalias-compilers-team-in-2022h1/">#</a></h1>
<p>Our team has grown further this year, being joined by:</p>
<ul>
<li>Nicolò Ribaudo - a core maintainer of <a href="https://babeljs.io/">BabelJS</a>, continuing work on that project after joining Igalia in June as well as contributing to work on JS modules.</li>
<li>Aditi Singh - previously worked with the team through the <a href="https://www.igalia.com/coding-experience/">Coding Experience program</a>, joining full time in March focusing on the Temporal project.</li>
<li>Alex Bradbury - a long-time LLVM developer who joined in March and is focusing on WebAssembly and RISC-V work in Clang/LLVM.</li>
</ul>
<p>We’re keen to continue to grow the team and are actively hiring, so if you think you might be interested in working in any of the areas discussed above, please <a href="https://www.igalia.com/jobs/javascript_engine_developer">apply here</a>.</p>
<h1 id="more-about-igalia" tabindex="-1">More about Igalia <a class="header-anchor" href="https://blogs.igalia.com/compilers/2022/08/04/igalias-compilers-team-in-2022h1/">#</a></h1>
<p>If you're keen to learn more about how we work at Igalia, a <a href="https://thenewstack.io/igalia-the-open-source-powerhouse-youve-never-heard-of/">recent article at The New Stack</a> provides a fantastic overview and includes comments from a number of customers who have supported the work described in this post.</p>
Recent talks at GUADEC and NodeConf2021-11-16T00:00:00Zhttps://blogs.igalia.com/compilers/2021/11/16/recent-talks-at-guadec-and-nodeconf/<p>Over the summer and now going into autumn, Igalia compilers team members have been presenting talks at various venues about JavaScript and web engines. Today we'd like to share with you two of those talks that you can watch online.</p>
<p>First, Philip Chimento gave a talk titled "What's new with JavaScript in GNOME: The 2021 edition" at <a href="https://events.gnome.org/event/9/">GUADEC 2021</a> about GNOME's integrated JavaScript engine <a href="https://gjs.guide/">GJS</a>. This is part of a series of talks about JavaScript in GNOME that Philip has been giving at GUADEC for a number of years.</p>
<p>You can watch it on YouTube <a href="https://www.youtube.com/watch?v=xHqkiSd1hQQ&t=20669s">here</a> and the slides for the talk are available <a href="https://ptomato.name/talks/guadec2021/">here</a>.</p>
<p>Second, Romulo Cintra gave a talk at <a href="https://www.nodeconfremote.com/">NodeConf Remote 2021</a> titled "IPFS - InterPlanetary File System with Node.js". In this talk, Romulo introduces IPFS: a new distributed file system protocol for sharing files and media in a peer-to-peer fashion. Romulo also talks about some of the efforts to bring this to the web (<a href="https://arewedistributedyet.com">https://arewedistributedyet.com/</a>) and goes over how IPFS can be used with Node.js.</p>
<p>You can watch Romulo's talk on YouTube as well by going <a href="https://www.youtube.com/watch?v=ctFadWFCb2g">here</a>.</p>
<p>The slides for the talk are available <a href="https://ipfs.io/ipfs/QmQCZaHJBZVFncftY8YGsS3BEbgA9Pu6B3JT4gdE7EhELD">here</a>, or you can even use IPFS to download them: ipfs://QmQCZaHJBZVFncftY8YGsS3BEbgA9Pu6B3JT4gdE7EhELD</p>
JS Nation Talk: “How to Outsmart Time: Building Futuristic JavaScript Apps Using Temporal”2021-07-06T00:00:00Zhttps://blogs.igalia.com/compilers/2021/07/06/js-nation-talk-how-to-outsmart-time-building-futuristic-javascript-apps-using-temporal/<p>Recently Compilers Team member Ujjwal Sharma gave a talk at the <a href="https://live.jsnation.com/">JS Nation 2021</a> conference about the <a href="https://github.com/tc39/proposal-temporal/">Temporal</a> proposal. Check out the recording here:</p>
<p><a href="https://portal.gitnation.org/contents/how-to-outsmart-time-building-futuristic-javascript-apps-using-temporal">https://portal.gitnation.org/contents/how-to-outsmart-time-building-futuristic-javascript-apps-using-temporal</a></p>
<p>The talk goes over how to use Temporal's new date & time API in real-world programming, and also how Temporal interacts with other APIs such as JS Intl.</p>
<p>We've written about Temporal <a href="https://blogs.igalia.com/compilers/2020/06/23/dates-and-times-in-javascript/">previously on this blog</a> and our other teammates have also <a href="https://ptomato.wordpress.com/2020/07/08/the-surrealist-clock-of-javascript/">written about</a> how Temporal might be useful for the GNOME desktop.</p>
<p>If you're interested in an audio-format deep-dive about Temporal, also check out the <a href="https://www.igalia.com/chats/Temporal">Igalia Chats episode</a> on the topic.</p>
Igalia's Compilers Team in 20202021-03-09T00:00:00Zhttps://blogs.igalia.com/compilers/2021/03/09/igalias-compilers-team-in-2020/<p>In a <a href="https://blogs.igalia.com/compilers/2020/06/05/what-we-do-at-igalias-compiler-team/">previous blog post</a>, we introduced the kind of work the Igalia compilers team does and gave a mid-year update on our 2020 progress.</p>
<p>Now that we have made our way into 2021, we wanted to recap our achievements from 2020 and update you on the exciting improvements we have been making to the web programming platform. Of course, we couldn't have done this work alone; all of this was brought to you through our collaborations with our clients and upstream partners in the web ecosystem.</p>
<p>https://www.twitter.com/rkirsling/status/1276299298020290561</p>
<h1 id="javascript-class-features" tabindex="-1">JavaScript class features <a class="header-anchor" href="https://blogs.igalia.com/compilers/2021/03/09/igalias-compilers-team-in-2020/">#</a></h1>
<p>Our engineers at Igalia have continued to push forward on improvements to JS <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes">classes</a>. Earlier in 2020, we had landed support for public field declarations in JavaScriptCore (JSC). In the latter half of 2020, we achieved major milestones such as getting <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Classes/Private_class_fields">private class fields</a> into JSC (with optimizing compiler support!):</p>
<p>https://www.twitter.com/robpalmer2/status/1276378092349657095</p>
<p>https://www.twitter.com/caitp88/status/1318919341467979776</p>
<p>as well as static public and private fields.</p>
<p>We also helped ship <a href="https://v8.dev/blog/v8-release-84">private methods and accessors in V8</a> version 8.4. Our work on private methods also landed in JSC and we expect it to be available in future releases of Safari.</p>
<p>These additions will help JS developers create better abstractions by encapsulating state and behavior in their classes.</p>
<h1 id="tc39-and-temporal" tabindex="-1">TC39 and Temporal <a class="header-anchor" href="https://blogs.igalia.com/compilers/2021/03/09/igalias-compilers-team-in-2020/">#</a></h1>
<p>Our compilers team also contributed throughout 2020 to web standards through its participation in TC39 and related standards bodies.</p>
<p>One of the big areas we have been working on is the <a href="https://github.com/tc39/proposal-temporal/">Temporal</a> proposal, which aims to provide better date and time handling in JS. When we <a href="https://blogs.igalia.com/compilers/2020/06/23/dates-and-times-in-javascript/">blogged</a> about this in mid-2020, the proposal was still in Stage 2, but we expect it to advance to Stage 3 soon in 2021. Igalians have been working hard on many aspects of the proposal since mid-2020, including managing community feedback, working on the polyfill, and maintaining the documentation.</p>
<p>For more info on Temporal, also check out <a href="https://www.youtube.com/watch?v=3F2A708c1o0">a talk</a> by one of our engineers, Ujjwal Sharma, at <a href="https://holyjs-piter.ru/en/">Holy JS 2020 Piter</a>.</p>
<p>https://twitter.com/Jason_williams/status/1276112646568382464</p>
<p>Another area we have been contributing to for a number of years is the <a href="https://github.com/tc39/ecma402">ECMA-402 Internationalization</a> (Intl) standard, an important effort that provides <a href="https://en.wikipedia.org/wiki/Internationalization_and_localization">i18n</a> support for JS. We help maintain and edit the specification while also contributing tests and pushing Intl proposals forward. For example, we helped with the test suite of the <a href="https://www.chromestatus.com/feature/6099397733515264"><code>Intl.Segmenter</code></a> feature for implementing localized <a href="https://en.wikipedia.org/wiki/Text_segmentation">text segmentation</a>, which recently shipped in Chrome. For a good overview of other recent Intl efforts, check out <a href="https://docs.google.com/presentation/d/1nEnkIu4BpS9S-_K4WR-glfgB9sbjWEP9DeUXuKnrkkQ/edit#slide=id.p">these slides</a> from <a href="https://events.omg.org/iuc44/">IUC44</a>.</p>
<p>We're also contributing to many other proposed features for JS, such as <a href="https://tc39.es/proposal-weakrefs/">WeakRefs</a>, <a href="https://github.com/tc39/proposal-decimal">Decimal</a> (Daniel Ehrenberg from our team gave a <a href="https://www.youtube.com/watch?v=G3Q4vWf8Peo">talk</a> on this at <a href="https://www.nodetlv.com/2020">Node.TLV 2020</a>), <a href="https://github.com/tc39/proposal-import-assertions">Import Assertions</a>, <a href="https://github.com/tc39/proposal-record-tuple">Records & Tuples</a>, <a href="https://github.com/tc39/proposal-top-level-await">Top-level await</a>, and <a href="https://github.com/tc39/proposal-js-module-blocks">Module blocks</a> & <a href="https://github.com/littledan/proposal-module-fragments/">module bundling</a> (Daniel also gave a <a href="https://www.youtube.com/watch?v=OFUanbq_8Xw">talk</a> on these topics at <a href="https://holyjs-moscow.ru/en/">Holy JS 2020 Moscow</a>).</p>
<h1 id="node-js" tabindex="-1">Node.js <a class="header-anchor" href="https://blogs.igalia.com/compilers/2021/03/09/igalias-compilers-team-in-2020/">#</a></h1>
<p>In addition to our contributions to the client side of the web, we are also contributing to server side use of web engines. In particular, we have continued to contribute to Node.js throughout 2020.</p>
<p>Some notable contributions include adding experimental support for per-context memory measurements in <a href="https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V13.md">version 13</a> during early 2020.</p>
<p>Since late 2020, we have been working on improving Node.js startup speed by <a href="https://github.com/nodejs/node/issues/35711">moving more of the bootstrap process into the startup snapshot</a>. For more on this topic, you can watch a talk that one of our engineers, Joyee Cheung, presented at NodeConf Remote 2020 <a href="https://www.youtube.com/watch?v=G36lrPrF09c">here</a> (slides are available <a href="https://github.com/joyeecheung/talks/blob/master/nodeconf_remote_202011/node-startup-performance.pdf">here</a>).</p>
<p>https://twitter.com/mhdawson1/status/1323608062419230721</p>
<h1 id="jsc-support-on-32-bit-platforms" tabindex="-1">JSC support on 32-bit platforms <a class="header-anchor" href="https://blogs.igalia.com/compilers/2021/03/09/igalias-compilers-team-in-2020/">#</a></h1>
<p>Our group also continues to maintain support in JSC for 32-bit platforms. Earlier in 2020 we contributed improvements to JSC on 32-bit such as tail call optimizations, support for checkpoints, and others.</p>
<p>Since then we have been optimizing LLInt (the low-level interpreter for JSC) on 32-bit, and porting inline caching support for delete operations to 32-bit to improve the performance of delete (you can read about the background on the original optimization on the WebKit blog <a href="https://webkit.org/blog/10298/inline-caching-delete/">here</a>).</p>
<p>We also <a href="https://linki.tools/2020/11/a-tour-of-the-for-of-implementation-for-32bits-jsc.html">blogged about</a> our efforts to support the <code>for-of</code> intrinsic on 32-bit to improve iteration on JS arrays.</p>
<p>https://twitter.com/pocmatos/status/1329814124423995396</p>
<h1 id="webassembly" tabindex="-1">WebAssembly <a class="header-anchor" href="https://blogs.igalia.com/compilers/2021/03/09/igalias-compilers-team-in-2020/">#</a></h1>
<p>Finally, we have made a number of contributions to <a href="https://webassembly.org/">WebAssembly</a> (Wasm), the new low-level compiler-target language for the web, on both the specification and implementation sides.</p>
<p>During 2020, we helped ship and standardize several Wasm features in web engines such as support for <a href="https://wingolog.org/archives/2020/04/03/multi-value-webassembly-in-firefox-from-1-to-n">multiple-values</a>, which can help compilers to Wasm produce better code, and support for <a href="https://www.asumu.xyz/blog/2020/07/06/shipping-webassembly-s-bigint-i64-conversion-in-firefox/">BigInt/I64 conversion</a> in the JS API, which lifts a restriction that made it harder to interact with Wasm programs from JS.</p>
<p>https://twitter.com/SpiderMonkeyJS/status/1247844182837866497</p>
<p>We've also improved support in tools such as LLVM for the <a href="https://github.com/WebAssembly/reference-types/">reference types</a> proposal, which adds new types to the language that can represent references to values from JS or other host languages. Eventually reference types will be key to supporting the <a href="https://github.com/WebAssembly/gc/">garbage collection</a> proposal (in which references are extended to new struct and array types), which will allow for easier compilation of languages that use GC to Wasm.</p>
<p>https://twitter.com/pocmatos/status/1316642040906743808</p>
<p>We're also actively working on web engine support for <a href="https://github.com/WebAssembly/exception-handling">exception handling</a>, reference types, and other proposals while continuing to contribute to tools and specification work. We plan to help ship more WebAssembly features in browsers during 2021, so look forward to our mid-year update post!</p>
Dates and Times in JavaScript2020-06-23T00:00:00Zhttps://blogs.igalia.com/compilers/2020/06/23/dates-and-times-in-javascript/<p><strong>tl;dr: We are looking for feedback on the <a href="https://tc39.es/proposal-temporal/docs/index.html">Temporal proposal</a>. Try out the <a href="https://www.npmjs.com/package/proposal-temporal">polyfill</a>, and complete the <a href="https://forms.gle/iL9iZg7Y9LvH41Nv8">survey</a>; but don't use it in production yet!</strong></p>
<p>JavaScript <code>Date</code> is broken in ways that cannot be fixed without breaking the web. As the story goes, it was included in the original 10-day JavaScript engine hack and based on java.util.Date, which itself was deprecated in 1997 due to being a terrible API and replaced with a better one. The result is that, for all of JavaScript's history, the built-in <code>Date</code> has remained very hard to work with directly.</p>
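<p>As a small illustration of why <code>Date</code> is awkward to work with, here is a sketch of a few well-known quirks: zero-indexed months, silent rollover of out-of-range values, and mutability. The helper <code>addOneDay</code> and the variable names are made up for this example.</p>

```javascript
// Months are zero-indexed, while days of the month start at 1.
const date = new Date(2020, 0, 31); // 31 January 2020, local time
const monthIndex = date.getMonth(); // 0, not 1

// Date objects are mutable, and out-of-range values silently roll over.
date.setMonth(1); // "31 February" does not exist, so this becomes 2 March 2020
const rolledMonth = date.getMonth(); // 2 (March)
const rolledDay = date.getDate(); // 2

// A helper that looks pure can still change its argument.
function addOneDay(d) {
  d.setDate(d.getDate() + 1); // mutates the caller's object
  return d;
}
const original = new Date(2020, 5, 23); // 23 June 2020
const next = addOneDay(original);
const sameObject = original === next; // true: `original` now holds 24 June
```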
<p>Starting a few years ago, a proposal has been developing, to add a new globally available object to JavaScript, <code>Temporal</code>. Temporal is a robust and modern API for working with dates, times, and timestamps, and also makes it easy to do things that were hard or impossible with <code>Date</code>, like converting dates between time zones, adding and subtracting while accounting for daylight saving time, working with date-only or time-only data, and even handling dates in non-Gregorian calendars. Although Temporal has "just works" defaults, it also provides fine-grained opt-in control of overflows, interpreting ambiguous times, and other corner cases. For more on the history of the proposal, and why it's not possible to fix <code>Date</code> itself, read <a href="https://maggiepint.com/2017/04/09/fixing-javascript-date-getting-started/">Maggie Pint's two-part blog post "Fixing JavaScript Date"</a>.</p>
<p>For examples of the power of Temporal, check out the <a href="https://tc39.es/proposal-temporal/docs/cookbook.html">cookbook</a>. Many of these examples would be difficult to do with legacy <code>Date</code>, particularly the ones involving time zones. (We would have put an example in this post, but the code might soon become stale, for reasons which will hopefully become clear!)</p>
<p>This <a href="https://github.com/tc39/proposal-temporal">proposal</a> is currently at Stage 2 in TC39's proposal process, and we<sup class="footnote-ref"><a href="https://blogs.igalia.com/compilers/2020/06/23/dates-and-times-in-javascript/" id="fnref1">[1]</a></sup> are hoping to move it along to Stage 3 soon.<sup class="footnote-ref"><a href="https://blogs.igalia.com/compilers/2020/06/23/dates-and-times-in-javascript/" id="fnref2">[2]</a></sup> We have been working on the feature set of <code>Temporal</code> and the API for a long time, and we believe it's full-featured and that the API is reasonable. You don't design good APIs solely on the drawing board, however, so it's time to put it to the test and let the JavaScript developer community try it out and see whether what we've come up with meets people's needs.</p>
<p>It is still early enough that we can make drastic changes to the API if we find we need to, based on the feedback that we get. So please, try it out and let us know!</p>
<h1 id="how-to-try-temporal" tabindex="-1">How to Try Temporal <a class="header-anchor" href="https://blogs.igalia.com/compilers/2020/06/23/dates-and-times-in-javascript/">#</a></h1>
<p>If you just want to try Temporal out casually, with an interactive prompt, that's easy! Visit the <a href="https://tc39.es/proposal-temporal/docs/">API documentation</a> in your browser. On any of the documentation or cookbook pages, you can <a href="http://tinyurl.com/jscons">open your browser console</a> and Temporal will be already loaded, ready for you to try out the examples. Or you can try it out on <a href="https://npm.runkit.com/proposal-temporal">RunKit</a>.</p>
<p>Or, maybe you are interested in a bit more in-depth evaluation, like building a small test project using Temporal. We know this takes up people's valuable project time, but it's also the best way that we can get the most valuable feedback, so we'd really appreciate this! We have released a <a href="https://www.npmjs.com/package/proposal-temporal">polyfill</a> for the Temporal API on npm. You can use it in your project with <code>npm install --save proposal-temporal</code>, and import it in your project with <code>const { Temporal } = require('proposal-temporal');</code>.</p>
<p>However, don't use the polyfill in production applications! The proposal is still at Stage 2, and the polyfill has a 0.x version, so that should make it clear that the API is subject to change, and we do intend to keep changing it when we get feedback from you!</p>
<h1 id="how-to-give-feedback" tabindex="-1">How to Give Feedback <a class="header-anchor" href="https://blogs.igalia.com/compilers/2020/06/23/dates-and-times-in-javascript/">#</a></h1>
<p>We would <em>love</em> to hear from you about your experiences with Temporal! Once you've tried it, we have a short <a href="https://forms.gle/iL9iZg7Y9LvH41Nv8">survey</a> for you to fill out. If you feel comfortable doing so, please leave us your contact information, since we might want to ask some follow up questions.</p>
<p>Please also open an issue on our <a href="https://github.com/tc39/proposal-temporal/issues">issue tracker</a> if you have suggestions! We welcome suggestions whether or not you filled out the survey. You can also browse the feedback that's already been given in the issue tracker, and give it a thumbs-up if you agree or thumbs-down if you disagree.</p>
<p>Thanks for participating if you can! All the feedback that we receive now will help us make the right decisions as the proposal moves along to Stage 3 and Temporal eventually appears in your browser.</p>
<hr class="footnotes-sep">
<section class="footnotes">
<ol class="footnotes-list">
<li id="fn1" class="footnote-item"><p>"We" in this post means the <a href="https://github.com/tc39/proposal-temporal#champions">Temporal champions group</a>, a group of TC39 delegates and interested people. As you may guess from where this blog post is hosted, it includes members of Igalia's Compilers team, but this was written on behalf of the Temporal champions. <a href="https://blogs.igalia.com/compilers/2020/06/23/dates-and-times-in-javascript/" class="footnote-backref">↩︎</a></p>
</li>
<li id="fn2" class="footnote-item"><p>Read the <a href="https://tc39.es/process-document/">TC39 process document</a> for more information on what these stages mean. tl;dr: Stage 2 is the time to give feedback on the proposal that can still be incorporated even if it requires drastic changes. Stage 3 is when the proposal remains stable except for serious problems discovered during implementation in browsers. <a href="https://blogs.igalia.com/compilers/2020/06/23/dates-and-times-in-javascript/" class="footnote-backref">↩︎</a></p>
</li>
</ol>
</section>
What we do at Igalia's Compiler Team2020-06-05T00:00:00Zhttps://blogs.igalia.com/compilers/2020/06/05/what-we-do-at-igalias-compiler-team/<h1 id="compilers-for-the-web" tabindex="-1">Compilers for the web <a class="header-anchor" href="https://blogs.igalia.com/compilers/2020/06/05/what-we-do-at-igalias-compiler-team/">#</a></h1>
<p>At Igalia, our development teams have included a team specializing in compilers since around 2012. Since most tech companies don't work on compilers or even more generally on programming language implementation, you might be wondering "What does a compilers team even do?". This blog post will try to explain, as well as highlight some of our recent work.</p>
<p>While many companies that work on compilers own or maintain their own programming language (e.g., Google and Go, Apple and Swift, Mozilla and Rust) or a domain-specific compiler or language, Igalia is a little bit different.</p>
<p>Since we are a consulting company, our compiler team instead helps maintain and improve existing free software/open source programming language implementations, with a focus on languages for the web. In other words, we help improve JavaScript engines and, more recently, <a href="https://webassembly.org/">WebAssembly</a> (Wasm) runtimes.</p>
<p>To actually do the work, Igalia has grown a compilers team of developers from a variety of backgrounds. Some of our developers came into the job from a career in industry, and others from a research or academic setting. Our developers are contributors to a variety of non-web languages as well, including functional programming languages and scripting languages.</p>
<h1 id="our-recent-work" tabindex="-1">Our recent work <a class="header-anchor" href="https://blogs.igalia.com/compilers/2020/06/05/what-we-do-at-igalias-compiler-team/">#</a></h1>
<p>Given our team's diverse backgrounds, we are able to work not only on compiler <em>implementations</em> (which includes compilation, testing, maintenance, and so on) but also on the <em>standardization process</em> for language features. To be more specific, here are some examples of projects we're working on, split into several areas:</p>
<ul>
<li><strong>Maintenance:</strong> We work on the maintenance of JS engines to make sure they work well on platforms that our customers care about. For example, we maintain the support for 32-bit architectures in JavaScriptCore (WebKit's JS engine). This is especially important to us because WebKit is used on billions of embedded devices and we are the maintainers of <a href="https://webkit.org/wpe/">WPE</a>, the official WebKit port for embedded systems.
<ul>
<li>This involves things like making sure that CI continues to pass on platforms like ARMv7 and MIPS, and also making sure that JS engine performance is good on these platforms.</li>
<li>Recently, some of our developers have been sharing their knowledge about JSC development in several blog posts. <a href="https://linki.tools/2019/10/a-brief-look-at-the-webkit-workflow.html">[1]</a>, <a href="https://tlog.quasinomial.net/posts/dive-into-jsc/">[2]</a>, <a href="https://caiolima.github.io/jsc/2020/03/12/jsc-inline-cache.html">[3]</a></li>
</ul>
</li>
<li><strong>JS feature development & standardization:</strong> We also work on implementing features proposed by the web platform community in all of the major JS engines, and we work on standardizing features as participants in <a href="https://tc39.es/">TC39</a>.
<ul>
<li>Recently we have been doing a lot of work around <a href="https://github.com/tc39/proposal-class-fields">class fields</a> and <a href="https://github.com/tc39/proposal-private-methods">private methods</a> in multiple browsers.</li>
<li>We're also involved in the work on the <a href="https://github.com/tc39/proposal-temporal">Temporal</a> proposal for better date/time management in JS.</li>
<li>Another example of our recent work in standardization is the <a href="https://github.com/tc39/proposal-bigint">BigInt</a> feature, which is now part of the <a href="https://tc39.es/ecma262/">language specification</a> for JS. Igalians led work on both the specification and also its implementation in browsers. <a href="https://vimeo.com/304865023">[1]</a>, <a href="https://wingolog.org/archives/2019/05/23/bigint-shipping-in-firefox">[2]</a> We are currently working on <a href="https://github.com/WebAssembly/JS-BigInt-integration">integrating</a> BigInts with WebAssembly as well.</li>
</ul>
</li>
<li><strong>WebAssembly:</strong> In the last year, we have gotten more involved in helping to improve Wasm, the new low-level compiler target language for the web (so that you can write C/C++/etc. code that will run on the web).
<ul>
<li>We have some recent blog posts on <a href="https://wingolog.org/archives/2020/03/25/firefoxs-low-latency-webassembly-compiler">understanding Firefox's baseline wasm compiler</a> and our work on implementing the <a href="https://wingolog.org/archives/2020/04/03/multi-value-webassembly-in-firefox-from-1-to-n">multi-value</a> proposal for the language.</li>
</ul>
</li>
</ul>
<p>In the future, we'll continue to periodically put pointers to our recent compilers work on this blog, so please follow along!</p>
<p>If you think you might be interested in helping to expand the web platform as a customer, don't hesitate to <a href="https://www.igalia.com/contact/">get in touch</a>!</p>
Awaiting the future of JavaScript in V82016-05-23T00:00:00Zhttps://blogs.igalia.com/compilers/2016/05/23/awaiting-the-future-of-javascript-in-v8/<p>On the evening of Monday, May 16th, 2016, we made history. We've <a href="https://crrev.com/d08c0304c5779223d6c468373af4815ec3ccdb84">landed the initial implementation</a> of "Async Functions" in <a href="https://v8project.blogspot.com">V8</a>, the JavaScript runtime used by Google Chrome and Node.js. We do these things not because they are easy, but because they are hard. Because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one we are willing to accept. It is very exciting to see this land after roughly 2 months of implementation, code review, and standards finagling/discussion. It is truly an honour.</p>
<p>To introduce you to Async Functions, it's first necessary to understand two things: the status quo of async programming in JavaScript, as well as Generators (previously implemented by fellow Igalian <a href="https://wingolog.org/archives/2013/05/08/generators-in-v8">Andy</a>).</p>
<p>Async programming in JavaScript has historically been implemented by callbacks. <code>window.setTimeout(function toExecuteLaterOnceTimeHasPassed() {}, ...)</code> being the common example. Callbacks on their own are not scalable: when numerous nested asynchronous operations are needed, code becomes extremely difficult to read and reason about. Abstraction libraries have been tacked on to improve this, including caolan's <a href="https://www.npmjs.com/package/async">async</a> package, or Promise libraries such as <a href="https://www.npmjs.com/package/q">Q</a>. These abstractions simplify control flow management and data flow management, and are a massive improvement over plain Callbacks. But we can do better! For a more detailed look at Promises, have a look at the fantastic <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise">MDN article</a>. Some great resources on why and how callbacks can lead to utter non-scalable disaster exist too, check out <a href="http://callbackhell.com">http://callbackhell.com</a>!</p>
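<p>To make the nesting problem concrete, here is a minimal sketch that runs the same three dependent steps first with callbacks and then with Promises. The helpers <code>addOneLater</code>, <code>addThreeWithCallbacks</code> and <code>addOneLaterPromise</code> are hypothetical names invented for this illustration; they are not part of any library mentioned above.</p>

```javascript
// Hypothetical asynchronous step: calls back with `value + 1` after a short delay.
function addOneLater(value, callback) {
  setTimeout(() => callback(null, value + 1), 5);
}

// Callback style: each dependent step nests one level deeper, and every
// level has to check and forward errors by hand.
function addThreeWithCallbacks(start, done) {
  addOneLater(start, (err1, a) => {
    if (err1) return done(err1);
    addOneLater(a, (err2, b) => {
      if (err2) return done(err2);
      addOneLater(b, done);
    });
  });
}

// Promise style: wrap the step once, then chain it; the chain stays flat.
function addOneLaterPromise(value) {
  return new Promise((resolve, reject) => {
    addOneLater(value, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

const viaPromises = addOneLaterPromise(0)
  .then(addOneLaterPromise)
  .then(addOneLaterPromise);

// Adapt the callback version to a Promise so both results can be compared.
const viaCallbacks = new Promise((resolve, reject) => {
  addThreeWithCallbacks(0, (err, result) => (err ? reject(err) : resolve(result)));
});
```

<p>Both versions compute the same value, but only the callback version grows an indentation level per step.</p>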
<p>The second concept, Generators, allows a runtime to return from a function at a <code>yield</code> expression, and later re-enter that function at the following instruction, in order to continue execution. So right away you can imagine where this is going: we can continue execution of the same function, rather than writing a closure to continue execution in a new function. Async Functions rely on this same mechanism (and in fact, on the underlying Generators implementation) to achieve their goal, immensely simplifying non-trivial coordination of asynchronous operations.</p>
<p>As a simple example, let's compare the following two approaches:</p>
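<p>The pause-and-resume behaviour can be sketched in a few lines. The first part below shows a plain generator being resumed with a value. The second part is a hypothetical <code>run</code> helper (a name made up for this sketch, not V8's implementation) that drives a generator yielding Promises, which is roughly the pattern async functions make built-in.</p>

```javascript
// A generator pauses at each `yield` and resumes when `next()` is called.
function* counter() {
  const received = yield 1; // pause here; resume with the value passed to next()
  yield received * 2;
}

const it = counter();
const first = it.next();    // runs up to the first `yield`
const second = it.next(21); // resumes, with `received` set to 21

// Hypothetical helper: drives a generator that yields Promises, resuming it
// with each resolved value, or throwing each rejection back into it.
function run(generatorFunction) {
  return new Promise((resolve, reject) => {
    const iterator = generatorFunction();
    function step(method, argument) {
      let result;
      try {
        result = iterator[method](argument);
      } catch (error) {
        reject(error); // the generator threw and did not catch it
        return;
      }
      if (result.done) {
        resolve(result.value); // the generator returned
        return;
      }
      Promise.resolve(result.value).then(
        value => step("next", value),
        error => step("throw", error));
    }
    step("next", undefined);
  });
}

// Reads like sequential code, but each `yield` waits for a Promise.
const sum = run(function* () {
  const a = yield Promise.resolve(1);
  const b = yield Promise.resolve(2);
  return a + b;
});
```

<p>With async functions, <code>yield</code> becomes <code>await</code> and the runner is supplied by the language, so no helper is needed.</p>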
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token keyword">function</span> <span class="token function">deployApplication</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">return</span> <span class="token function">cleanDirectory</span><span class="token punctuation">(</span>__DEPLOYMENT_DIR__<span class="token punctuation">)</span><span class="token punctuation">.</span><br> <span class="token function">then</span><span class="token punctuation">(</span>fetchNpmDependencies<span class="token punctuation">)</span><span class="token punctuation">.</span><br> <span class="token function">then</span><span class="token punctuation">(</span><br> <span class="token parameter">deps</span> <span class="token operator">=></span> Promise<span class="token punctuation">.</span><span class="token function">all</span><span class="token punctuation">(</span><br> deps<span class="token punctuation">.</span><span class="token function">map</span><span class="token punctuation">(</span><br> <span class="token parameter">dep</span> <span class="token operator">=></span> <span class="token function">moveToDeploymentSite</span><span class="token punctuation">(</span><br> dep<span class="token punctuation">.</span>files<span class="token punctuation">,</span><br> <span class="token template-string"><span class="token template-punctuation string">`</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span>__DEPLOYMENT_DIR__<span class="token interpolation-punctuation punctuation">}</span></span><span class="token string">/deps/</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span>dep<span class="token punctuation">.</span>name<span class="token interpolation-punctuation punctuation">}</span></span><span class="token template-punctuation 
string">`</span></span><br> <span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">.</span><br> <span class="token function">then</span><span class="token punctuation">(</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">=></span> <span class="token function">compileSources</span><span class="token punctuation">(</span>__SRC_DIR__<span class="token punctuation">,</span><br> __DEPLOYMENT_DIR__<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">.</span><br> <span class="token function">then</span><span class="token punctuation">(</span>uploadToServer<span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>The Promise boilerplate makes this harder to read and follow than it could be. And what happens if an error occurs? Do we want to add catch handlers to each link in the Promise chain? That will only make it even more difficult to follow, with error handling interleaved in difficult-to-read ways.</p>
<p>Let's refactor this using async functions:</p>
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token keyword">async</span> <span class="token keyword">function</span> <span class="token function">deployApplication</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">await</span> <span class="token function">cleanDirectory</span><span class="token punctuation">(</span>__DEPLOYMENT_DIR__<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">let</span> dependencies <span class="token operator">=</span> <span class="token keyword">await</span> <span class="token function">fetchNpmDependencies</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <br> <span class="token comment">// *see below*</span><br> <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">let</span> dep <span class="token keyword">of</span> dependencies<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token keyword">await</span> <span class="token function">moveToDeploymentSite</span><span class="token punctuation">(</span><br> dep<span class="token punctuation">.</span>files<span class="token punctuation">,</span><br> <span class="token template-string"><span class="token template-punctuation string">`</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span>__DEPLOYMENT_DIR__<span class="token interpolation-punctuation punctuation">}</span></span><span class="token string">/deps/</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span>dep<span class="token punctuation">.</span>name<span class="token interpolation-punctuation punctuation">}</span></span><span class="token 
template-punctuation string">`</span></span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span><br> <br> <span class="token keyword">await</span> <span class="token function">compileSources</span><span class="token punctuation">(</span>__SRC_DIR__<span class="token punctuation">,</span><br> __DEPLOYMENT_DIR__<span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token keyword">return</span> <span class="token function">uploadToServer</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br><span class="token punctuation">}</span></code></pre>
<p>You'll notice that the "moveToDeploymentSite" step is slightly different in the async function version, in that it completes each operation in a serial pipeline, rather than completing each operation in parallel, and continuing once finished. This is an unfortunate limitation of the async function specification, which will hopefully be improved on in the future.</p>
<p>In the meantime, it's still possible to use the Promise API in async functions, as you can <code>await</code> any Promise, and continue execution after it is resolved. This grants compatibility with numerous existing Web Platform APIs (such as <code>fetch()</code>), which is ultimately a good thing! Here's an alternative implementation of this step, which performs the <code>moveToDeploymentSite()</code> bits in parallel, rather than serially:</p>
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token keyword">await</span> Promise<span class="token punctuation">.</span><span class="token function">all</span><span class="token punctuation">(</span>dependencies<span class="token punctuation">.</span><span class="token function">map</span><span class="token punctuation">(</span><br> <span class="token parameter">dep</span> <span class="token operator">=></span> <span class="token function">moveToDeploymentSite</span><span class="token punctuation">(</span><br> dep<span class="token punctuation">.</span>files<span class="token punctuation">,</span><br> <span class="token template-string"><span class="token template-punctuation string">`</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span>__DEPLOYMENT_DIR__<span class="token interpolation-punctuation punctuation">}</span></span><span class="token string">/deps/</span><span class="token interpolation"><span class="token interpolation-punctuation punctuation">${</span>dep<span class="token punctuation">.</span>name<span class="token interpolation-punctuation punctuation">}</span></span><span class="token template-punctuation string">`</span></span><br><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre>
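<p>The difference between the serial loop and the <code>Promise.all</code> variant can be observed with a small, self-contained sketch. <code>fakeMove</code> is a hypothetical stand-in for <code>moveToDeploymentSite()</code> that simply waits on a timer and records when it starts and finishes; the names and delays are assumptions made for the illustration.</p>

```javascript
// Hypothetical stand-in for an asynchronous step: resolves after `ms`
// milliseconds and records when it starts and when it finishes.
function fakeMove(name, ms, log) {
  log.push(`start ${name}`);
  return new Promise(resolve => setTimeout(() => {
    log.push(`end ${name}`);
    resolve(name);
  }, ms));
}

const deps = [{ name: "a", ms: 30 }, { name: "b", ms: 10 }];

// Serial: each step starts only after the previous one has finished.
async function moveSerially(log) {
  for (const dep of deps) {
    await fakeMove(dep.name, dep.ms, log);
  }
  return log;
}

// Parallel: every step starts immediately; Promise.all waits for all of them.
async function moveInParallel(log) {
  await Promise.all(deps.map(dep => fakeMove(dep.name, dep.ms, log)));
  return log;
}

const serialLog = moveSerially([]);
const parallelLog = moveInParallel([]);
```

<p>In the serial log, <code>b</code> does not start until <code>a</code> has ended. In the parallel log, both start before either ends, and the shorter step finishes first.</p>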
<p>Now, it's clear from the <em>let dependencies = await fetchNpmDependencies();</em> line that Promises are unwrapped automatically. What happens if the promise is rejected with an error, rather than resolved with a value? With try-catch blocks, we can catch rejected promise errors inside async functions! And if they are not caught, the async function automatically returns a rejected Promise.</p>
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token keyword">function</span> <span class="token function">throwsError</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token keyword">throw</span> <span class="token keyword">new</span> <span class="token class-name">Error</span><span class="token punctuation">(</span><span class="token string">"oops"</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token punctuation">}</span><br><br><span class="token keyword">async</span> <span class="token keyword">function</span> <span class="token function">foo</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token function">throwsError</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token punctuation">}</span><br><br><span class="token comment">// will print the Error thrown in `throwsError`.</span><br><span class="token function">foo</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">catch</span><span class="token punctuation">(</span>console<span class="token punctuation">.</span>error<span class="token punctuation">)</span><br><br><span class="token keyword">async</span> <span class="token keyword">function</span> <span class="token function">bar</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br><span class="token keyword">try</span> <span class="token punctuation">{</span><br> <span class="token keyword">var</span> value <span class="token operator">=</span> <span class="token keyword">await</span> <span class="token function">foo</span><span 
class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><br> <span class="token punctuation">}</span> <span class="token keyword">catch</span> <span class="token punctuation">(</span>error<span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token comment">// Rejected Promise is unwrapped automatically, and</span><br> <span class="token comment">// execution continues here, allowing us to recover</span><br> <span class="token comment">// from the error! `error` is `new Error("oops")`</span><br> <span class="token punctuation">}</span><br><span class="token punctuation">}</span></code></pre>
<p>There are also lots of convenient forms of async function declarations, which hopefully serve lots of interesting use-cases! You can concisely declare methods as asynchronous in Object literals and ES6 classes, by preceding the method name with the <code>async</code> keyword (with no line terminator between <code>async</code> and the method name!).</p>
<pre class="language-js" tabindex="0"><code class="language-js"><span class="token keyword">class</span> <span class="token class-name">C</span> <span class="token punctuation">{</span><br> <span class="token keyword">async</span> <span class="token function">doAsyncOperation</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token comment">// ...</span><br> <span class="token punctuation">}</span><br><span class="token punctuation">}</span><span class="token punctuation">;</span><br><br><span class="token keyword">var</span> obj <span class="token operator">=</span> <span class="token punctuation">{</span><br> <span class="token keyword">async</span> <span class="token function">getFacebookProfileAsynchronously</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><br> <span class="token comment">/* ... */</span><br> <span class="token punctuation">}</span><br><span class="token punctuation">}</span><span class="token punctuation">;</span></code></pre>
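<p>As a small runnable sketch of these forms, the example below calls an async class method and an async object-literal method. <code>Deployer</code>, <code>deploy</code>, <code>tools</code> and <code>ping</code> are hypothetical names chosen for the illustration.</p>

```javascript
class Deployer {
  // Async method in a class body.
  async deploy(name) {
    return `deployed ${name}`;
  }
}

const tools = {
  // Async method in an object literal.
  async ping() {
    return "pong";
  }
};

// Calling an async method always returns a Promise, even when the body
// returns a plain value.
const deployed = new Deployer().deploy("app");
const isPromise = deployed instanceof Promise;
const pinged = tools.ping();
```

<p>Callers can therefore <code>await</code> these methods or chain <code>.then()</code> on them like any other Promise-returning API.</p>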
<p>These features allow us to write more idiomatic, easier to understand asynchronous control flow in our applications, and future extensions to the ECMAScript specification will enable even more idiomatic forms for writing complex algorithms, in a maintainable and readable fashion. We are very excited about this! There are numerous other resources on the web detailing async functions, their benefits, and perhaps ways they might be improved in the future. Some good ones include <a href="https://jakearchibald.com/2014/es7-async-functions/">this piece from Google's Jake Archibald</a>, so give that a read for more details. It's a few years old, but it holds up nicely!</p>
<p>So, now that you've seen the overview of the feature, you might be wondering how you can try it out, and when it will be available for use. For the next few weeks, it's still too experimental even for the "Experimental JavaScript" flag. But if you are adventurous, you can try it already! Fetch the latest Chrome Canary build, and start Chrome with the command-line flag <code>--js-flags="--harmony-async-await"</code>. We can't make promises about the shipping timeline, but it could ship as early as Chrome 53 or Chrome 54, which will become stable in September or October.</p>
<p>We owe a shout out to Bloomberg, who have provided us with resources to improve the web platform that we love. Hopefully, we are providing their engineers with ways to write more maintainable, more performant, and more beautiful code. We hope to continue this working relationship in the future!</p>
<p>Shout-outs are also owed to the Chromium team, who have assisted in reviewing the feature, verifying its stability, getting devtools integration working, and ultimately getting the code upstream. Terrific! In addition, the WebKit team has also been very helpful, and hopefully we will see the feature land in JavaScriptCore in the not too distant future.</p>