Porting BOLT to RISC-V
Recently, initial support for RISC-V has landed in LLVM's BOLT subproject. Even though the current functionality is limited, it was an interesting experience of open source development to get to this point. In this post, I will talk about what BOLT is, what it takes to teach BOLT how to process RISC-V binaries, and the interesting detours I sometimes had to make to get this work upstream.
BOLT overview #
BOLT (Binary Optimization and Layout Tool) is a post-link optimizer whose primary goal is to improve the layout of binaries. It uses sample-based profiling to improve the performance of already fully-optimized binaries. That is, the goal is to be complementary to existing optimization techniques like PGO and LTO, not to replace them.
Sample-based profiling is used in order to make it viable to obtain profiles from production systems as its overhead is usually negligible compared to profiling techniques based on instrumentation. Another advantage is that no special build configuration is needed and production binaries can directly be profiled. The choice for binary optimization (as opposed to, say, optimizing at the IR level) comes from the accuracy of the profile data: since the profile is gathered at the binary level, mapping it back to a higher level representation of the code can be a challenging problem. Since code layout optimizations can quite easily be applied at the binary level, and the accuracy of the profile is highest there, the choice for performing post-link optimization seems to be a logical one.
To use BOLT, it needs access to a binary and corresponding profile. As mentioned
before, the goal is to optimize production binaries so no special build steps
are required. The only hard requirement is that the binary contains a symbol
table (so stripped binaries are not supported). In order for BOLT to be able to
rearrange functions (in addition to the code within functions), it needs access
to relocations. Linkers usually remove relocations from the final binary but can
be instructed to keep them using the --emit-relocs
flag. For best results, it
is recommended to link your binaries with this flag.
Gathering a profile on Linux systems can be done in the usual way using perf
.
BOLT provides the necessary tools to convert perf
output to an appropriate
format, and to combine multiple profiles. On systems where perf
is not
available, BOLT can also instrument binaries to create profiles. For more
information on how to use BOLT, see the documentation.
For more details on BOLT, including design decisions and evaluation, see the CGO'19 paper. Let's move on to discuss some of BOLT's internals to understand what is needed to support RISC-V.
BOLT internals #
Optimizing the layout of a binary involves shuffling code around. The biggest challenge in doing this, is making sure that all code references are still correct. Indeed, moving a function or basic block to a different location means changing its address and all jumps, calls, or other references to it need to be updated because of it.
To do this correctly, BOLT's rewriting pipeline transforms binaries in the following (slightly simplified) way:
- Function discovery: using (mostly) the ELF symbol table, the boundaries of functions are recorded;
- Disassembly: using LLVM's MC-layer, function bodies are
disassembled into lists of
MCInst
objects; - CFG construction: basic blocks are discovered in the instruction lists and references between them resolved, resulting in a control-flow graph for each function;
- Optimizations: using the CFG, basic block and function layout is optimized based on the profile;
- Assembly: the new layout is emitted, using LLVM's
MCStreamer
API, to an ELF object file in memory; - Link: since this object file might still contain external references, it is linked to produce the final binary.
Some of these steps are completely architecture independent. For example, function discovery only needs the ELF symbol table. Others do need architecture specific information. Fortunately, BOLT has supported multiple architectures from the beginning (X86-64 and AArch64) so an abstraction layer exists that makes it relatively straightforward to add a new target. Let's talk about what is needed to teach BOLT to transform RISC-V binaries.
Teaching BOLT RISC-V #
Thanks to BOLT's architecture abstraction layer, adding support for a new target turned out to be mostly straightforward. I will go over the parts of BOLT's rewriting pipeline that need architecture-specific information while focusing on the aspects of RISC-V that made this slightly tricky sometimes.
(Dis)assembly #
Assembly and disassembly of binaries is obviously architecture-dependent. BOLT
uses various MC-layer LLVM APIs to perform these tasks. More specifically,
MCDisassembler
is used for disassembly while
MCAssembler
is used (indirectly via
MCObjectStreamer
) for assembly. The good news is that
there is excellent RISC-V support in the MC-layer so this can readily be used by
BOLT.
CFG construction #
The result of disassembly is a linear list of instructions in the order they
appear in the binary. In the MC-layer, instructions are represented by
MCInst
objects. In this representation, instructions essentially
consist of an opcode and a list of operands, where operands could be registers,
immediates, or more high-level expressions (MCExpr
). Expressions can
be used, for example, to refer to symbolic program locations (i.e., labels)
instead of using constant immediates.
Right after disassembly, however, all operands will be registers or immediates. For example, an instruction like
jal ra, f
will be disassembled into (heavy pseudo-code here)
MCInst(RISCV::JAL, [RISCV::X1, ImmOffset])
where ImmOffset
is the offset from the jal
instruction to f
. This is not
convenient to handle in BOLT as nothing indicates that this MCInst
actually
refers to f
.
Therefore, BOLT post-processes instructions after disassembly and replaces immediates with symbolic references where appropriate. Two different mechanisms are used to figure out the address an instruction refers to:
- For control-transfer instructions (e.g., calls and branches),
MCInstrAnalysis
is used to evaluate the target. LLVM's RISC-V backend already contained an appropriate implementation for this. - For other instructions (e.g.,
auipc
/addi
pairs to load an address in RISC-V), relocations are used. For this, BOLT'sRelocation
class had to be extended to support RISC-V ELF relocations.
Once the target of an instruction had been determined, BOLT creates an
MCSymbol
at that location and updates the MCInst
to point to that symbol
instead of an immediate offset.
One question remains: how does BOLT detect control-transfer instructions? Let's first discuss how BOLT creates the control-flow graph now that all instructions symbolically refer to their targets.
A CFG is a directed graph where the nodes are basic blocks and the edges are control-flow transfers between those basic blocks. Without going into details, BOLT has a target-independent algorithm to create a CFG from a list of instructions (for those interested, you can find it here). It needs some target-specific information about instructions though. For example:
- Terminators are instructions that end basic block (e.g., branches and returns but not calls).
- Branches and jumps are the instructions that create edges in the CFG.
To get this information, BOLT relies again on MCInstrAnalysis
which provides
methods such as isTerminator
and isCall
. These methods can be specialized by
specific LLVM backends but the default implementation relies on the
MCInstrDesc
class. Objects of this class are generated by
various TableGen files in the backends (e.g., this one for
RISC-V). An important property of MCInstrDesc
for the next discussion is that
its information is based only on opcodes, operands are not taken into
account.
LLVM's RISC-V backend did not specialize MCInstrAnalysis
so BOLT was relying
MCInstrDesc
to get information about terminators and branches. For many
targets (e.g., X86) this might actually be fine but for RISC-V, this causes
problems. For example, take a jal
instruction: is this a terminator, a branch,
a call? Based solely on the opcode, we cannot actually answer these questions
because jal
is used both for direct jumps (terminator) and function calls
(non-terminator).
The solution to this problem was to specialize MCInstrAnalysis
for RISC-V
taking the calling convention into account:
jal zero, ...
is an unconditional branch (return address discarded);jal ra, ...
is a call (return address stored inra
(x1
) which the calling convention designates as the return address register);- Some more rules for
jalr
, compressed instructions, detecting returns,...
So the first patch that landed to pave the way for RISC-V support in BOLT was not in the BOLT project but in the RISC-V MC-layer.
With this in place, the patch to add a RISC-V target to BOLT
consisted mainly of implementing the necessary relocations and implementing the
architecture abstraction layer. The latter consisted mainly of instruction
manipulation (e.g., updating branch targets), detecting some types of
instructions not supported by MCInstrAnalysis
(e.g., nops), and analyzing
RISC-V-specific Procedure Linkage Table (PLT) entries (so BOLT knows which
function they refer to). Once I started to understand the internals of BOLT,
this was relatively straightforward. After iterating over the patch with the
BOLT maintainers (who were very helpful and responsive during this process), it
got accepted in less than a month.
There was just one minor issue to resolve.
Linking #
The final step in the rewriting pipeline is linking the generated object file. BOLT is able to rely on LLVM again by using the RuntimeDyld JIT linker which is part of the MCJIT project. Unfortunately, there was no RISC-V support yet in RuntimeDyld. Looking at the supported targets, it seemed easy enough to implement RISC-V support: I just needed to implement the few relocations that BOLT emits. So I submitted a patch.
Alas, it seemed that things might not be as easy as I hoped:
Is there something preventing Bolt from moving to ORC / JITLink? If Bolt is able to move over then the aim should be to do that. If Bolt is unable to move over then we need to know why so that we can address the issue. RuntimeDyld is very much in maintenance mode at the moment, and we're working hard to reach parity in backend coverage so that we can officially deprecate it.
Even though this comment was followed up by this:
None of that is a total blocker to landing this, but the bar is high, and it should be understood that Bolt will need to migrate in the future.
trying to push-through the patch didn't feel like the right approach. For one, I'm anticipating to need some more advanced linker features for RISC-V in the future (e.g., linker relaxation) and I wouldn't want to implement those in a deprecated linker. Moreover, the recommended linker, JITLink, has mostly complete RISC-V support and, importantly, more users and reviewers, making its implementation most certainly of higher quality than what I would implement by myself in RuntimeDyld.
So the way forward for bringing RISC-V support to BOLT seemed to be to first port BOLT from using RuntimeDyld to JITLink. Since it looked like this wasn't going to be a priority for the BOLT maintainers, I decided I might as well give it a shot myself. Even though this would surely mean a significant delay in finishing my ultimate goal of RISC-V support in BOLT, it felt like a great opportunity to me: it allowed me to learn more about linkers and BOLT's internals, as well as to invest in a project that am hoping to use in the foreseeable future.
Porting BOLT to JITLink was hard, at least for me. It had a far ranging impact on many parts of BOLT that I had never touched before. This meant it took quite some time to try and understand these parts, but also that I learned a lot in the process. Besides changes to BOLT, I submitted a few JITLink patches to implement some missing AArch64 relocations that BOLT needed. In the end, I managed to pass all BOLT tests and submit a patch.
This patch took about a month and a half to get accepted. The BOLT maintainers were very helpful and responsive in the process. They were also very strict, though. Rightfully so, of course, as BOLT is being used in production systems. The main requirement for the patch to get accepted was that BOLT's output would be a 100% binary match with the RuntimeDyld version. This was necessary to ease the verification of the correctness of the patch. With the help of the BOLT maintainers, we managed to get the patch in an acceptable state to land it.
Looking forward #
With BOLT being ported to JITLink, the patch to add initial RISC-V support to BOLT could finally land. This doesn't mean that BOLT is currently very usable for RISC-V binaries, though: most binaries can pass through BOLT fine but many of BOLT's transformations are not supported yet.
Since the initial support was added, I landed a few more patches to improve
usability. For example, support for an obscure ELF feature called composed
relocations was added, something
RISC-V uses for R_RISCV_ADD32/SUB32
relocations (which BOLT
supports now). Other patches deal with
creation and reversal
of branches, something BOLT needs to fix-up basic blocks after their layout has
changed.
I'm currently working on handling binaries that have been relaxed during
linking. The issue is that, after BOLT has moved code around, relaxed
instructions might not fit the new addresses anymore. I plan to handle this as
follows: during disassembly, BOLT will "unrelax" instructions (e.g., translating
a jal
back to an auipc
/jalr
pair) to make sure new addresses will always
fit. The linker will then undo this, when possible, by performing relaxation
again. The first step for this, adding linker relaxation support to JITLink,
has been landed. More on this in a future post.
Wrapping up #
Bringing initial RISC-V support to BOLT has been a very interesting and educational journey for me, both from a technical as well as a social perspective. Having to work on multiple projects (LLVM MC, JITLink, BOLT) has taught me new technologies and put me in contact with great communities. I certainly hope to be able to continue this work in the future.
I'll close this post with a reference of the graph at the top, showing what it took, over a series of ~25 patches, to get RISC-V support in BOLT. I think this demonstrates the kind of detours that are sometimes needed to get work upstream, in this case benefiting both the RISC-V community (RISC-V support in BOLT) and BOLT as a whole (moving away from a deprecated linker and fixing bugs encountered along the way)