Last year at Igalia we started coding pflua, a high-performance packet filter which runs on top of LuaJIT. Pflua is now included in Snabb Switch as an external library, where it handles all the packet filtering tasks that were initially done with libpcap. Pflua compiles pflang expressions to Lua functions, applying several optimizations along the way. One of my first tasks in pflua was obtaining the machine code that LuaJIT produces for a translated Lua function. In this post, I take a high-level look at LuaJIT’s disassembler, the piece of code that allows LuaJIT to print out the machine code it produces.

LuaJIT 2.1, currently in beta stage, introduced two very useful tools: a statistical profiler and a code dumper. Both tools are actually Lua modules. They can be used either from source code or externally when calling LuaJIT. In the latter case, they are enabled with the parameters -jp=options and -jdump=options.
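As a minimal sketch of the "from source code" route, assuming LuaJIT 2.1 where the dumper is available as the jit.dump module, the snippet below switches the dumper on around a hot loop and sends the output to a file. The temporary file name is only an illustration, not something the dumper requires.

```lua
-- Sketch only: assumes LuaJIT 2.1, which ships the dumper as the jit.dump module.
local dump = require("jit.dump")

-- Hypothetical output file, chosen here just for illustration.
local outfile = os.tmpname()

-- "m" is the same option string that -jdump=m takes on the command line.
dump.on("m", outfile)
local x = 1
for i = 1, 1e6 do x = x + 1 end
dump.off()  -- stops dumping and closes the output file

-- Read back whatever the dumper recorded for the loop above.
local f = assert(io.open(outfile, "r"))
local text = f:read("*a")
f:close()
os.remove(outfile)
io.write(text)
```

If the JIT compiler is disabled or the loop never gets hot enough to be traced, the file simply stays empty.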

LuaJIT’s profiler helps us understand what code is hot, in other words, what parts of the program consume most of the CPU cycles. The dumper helps us understand what code is produced for every trace. Using different flags, it’s possible to peek at the resulting Lua bytecode, LuaJIT’s SSA IR (static single assignment IR, an intermediate representation form often used in compilers for procedural languages such as Java and C) and the machine code. For both the profiler and the code dumper, the possible flags are best documented in the headers of their respective source code files: src/jit/p.lua and src/jit/dump.lua. In the particular case of the code dumper, the flag ‘m’ prints out the machine code for a trace:

$ luajit -jdump=m -e "local x = 1; for i=1,1e6 do x = x + 1 end; print(x)"
---- TRACE 1 start (command line):1
---- TRACE 1 mcode 81
0bcaffa3  mov dword [0x40c22410], 0x1
0bcaffae  movsd xmm0, [0x40bd7120]
0bcaffb7  cvttsd2si ebp, [rdx+0x8]
0bcaffbc  cmp dword [rdx+0x4], 0xfffeffff
0bcaffc3  jnb 0x0bca0010        ->0
0bcaffc9  movsd xmm7, [rdx]
0bcaffcd  addsd xmm7, xmm0
0bcaffd1  add ebp, +0x01
0bcaffd4  cmp ebp, 0x000f4240
0bcaffda  jg 0x0bca0014        ->1
->LOOP:
0bcaffe0  addsd xmm7, xmm0
0bcaffe4  add ebp, +0x01
0bcaffe7  cmp ebp, 0x000f4240
0bcaffed  jle 0x0bcaffe0        ->LOOP
0bcaffef  jmp 0x0bca001c        ->3

LuaJIT uses its own disassembler to print out machine code. There’s one for every architecture supported: ARM, MIPS, PPC, x86 and x64. LuaJIT’s disassemblers are actually Lua modules, written in Lua (some of them as small as 500 LOC), and live at src/jit/dis_XXX.lua. Mike Pall comments in the header of dis_x86.lua that an external disassembler could have been used to do the actual disassembling, with its result later integrated into the dumper module, but that design would be more fragile. So he decided to implement his own disassemblers.

As they are coded as modules, it is possible to reuse them in other programs. Basically, each disassembler module exports three functions: create, disass and regname. The disass function creates a new context and disassembles a chunk of code starting at a given address. So it is possible to pass a section of a binary and get it decoded using the disass function.
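Before moving on to a real binary, here is a minimal sketch of that interface. It assumes LuaJIT 2.1 with its bundled disassembler reachable as jit.dis_x64, and it feeds disass a hand-written string of five bytes that encode two x86-64 instructions.

```lua
-- Sketch only: assumes LuaJIT 2.1 with jit.dis_x64 on package.path.
local dis = require("jit.dis_x64")

-- The three functions the module exports.
for _, name in ipairs({ "create", "disass", "regname" }) do
   print(name, type(dis[name]))
end

-- Machine code for: xor ebp, ebp / mov r9, rdx
local code = string.char(0x31, 0xED, 0x49, 0x89, 0xD1)

-- With no further arguments, the listing starts at address 0
-- and is written to standard output.
dis.disass(code)
```

Building the string with string.char keeps the raw bytes explicit and avoids depending on escape sequences in string literals.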

In the example below, I’m going to use LuaJIT’s x86-64 disassembler to print out the .text section of a binary, which is the section that contains the compiled source code. I use the binary of the following hello-world program as input.

#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("Usage: hello <name>\n");
        return 1;
    }
    printf("Hello %s!", argv[1]);
    return 0;
}

I need to know the offset and size of the .text section:

$ readelf --wide -S ./hello-world
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [13] .text             PROGBITS        0000000000400520 000520 0001b2 00  AX  0   0 16

Now all I have to do is to read that chunk of the binary and pass it to disass.

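-- Assumes dis_x64.lua and dis_x86.lua (from LuaJIT's src/jit) are on
-- package.path; an installed LuaJIT normally exposes the same module
-- as "jit.dis_x64".
local disass = require("dis_x64")

-- Read `size` bytes of file `name`, starting at byte `offset`.
local function readfile(name, offset, size)
   local f = io.open(name, "rb")
   if not f then
      error(("Couldn't open file: %s"):format(name))
   end
   f:seek("set", offset)
   local content = f:read(size)
   f:close()
   return content
end

local function main()
   local filename, offset, size = unpack(arg)
   assert(filename, "Usage: luajit dis.lua <file> <offset> <size>")
   -- tonumber accepts hexadecimal strings such as "0x520".
   offset = assert(tonumber(offset), "No valid offset")
   size = assert(tonumber(size), "No valid size")
   local mcode = readfile(filename, offset, size)
   disass.disass(mcode)
end

main()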

And this is what I get when I print out the first 10 lines of the .text section:

$ luajit ./dis.lua ./hello-world 0x520 0x1b2 | head -10
00000000  31ED              xor ebp, ebp
00000002  4989D1            mov r9, rdx
00000005  5E                pop rsi
00000006  4889E2            mov rdx, rsp
00000009  4883E4F0          and rsp, -0x10
0000000d  50                push rax
0000000e  54                push rsp
0000000f  49C7C0D0064000    mov r8, 0x004006d0
00000016  48C7C160064000    mov rcx, 0x00400660
0000001d  48C7C716064000    mov rdi, 0x00400616

To validate that the output is correct, I compared it to the output produced by ndisasm and objdump.

$ ndisasm -b 64 ./hello-world | grep -A 10 "400520"
00000520  31ED              xor ebp,ebp
00000522  4989D1            mov r9,rdx
00000525  5E                pop rsi
00000526  4889E2            mov rdx,rsp
00000529  4883E4F0          and rsp,byte -0x10
0000052D  50                push rax
0000052E  54                push rsp
0000052F  49C7C0D0064000    mov r8,0x4006d0
00000536  48C7C160064000    mov rcx,0x400660
0000053D  48C7C716064000    mov rdi,0x400616

It looks almost the same as the output of LuaJIT’s disassembler, and that’s because LuaJIT’s disassembler follows ndisasm’s format, as stated in the source code. Objdump produces a slightly different but semantically equivalent output:

$ objdump -M intel -j .text -d hello-world

Disassembly of section .text:

0000000000400520 <_start>:
  400520:       31 ed                   xor    ebp,ebp
  400522:       49 89 d1                mov    r9,rdx
  400525:       5e                      pop    rsi
  400526:       48 89 e2                mov    rdx,rsp
  400529:       48 83 e4 f0             and    rsp,0xfffffffffffffff0
  40052d:       50                      push   rax
  40052e:       54                      push   rsp
  40052f:       49 c7 c0 d0 06 40 00    mov    r8,0x4006d0
  400536:       48 c7 c1 60 06 40 00    mov    rcx,0x400660
  40053d:       48 c7 c7 16 06 40 00    mov    rdi,0x400616
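The listings differ mainly in their address column: LuaJIT’s starts at 0, ndisasm shows file offsets and objdump shows virtual addresses. As a small sketch, assuming the optional second argument of disass is the base address of the code (which is how I read dis_x86.lua) and that the module is reachable as jit.dis_x64, passing the Address value that readelf reported for .text should make LuaJIT’s listing line up with objdump’s.

```lua
-- Sketch only: assumes LuaJIT 2.1 with jit.dis_x64 on package.path.
local dis = require("jit.dis_x64")

-- First two instructions of _start, copied from the listings above.
local code = string.char(0x31, 0xED, 0x49, 0x89, 0xD1)

-- 0x400520 is the Address column readelf reported for .text.
dis.disass(code, 0x400520)
```

In the dis.lua script, the same effect could be had by accepting the address as an extra command-line argument and forwarding it to disass.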

It is possible to do the same thing by instantiating a context via create and calling context:disass() to disassemble a chunk of machine code. This approach allows us finer control of the output, as create is passed a callback that is invoked for each disassembled line. We could accumulate the disassembled lines in a variable or print them out to stdout, as in this example.