Native call stack profiling (3/3): 2022 work in V8

This is the last blog post of the series. In first post I presented some concepts of call stack profiling, and why it is useful. In second post I reviewed Event Tracing for Windows, the native tool for the purpose, and how it can be used to trace Chromium.

This last post will review the work done in 2022 to improve the support in V8 of call stack profiling in Windows.

I worked on several of the fixes this year. This work has been sponsored by Bloomberg and Igalia.

This work was presented as a lightning talk in BlinkOn 17.

Some bad news to start… and a fix

In March I started working on the report that Windows event traces where not properly resolving the Javascript symbols.

After some bisecting I found this was a regression introduced by this commit, that changed the --js-flags handling to a later stage. This happened to be after V8 initialization, so the code that would enable instrumentation would not consider the flag.

The fix I implemented moved flags processing to happen right before platform initialization, so instrumentation worked again.

Simplified method names

Another fix I worked was to improve the methods name generation. Windows tracing would show a quite redundant description of each level, and that was making analysis more difficult.

Before my work, the entries would look like this:

string-tagcloud.js!LazyCompile:~makeTagCloud- string-tagcloud.js:231-232:22 0x0

After my change, now it looks like this:

string-tagcloud.js!makeTagCloud-231:22 0x0

The fix adds a specific implementation for ETW. Instead of reusing the method name that is also used for Perf, it has a specific implementation for function that takes into account what ETW backend exports already, to avoid redundancy. It also takes advantage of the existing method DebugNameCStr to retrieve inferred method names in case there is no name available.

Problem with Javascript code compiled before tracing

The way V8 ETW worked was that, when tracing was ongoing and a new function was compiled in JIT, it would emit information to ETW.

This implied a big problem. If a function was compiled by V8 before tracing started, then ETW would not properly resolve the function names so, when analyzing the traces, it would not be possible to know which function was called at any of the samples.

The solution is conceptually simple. When tracing starts, V8 traverse the living Javascript contexts and emit all the symbols. This adds noise to the tracing, as it is an expensive process. But, as it happens at the start of the tracing, it is very easy to isolate in the captured trace.

And a performance fix

I also fixed a huge performance penalty when tracing code from snapshots, caused by calculating all the time the end line numbers of code instead of caching it.

Initialization improvements

Paolo Severini improved the initialization code, so the initialization of an ETW session was lighter, and also tracing would be started or stopped correctly.

Benchmarking ETW overhead

After all these changes I did some benchmarking with and without ETW. The goal was knowing if it would be good to enable by default ETW support in V8, not requiring to pass any JS flag.

With Sunspider in a Windows 64 bits build:

Image showing slight overhead with ETW and bigger one with interpreted frames.

Other benchmarks I tried gave similar numbers.

So far, in 64 bits architecture I could not detect any overhead of enabling ETW support when recording is not happening, and the cost when it is enabled is very low.

Though, when combined with interpreted frames native stack, the overhead is close to 10%. This was expected as explained here.

So, good news so far. We still need to benchmark 32 bit architecture to see if the impact is similar.

Try it!

The work described in this post is available in V8 10.9.0. I hope you enjoy the improvements, and specially hope these tools help in the investigation of performance issues around Javascript, in NodeJS, Google Chrome or Microsoft Edge.

What next?

There is still a lot of things to do, and I hope I can continue working on improvements for V8 ETW support next year:

  • First, finishing the benchmarks, and considering to enable ETW instrumentation by default in V8 and derivatives.
  • Add full support for WASM.
  • Bugfixing, as we still see segments missing in certain benchnarmks.
  • Create specific events for when the JIT information of already compiled symbols is sent to ETW, to make it easier to differenciate from the code compiled while recording a trace.

If you want to track the work, keep an eye on V8 issue 11043.

The end

This is the last post in the series.

Thanks to Bloomberg and Igalia for sponsoring my work in ETW Chromium integration improvements!