This is the last blog post of the series. In the first post I presented some concepts of call stack profiling and why it is useful. In the second post I reviewed Event Tracing for Windows, the native tool for the purpose, and how it can be used to trace Chromium.
This last post reviews the work done in 2022 to improve V8's support for call stack profiling on Windows.
This work was presented as a lightning talk at BlinkOn 17.
Some bad news to start… and a fix
After some bisecting, I found this was a regression introduced by this commit, which moved --js-flags handling to a later stage. This happened to be after V8 initialization, so the code that enables instrumentation did not consider the flag.
The fix I implemented moves flag processing to right before platform initialization, so instrumentation works again.
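The ordering problem can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Engine` struct, `ParseFlags`, `Initialize` and the `--example-etw` flag are illustrative names, not V8 or Chromium APIs. It only shows why a flag parsed after initialization is never seen by the code that decides whether to enable instrumentation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an engine that reads its flags once, at init.
struct Engine {
  bool etw_flag = false;            // set by flag parsing
  bool instrumentation_on = false;  // decided once, during initialization
  bool initialized = false;

  // Parses command-line style flags; "--example-etw" is an illustrative name.
  void ParseFlags(const std::vector<std::string>& flags) {
    for (const auto& f : flags) {
      if (f == "--example-etw") etw_flag = true;
    }
  }

  // Initialization takes a snapshot of the flag; later changes are ignored.
  void Initialize() {
    assert(!initialized);  // this sketch initializes only once
    instrumentation_on = etw_flag;
    initialized = true;
  }
};

// Regressed order: initialize first, parse flags afterwards.
inline bool InstrumentationWithLateFlags(const std::vector<std::string>& flags) {
  Engine e;
  e.Initialize();
  e.ParseFlags(flags);
  return e.instrumentation_on;
}

// Fixed order: parse flags right before initialization.
inline bool InstrumentationWithEarlyFlags(const std::vector<std::string>& flags) {
  Engine e;
  e.ParseFlags(flags);
  e.Initialize();
  return e.instrumentation_on;
}
```

With the late order the flag is parsed but has no effect; with the early order the same flag turns instrumentation on.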
Simplified method names
Another fix I worked on improved the generation of method names. Windows tracing showed a quite redundant description of each stack level, which made analysis more difficult.
Before my work, the entries would look like this:
string-tagcloud.js!LazyCompile:~makeTagCloud- string-tagcloud.js:231-232:22 0x0
After my change, they look like this:
The fix adds a specific implementation for ETW. Instead of reusing the method name that is also used for Perf, ETW now has its own name-generation function that takes into account what the ETW backend already exports, to avoid redundancy. It also takes advantage of the existing method DebugNameCStr to retrieve inferred method names when no name is available.
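As a rough illustration of the idea, here is a minimal sketch of a backend-specific name builder with a fallback to the inferred name. The `FunctionInfo` struct, both functions and the output formats are assumptions made for this example; they are not V8's actual types or the exact strings it emits.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of a JIT-compiled function; not V8's real types.
struct FunctionInfo {
  std::string name;           // declared name, may be empty
  std::string inferred_name;  // name inferred from context, may be empty
  std::string script;         // e.g. "string-tagcloud.js"
  int line;                   // 1-based line of the function
};

// Verbose label that repeats the script name (illustrative format only).
inline std::string VerboseName(const FunctionInfo& f) {
  assert(f.line > 0);  // lines are 1-based in this sketch
  return "LazyCompile:~" + f.name + " " + f.script + ":" +
         std::to_string(f.line);
}

// Shorter label for a backend that already reports the script on its own:
// use the declared name, fall back to the inferred one, then a placeholder.
inline std::string CompactName(const FunctionInfo& f) {
  if (!f.name.empty()) return f.name;
  if (!f.inferred_name.empty()) return f.inferred_name;
  return "<anonymous>";
}
```

The design choice is simply to keep one name builder per backend, so each one can drop whatever its consumer already displays.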
The way V8 ETW support worked was that, while tracing was ongoing, it emitted information to ETW whenever a new function was JIT-compiled.
This implied a big problem: if a function was compiled by V8 before tracing started, ETW could not resolve its name, so when analyzing the traces it was not possible to know which function was being called in the affected samples.
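One way to reason about this gap is a tracer that remembers what has been compiled and replays it when a session starts. The sketch below assumes exactly that design; `CodeRecord`, `JitTracer` and its methods are hypothetical names, the `emitted` list stands in for events sent to the trace consumer, and none of it is V8's actual implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one JIT-compiled function.
struct CodeRecord {
  std::string name;
  std::uint64_t start;  // start address of the generated code
  std::uint64_t size;   // size of the generated code in bytes
};

// Hypothetical tracer: remembers compiled code and reports names to a sink
// (a stand-in for the trace consumer) while a session is active.
class JitTracer {
 public:
  // Called whenever a function is compiled.
  void OnCompiled(const CodeRecord& rec) {
    known_.push_back(rec);
    if (tracing_) emitted_.push_back(rec.name);
  }

  // Called when a trace session starts: replay everything compiled earlier
  // so that addresses sampled later can still be resolved to names.
  void OnSessionStart() {
    assert(!tracing_);  // sessions are not nested in this sketch
    tracing_ = true;
    for (const auto& rec : known_) emitted_.push_back(rec.name);
  }

  void OnSessionStop() { tracing_ = false; }

  const std::vector<std::string>& emitted() const { return emitted_; }

 private:
  bool tracing_ = false;
  std::vector<CodeRecord> known_;
  std::vector<std::string> emitted_;
};
```

Without the replay loop in `OnSessionStart`, a function compiled before the session would never reach the consumer, which is the situation described above.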
And a performance fix
I also fixed a huge performance penalty when tracing code from snapshots, caused by recalculating the end line numbers of the code every time instead of caching them.
Paolo Severini improved the initialization code, so that initializing an ETW session is lighter and tracing is started and stopped correctly.
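The caching idea can be sketched as follows. This is a minimal example under assumptions: the `Script` class, its `LineOf` method and the `scans` counter are hypothetical and only illustrate computing line-end offsets once and reusing them, not V8's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical script wrapper that computes line-end offsets lazily, once.
class Script {
 public:
  explicit Script(std::string source) : source_(std::move(source)) {}

  // Returns the 1-based line number that contains `offset`.
  int LineOf(std::size_t offset) {
    assert(offset <= source_.size());
    EnsureLineEnds();
    // First line end at or after the offset identifies the line.
    auto it = std::lower_bound(line_ends_.begin(), line_ends_.end(), offset);
    return static_cast<int>(it - line_ends_.begin()) + 1;
  }

  // Number of full scans of the source performed so far.
  int scans() const { return scans_; }

 private:
  // Scans the source for newlines only on first use; later calls reuse it.
  void EnsureLineEnds() {
    if (cached_) return;
    ++scans_;
    for (std::size_t i = 0; i < source_.size(); ++i) {
      if (source_[i] == '\n') line_ends_.push_back(i);
    }
    line_ends_.push_back(source_.size());  // end of the last line
    cached_ = true;
  }

  std::string source_;
  std::vector<std::size_t> line_ends_;
  bool cached_ = false;
  int scans_ = 0;
};
```

However many lookups are made, the source is scanned only once; each later lookup is a binary search over the cached offsets.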
Benchmarking ETW overhead
After all these changes I did some benchmarking with and without ETW. The goal was to find out whether ETW support could be enabled by default in V8, without requiring any JS flag.
With SunSpider on a 64-bit Windows build:
Other benchmarks I tried gave similar numbers.
So far, on the 64-bit architecture I could not detect any overhead from enabling ETW support when recording is not happening, and the cost when recording is enabled is very low.
However, when combined with interpreted frames native stack, the overhead is close to 10%. This was expected, as explained here.
So, good news so far. We still need to benchmark the 32-bit architecture to see if the impact is similar.
There are still a lot of things to do, and I hope I can continue working on improvements for V8 ETW support next year:
- First, finishing the benchmarks, and considering enabling ETW instrumentation by default in V8 and derivatives.
- Add full support for WASM.
- Bug fixing, as we still see segments missing in certain benchmarks.
- Create specific events for when the JIT information of already compiled symbols is sent to ETW, to make it easier to differentiate it from the code compiled while recording a trace.
If you want to track the work, keep an eye on V8 issue 11043.
Thanks to Bloomberg and Igalia for sponsoring my work in ETW Chromium integration improvements!