José Dapena blog

Just another Igalia Blogs site

Native call stack profiling (1/3): introduction

This week I presented a lightning talk in BlinkOn 17. There I talked about the work for improving native stack profiling support in Windows.

This post starts a series where I will provide more context and details to the presentation.

Why callstack profiling #

First, a definition:

Callstack profiling: a performance analysis tool, that samples periodically the call stacks of all threads, for a specific workload.

Why is it useful? It provides a better undestanding of performance problems, specially if they are caused by CPU-bound bottle necks.

As we sample the full stack for each thread, we are capturing a handful of information:

But it is not only useful for CPU waits. It will also capture when a method is waiting for something (i.e. because of networking, or a semaphore).

The provided information is useful for initial analysis of the problem, as it will give a high level view of where time could be spent by the application. But it will also be useful in further stages of the analysis, and even for comparing different implementations and consider possible changes.

How does it work? #

For call stack sampling, we need some infrastructure to be able to capture and traverse properly the callstack for each thread.

In compilation stage, information is added for function names and the frame pointers. This allows, for a specific stack, to resolve later the actual names, and even lines of code that are captured.

In runtime stage, function information will be required for generated code. I.e. in a web browser, the Javascript code that is compiled in runtime.

Then, every sample will extract the callstack of all the threads of all the analysed processes. This will happen periodically, at the rate established by the profiling tool.

System wide native callstack profiling #

When possible, sampling the call stacks of the full system can be benefitial for the analysis.

First, we may want to include system libraries and other dependencies of our component in the analysis. But also, system analyzers can provide other metrics that can give a better context to the analysed workload (network or CPU load, memory usage, swappiness, …).

In the end, many problems are not bound to a single component, so capturing the interaction with other components can be useful.

Next #

In next blog posts in this series, I will present native stack profiling for Windows, and how it is integrated with Chromium.