Rego’s Everyday Life

A blog about my work at Igalia.

Performance of selection with CSS Regions in WebKit and Blink (Part II – perf profiler)

After the initial post introducing this topic and describing the Performance Tests (perftests), now is time to explain how to analyze the performance issues with a profiler in order to improve the code. “Manual” measurements

First of all, you can think of doing some measurements in the source code trying to find the possible bottle necks in an application. For example you could use the following lines inside WebKit/Blink in order to measure the time required to execute a given function:

timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
printf("%lld.%.9ld\n", (long long)ts.tv_sec, ts.tv_nsec);
// Call to the function you want to measure.
timespec ts2;
clock_gettime(CLOCK_REALTIME, &ts2);
printf("%lld.%.9ld\n", (long long)ts2.tv_sec, ts2.tv_nsec);
printf("Diff: %lld.%.9ld\n", (long long)ts2.tv_sec - ts.tv_sec, ts2.tv_nsec - ts.tv_nsec);

Actually this is a pretty bad idea for a number of reasons:

Using perf profiler

In this case we are going to talk about how to use perf in a GNU/Linux environment to analyze the performance of the WebKitGTK+ port. You could follow the next steps:

About how to collect the data, in WebKitGTK+ you have the alternative to generate the perf data files while running the perftests. Just adding some arguments to run-perf-tests script: Tools/Scripts/run-perf-tests --platform=gtk --debug --profile

Which will create a file called test.data under WebKitBuild/Debug/layout-test-results/ folder.

Analyze profile session

perf report provides you the list of methods that have been running for more time, with the percentage of how many time was spent in each of them. Then you can get the full backtraces from the different places the method is called and know how many times it was invoked from each trace.

Let’s use a concrete example to illustrate it. This is a perf report output got from the WebProcess doing a big selection in a page with CSS Regions:

-   6.26%  lt-WebKitWebPro  libwebkit2gtk-3.0.so.25.5.0         [.] WebCore::LayoutUnit::rawValue() const
   - WebCore::LayoutUnit::rawValue() const
      - 33.10% WebCore::operator+(WebCore::LayoutUnit const&, WebCore::LayoutUnit const&)
         - 39.54% WebCore::LayoutRect::maxX() const
            - 83.75% WebCore::LayoutRect::intersect(WebCore::LayoutRect const&)
               - 98.69% WebCore::RenderRegion::repaintFlowThreadContentRectangle(WebCore::LayoutRect const&, bool, WebCore::LayoutRect const&, WebCore::LayoutRect const&, WebCore::LayoutPoint const&) const
                    WebCore::RenderRegion::repaintFlowThreadContent(WebCore::LayoutRect const&, bool) const
                    WebCore::RenderFlowThread::repaintRectangleInRegions(WebCore::LayoutRect const&, bool) const
                  - WebCore::RenderObject::repaintUsingContainer(WebCore::RenderLayerModelObject const*, WebCore::IntRect const&, bool) const
                     - 75.49% WebCore::RenderSelectionInfo::repaint()
                          WebCore::RenderView::setSelection(WebCore::RenderObject*, int, WebCore::RenderObject*, int, WebCore::RenderView::SelectionRepaintMode)

As you can see, 6.26% of the time is spent in LayoutUnit::rawValue() method, this is used in lots of places. 33.10% of the time it’s called from WebCore::operator+() which is also quite generic. We should keep going deeper in the call-graph till we reach some methods that are interesting in our particular case.

In this case, the selection starts in RenderView::setSelection(), so we should investigate further the methods called from there. Of course, in order to do that you need to have some understanding of the code where you’re moving or you’ll end up completely lost.

Improve source code

Thanks to this data I realized that in each RenderSelectionInfo::repaint() it’s used the RenderObject::containerForRepaint(). Which most times returns the parent RenderNamedFlowThread for all its children in the render tree.

This causes that for every element under RenderNamedFlowThread the method RenderFlowThread::repaintRectangleInRegions() is called. Taking a look to this method it has a loop over all the regions forcing a repaint. This means that if you have 1000 regions, even if you’re just selecting in one of them, a repaint in the rest of regions is executed.

So, I’ve provided a patch that repaints only the affected regions, it means around 12%, 18% and 73% improvement in Layout/RegionsSelection.html, Layout/RegionsSelectAllMixedContent.html and Layout/RegionsExtendingSelectionMixedContent.html perftest respectively.

Doing tests with more regions using this example we got the following results:

Regions Without patch (ms) With patch (ms) Improvement
100 923 338 63%
150 2712 727 73%
200 5952 1285 78%
500 81731 7868 90%

As expected, results are better as we increase the number of regions.

Conclusions

This was just one specific example in order to explain how to use the available tools, trying to provide the required context to understand them properly.

For sure, there’s still plenty of work to be done in order to improve the performance of selection with CSS Regions. Nonetheless, we still have to settle a final implementation for the selection in CSS Regions before going on with the optimization efforts, as it was explained in my previous post we’re on the way to fix it.

This work has been done inside the collaboration between Adobe and Igalia around CSS Regions.

In Igalia we have a great experience in performance optimization for CSS standards like CSS Flexible Box Layout, CSS Grid Layout and CSS Regions. Please don’t hesitate to contact us if you need some help around these topics.


Comments

On 14/02/02 18:58, **J.A.** wrote:

Very interesting post, specially because it explains with a real example how to use a tool to improve the performance of an application.

Eager to see other examples using other tools :)