Untangling the Web

Paweł Lampe's Blog

WPE performance considerations: DOM tree

Designing performant web applications is not trivial in general. Nowadays, as many companies decide to use the web platform on embedded devices, the problem becomes even more complicated. Typical embedded devices are orders of magnitude slower than desktop-class ones. Moreover, the proportion between CPU and GPU power is commonly different as well. This usually results in unexpected performance bottlenecks when web applications designed with desktop-class devices in mind are executed in embedded environments.

In order to help web developers approach the difficulties that the usage of web platform on embedded devices may bring, this blog post initiates a series of articles covering various performance-related aspects in the context of WPE WebKit usage on embedded devices. The coverage in general will include:

This article, in particular, discusses the overhead of nodes in the DOM tree when it comes to layout. It does so primarily by investigating the impact of idle nodes, which introduce the least overhead and hence may serve as a lower bound for any general considerations. With the data presented in this article, it should be clear how the overhead scales with the DOM tree size/depth on embedded devices.

DOM tree

Historically, the DOM trees emerging from the usual web page designs were rather limited in size and fairly shallow. This was the case as there were no reasons for them to be excessively large unless the web page itself had a very complex UI. Nowadays, not only are the DOM trees much bigger and deeper, but they also tend to contain idle nodes that artificially increase the size/depth of the tree. The idle nodes are the nodes in the DOM that are active yet do not contribute to any visual effects. Such nodes are usually a side effect of using various frameworks and approaches that conceptualize components or services as nodes, which then participate in various kinds of processing utilizing JavaScript. Other than idle nodes, the DOM trees are usually bigger and deeper nowadays, as there are simply more possibilities that emerged with the introduction of modern APIs such as Shadow DOM, Anchor positioning, Popover, and the like.

In the context of web platform usage on embedded devices, the natural consequence of the above is that web designers require more knowledge of how the performance of a particular browser scales with the DOM tree size and shape. Before considering embedded devices, however, it’s worth taking a brief look at how various web engines scale on desktop as the DOM tree grows in depth.

Desktop considerations

To measure the impact of the DOM tree depth on the performance, the random-number-changing-in-the-tree.html?vr=0&ms=1&dv=0&ns=0 demo can be used to perform a series of experiments with different parameters.

In short, the above demo measures the average duration of a benchmark function run, where the run does the following:

Moreover, the demo allows one to set 0 or more parent idle nodes for the node holding text, so that the layout must consider those idle nodes as well.
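The idea of measuring the average duration of a benchmark function run can be sketched roughly as below. This is not the demo's actual code: the helper name `averageDurationMs` and the stand-in workload are assumptions made for illustration only. In a real page, the callback would be the code under test, for example changing a text node and then reading a layout-dependent property.

```javascript
// Hypothetical helper (not taken from the demo): average wall-clock
// duration, in milliseconds, of `fn` over `runs` consecutive calls.
function averageDurationMs(fn, runs) {
  if (!Number.isInteger(runs) || runs <= 0) {
    throw new RangeError("runs must be a positive integer");
  }
  const start = performance.now();
  for (let i = 0; i < runs; i++) {
    fn(i);
  }
  return (performance.now() - start) / runs;
}

// Stand-in workload so that the sketch runs outside of a browser as well.
let sink = 0;
const avg = averageDurationMs((i) => { sink += Math.sqrt(i); }, 1000);
console.log(`average: ${avg.toFixed(4)} ms per run`);
```

Averaging over many runs smooths out timer granularity and scheduling noise, which matters when a single run takes well under a millisecond.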

The parameters used in the URL above mean the following:

The idea behind the experiment is to check how much overhead is added as the number of extra idle nodes (ns=N) in the DOM tree increases. Since comparing the browsers used in the experiments directly would not be fair for various reasons, the results are presented in relative terms for each browser separately instead of as concrete numbers in milliseconds. It means that the benchmarking result for ns=0 serves as a baseline, and the other results show the duration relative to that baseline, where, e.g. 300% means 3 times the baseline duration.
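The normalization against the ns=0 baseline can be sketched as follows. The function name and the sample durations are made up for illustration and are not measurements from the experiments; a value of 300 corresponds to 3 times the baseline duration.

```javascript
// Express each measured duration as a percentage of the first entry,
// which is assumed to be the ns=0 baseline run.
function relativeToBaseline(durationsMs) {
  const baseline = durationsMs[0];
  if (!(baseline > 0)) {
    throw new RangeError("baseline duration must be a positive number");
  }
  return durationsMs.map((d) => (d / baseline) * 100);
}

// Made-up sample durations in milliseconds (NOT data from the article).
const sample = [2, 3, 6];
console.log(relativeToBaseline(sample)); // [ 100, 150, 300 ]
```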

The results for a few mainstream browsers/browser engines (WebKit GTK MiniBrowser [09.09.2025], Chromium 140.0.7339.127, and Firefox 142.0) and a few experimental ones (Servo [04.07.2024] and Ladybird [30.06.2024]) are presented in the image below:

Idle nodes overhead on mainstream browsers.

As the results show, the trends for all the browsers are very close to linear. This makes the overhead easy to assess, as N times more idle nodes usually result in N times the overhead. Moreover, up to 100-200 extra idle nodes in the tree, the overhead trends are very similar in all the browsers except for the experimental Ladybird. That in turn means that even for big web applications, it’s safe to assume the overhead will be much the same across browsers. Finally, past the threshold of 200 extra idle nodes, the overhead across browsers diverges. This is most likely because the browsers do not optimize such cases, given the lack of real-world use cases.

All in all, the conclusion is that on desktop, only very large / specific web applications should be cautious about the overhead of nodes, as modern web browsers/engines are very well optimized for handling substantial amounts of nodes in the DOM.

Embedded device considerations

When it comes to the embedded devices, the above conclusions are no longer applicable. To demonstrate that, a minimal browser utilizing WPE WebKit is used to run the demo from the previous section both on desktop and NXP i.MX8M Plus platforms. The latter is a popular choice for embedded applications as it has quite an interesting set of features while still having strong specifications, which may be compared to those of Raspberry Pi 5. The results are presented in the image below:

Idle nodes overhead compared between desktop and embedded devices.

This time, the Y axis presents the duration (in milliseconds) of a single benchmark run, which makes it easy to reason about the overhead. As the results show, on desktop, 100 extra idle nodes in the DOM introduce barely noticeable overhead. On the embedded platform, on the other hand, even without any extra idle nodes, changing and laying out the text already takes around 0.6 ms. With 10 extra idle nodes, this duration increases to 0.75 ms, yielding 0.15 ms of overhead. With 100 extra idle nodes, the overhead grows to 1.3 ms.

One may argue whether 1.3 ms is much, but for an application that renders at 60 FPS, for example, the time at the application’s disposal each frame is below 16.67 ms, and 1.3 ms is ~8% of that, which is considerable. Similarly, for the application to be perceived as responsive, the input-to-output latency should usually be under 20 ms. Again, 1.3 ms is a significant overhead in such a scenario.
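The frame-budget arithmetic above can be reproduced with a few lines of code. The function names are made up for this sketch; the 60 FPS target and the 1.3 ms overhead are the figures quoted in the text.

```javascript
// Time available per frame, in milliseconds, at a given frame rate.
function frameBudgetMs(fps) {
  return 1000 / fps;
}

// Share of the per-frame budget (in percent) consumed by a given overhead.
function shareOfFrameBudget(overheadMs, fps) {
  return (overheadMs / frameBudgetMs(fps)) * 100;
}

const budget = frameBudgetMs(60);
const share = shareOfFrameBudget(1.3, 60);
console.log(`budget: ${budget.toFixed(2)} ms, share: ${share.toFixed(1)}%`);
// budget: 16.67 ms, share: 7.8%
```

The same helper can be used to check a measured overhead against other targets, such as a 30 FPS budget.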

Given the above, it’s safe to state that 20 extra idle nodes should be considered the safe maximum for embedded devices in general. In the case of low-end embedded devices, i.e. ones comparable to Raspberry Pi 1 and 2, the maximum should be even lower, but proper benchmarking is required to come up with concrete numbers.

Inline vs block

While the previous subsection demonstrated that on embedded devices, adding extra idle nodes as parents must usually be done in a responsible way, it’s worth examining if there are nuances that need to be considered as well.

The first matter that one may wonder about is whether there’s any difference between the overhead of idle nodes being inlines (display: inline) or blocks (display: block). The intuition here may be that, as idle nodes have no visual impact on anything, the overhead should be similar.

To verify the above, the demo from the Desktop considerations section can be used, with the dv parameter controlling whether the extra idle nodes are blocks (1, <div>) or inlines (0, <span>). The results of such experiments, again executed on NXP i.MX8M Plus, are presented in the image below:

Comparison of overhead of idle nodes being inline or block elements.

While in the safe range of 0-20 extra idle nodes the results are very much similar, it’s evident that in general, the idle nodes of block type are actually introducing more overhead.

The reason for the above is that, for layout purposes, the handling of inline and block elements is very different. The inline elements sharing the same line can be thought of as being flattened within a so-called line box tree. The block elements, on the other hand, have to be represented in a tree.

To show the above visually, it’s interesting to compare sysprof flamegraphs of WPE WebProcess from the scenarios comprising 20 idle nodes and using either <span> or <div> for idle nodes:

idle <span> nodes:
Sysprof flamegraph of WPE WebProcess laying out inline elements.
idle <div> nodes:
Sysprof flamegraph of WPE WebProcess laying out block elements.

The first flamegraph shows that there’s no clear dependency between the call stack and the number of idle nodes. The second one, on the other hand, shows exactly the opposite: each of the extra idle nodes is visible as adding extra calls. Moreover, each of the extra idle block nodes adds some overhead, which gives the flamegraph its pyramidal shape.
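For reference, the two variants compared in this section differ only in the tag used for the idle parents. A minimal sketch of generating such markup is shown below. Only the meaning of ns (number of extra idle nodes) and dv (1 for <div>, 0 for <span>) comes from the demo description; the function names and the id of the text-holding node are hypothetical, and the demo itself may build its tree differently.

```javascript
// dv=1 selects block idle nodes (<div>); any other value selects
// inline ones (<span>), mirroring the dv parameter described above.
function idleTagFor(dv) {
  return dv === 1 ? "div" : "span";
}

// Wrap `innerHtml` in `ns` nested idle parents of the selected kind.
function wrapInIdleParents(innerHtml, ns, dv) {
  const tag = idleTagFor(dv);
  return `<${tag}>`.repeat(ns) + innerHtml + `</${tag}>`.repeat(ns);
}

// Hypothetical node holding the text that changes on each benchmark run.
const textNode = '<span id="text">42</span>';

console.log(wrapInIdleParents(textNode, 2, 0));
// <span><span><span id="text">42</span></span></span>
console.log(wrapInIdleParents(textNode, 2, 1));
// <div><div><span id="text">42</span></div></div>
```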

Whitespaces

Another nuance worth exploring is the overhead of text nodes created because of whitespace.

When the DOM tree is created from the HTML, usually a lot of text nodes are created just because of whitespaces. It’s because the HTML usually looks like:

<span>
  <span>
    (...)
  </span>
</span>

rather than:

<span><span>(...)</span></span>

which makes sense from the readability point of view. From the performance point of view, however, more text nodes naturally mean more overhead. When such redundant text nodes are combined with idle nodes, the net outcome may be that with each extra idle node, some overhead will be added.

To verify the above hypothesis, a demo similar to the previous one can be used alongside it to perform a series of experiments comparing the approaches with and without redundant whitespace: random-number-changing-in-the-tree-w-whitespaces.html?vr=0&ms=1&dv=0&ns=0. The only difference between the demos is that the w-whitespaces one creates the DOM tree with artificial whitespace, simulating a tree parsed from a formatted document. The comparison results from the experiments run on NXP i.MX8M Plus are presented in the image below:

Overhead of redundant whitespace nodes.

As the numbers suggest, the overhead of redundant text nodes is rather small on a per-idle-node basis. However, as the number of idle nodes scales, so does the overhead. Around 100 extra idle nodes, the overhead is noticeable already. Therefore, a natural conclusion is that the redundant text nodes should rather be avoided — especially as the number of nodes in the tree becomes significant.
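One possible way to avoid such redundant text nodes is to strip whitespace-only runs between tags before the markup reaches the browser. The sketch below is a deliberately naive, regex-based illustration with made-up function names, not a technique used in the experiments. It assumes simple markup: it does not handle <pre>, comments, scripts, or attribute values containing angle brackets, and removing whitespace between inline elements can change how the text renders, so it should only be applied where that whitespace is known to be insignificant.

```javascript
// Count whitespace-only runs sitting between two tags. When parsed inside
// the document body, each such run typically ends up as its own text node.
function countInterTagWhitespace(html) {
  return (html.match(/>\s+</g) || []).length;
}

// Naively remove whitespace-only runs between tags (see caveats above).
function stripInterTagWhitespace(html) {
  return html.replace(/>\s+</g, "><");
}

const formatted = "<span>\n  <span>\n    <b>42</b>\n  </span>\n</span>";
const compact = stripInterTagWhitespace(formatted);

console.log(countInterTagWhitespace(formatted)); // 4
console.log(compact); // <span><span><b>42</b></span></span>
console.log(countInterTagWhitespace(compact)); // 0
```

In practice, an established HTML minifier applied at build time is likely a safer choice than a hand-written regular expression.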

Parents vs siblings

The last topic that deserves a closer look is whether adding idle nodes as siblings is better than adding them as parent nodes. In theory, having extra nodes added as siblings should be better, as the layout engine will have to consider them, yet it won’t mark them with a dirty flag and hence won’t have to lay them out.

As in other cases, the above can be examined using a series of experiments run on NXP i.MX8M Plus using the demo from the Desktop considerations section and comparing it against either the random-number-changing-before-siblings.html?vr=0&ms=1&dv=0&ns=0 or the random-number-changing-after-siblings.html?vr=0&ms=1&dv=0&ns=0 demo. As both of those yield similar results, either of them can be used. The results of the comparison are depicted in the image below:

Overhead of idle nodes added as parents vs as siblings.

The experiment results corroborate the theoretical considerations made above — idle nodes added as siblings indeed introduce less layout overhead. The savings are not very large from a single idle node perspective, but once scaled enough, they are beneficial enough to justify DOM tree re-organization (if possible).
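The difference between the two tree shapes can be sketched with two small markup generators. The function names are hypothetical, and the assumption that the idle siblings are empty elements placed after the target node is made purely for illustration; the actual demos may differ.

```javascript
// Idle nodes as parents: the target ends up `ns` levels deeper in the tree.
function withIdleParents(targetHtml, ns, tag = "span") {
  return `<${tag}>`.repeat(ns) + targetHtml + `</${tag}>`.repeat(ns);
}

// Idle nodes as siblings: the target stays at the same depth and the
// idle nodes are placed next to it (here: after it, as empty elements).
function withIdleSiblings(targetHtml, ns, tag = "span") {
  return targetHtml + `<${tag}></${tag}>`.repeat(ns);
}

const target = "<b>42</b>";
console.log(withIdleParents(target, 2));
// <span><span><b>42</b></span></span>
console.log(withIdleSiblings(target, 2));
// <b>42</b><span></span><span></span>
```

Both variants contain the same number of idle nodes; only the depth of the node that changes differs, which is the property the experiment above compares.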

Conclusions

The above experiments mostly emphasized idle nodes; however, the results can be extrapolated to regular nodes in the DOM tree. With that in mind, the overall conclusion from the experiments in the previous sections is that DOM tree size and shape have a measurable impact on web application performance on embedded devices. Therefore, web developers should try to optimize the tree as early as possible and follow the general rules of thumb that can be derived from this article:

  1. Nodes are not free, so they should always be added with extra care.
  2. Idle nodes should be limited to ~20 on mid-end and ~10 on low-end embedded devices.
  3. Idle nodes should be inline elements, not block ones.
  4. Redundant whitespaces should be avoided — especially with idle nodes.
  5. Nodes (especially idle ones) should be added as siblings.

Although the above serves as good guidance, for better results it’s recommended to properly benchmark the browser on the given target embedded device, as long as it’s feasible.

Also, following the above set of rules is not recommended on desktop-class devices, as in that case it can be considered premature optimization. Unless the particular web application yields an exceptionally large DOM tree, the gains won’t be worth the time spent optimizing.