2014 WebKit Contributors Meeting

April 30th, 2014 No comments

A few weeks ago I had the opportunity to represent Igalia at the WebKit Contributors Meeting. It was hosted by Apple at their campus in Cupertino and, unlike what some of my Igalia colleagues had told me about past editions, it wasn’t the huge event it used to be, which gave me the chance to meet in person some of the well-known hackers I interact with through IRC and Bugzilla; that was definitely nice.

The usual unconference-like environment was something I liked a lot, especially how we scheduled the sessions on the fly; it was nice to directly notice the audience’s interest in the talk I proposed. My involvement in WebKit during the last year has been improving and implementing CSS standards like CSS Regions and CSS Grid Layout, and it was precisely the latter that my talk was about.

CSS Grid Layout

When I knew I was going to attend the meeting, I saw the opportunity to spread the word about the work we are doing at Igalia, with the support of Bloomberg, on the implementation of the CSS Grid Layout standard, which we both consider very important for the Web because of the use cases it solves.

I had the feeling that the situation of this new feature could be considerably improved within the WebKit community, perhaps getting the attention of more hackers and reviewers willing to collaborate, accelerating the implementation and, eventually, shipping it in future releases of the Safari browser. Considering that IE/Trident is already shipping it, Blink/Chromium plans to do so as soon as possible and Mozilla is also willing to do it in the long term (the Editor’s Draft is being written by members of those web engines), we at Igalia considered it sensible to suggest that the WebKit community go in the same direction.

Regarding the talk itself, I think it helped to increase the visibility of the CSS Grid Layout implementation in WebKit, so let’s see what this means for our work in the following months. If you are interested in the slides, you can check them out here. We talked about ways to enable the feature by default in the nightly builds, so it can be tested by WebKit hackers more easily, and also about the work to be done in the next months. I was happy to see this point being moved to the mailing list and finally become a landed patch. We have a web site with many examples and guidelines to activate the feature on different browsers:

http://igalia.github.io/css-grid-layout/

Selection

One of the goals of attending this meeting was to unblock the issue of Selection with CSS Regions, something Igalia was involved in for some months last year in collaboration with Adobe. I discussed with David Hyatt the Sub-Trees Approach as a way of making Selection specification compliant when using CSS Regions, even though the issues from the end user perspective are still there. I consider this an important milestone because it puts the CSS Regions standard in a similar position to the rest of the layout models affected by these Selection problems (CSS Grid Layout, MathML, Shadow DOM, Absolute Positioning and Multi-Column, among others).

Now a different challenge has to be addressed: how to provide a usable Selection for these layout models? Even though it was not in the approved schedule, we managed to set up a meeting among the different people interested in Selection. I have to say that it was a nice and interesting discussion, and these are the main conclusions I’d extract from it:

  • We need to go beyond the editing/selection specification, since the DOM nature of its definition is very limited for the new layout models.
  • Emulating the iOS selection behavior was something most of the participants liked, so perhaps we should consider it in the future.
  • There are several approaches we can follow (multi-range, sub-trees, regions-as-containers, …); all of them have pros and cons and, what is worse, some of them address only issues of specific layout models. The idea that there is no single solution for all the layout models was always present.
  • Perhaps we should define different implementations of Selection for these specific features; something incremental, since it’s very important to keep such an old and important feature as Selection as stable as it has been all these years.

These ideas are not actually breaking news, since they were already mentioned in previous meetings, but I perceived a better mood this time for implementing more ambitious approaches to improve Selection. Some of the standards affected by these issues, like CSS Regions and CSS Grid Layout, are really important features, so these technical challenges must be faced as soon as possible.

One of the action points agreed was to analyze, case by case, the issues each of these layout models has, defining very specific examples and test cases; we could later start a discussion on the best way to solve them. At Igalia we have been thinking about this for some time already, so we have created a web site and a test cases repository to analyze the different issues present in the layout models we have considered most interesting so far. We’d like to invite any hacker interested in this topic to contribute to the wiki and the test cases repository, even adding different layout models to be included in the analysis.

http://igalia.github.io/web-selection-examples/

Misc topics

I also attended some other talks and discussions interesting for the future of the WebKit project, which Igalia is quite committed to, especially as maintainers of the WebKitGTK+ port.

We participated in the discussion of “How to make WebKit more awesome” representing the WebKitGTK+ port community; it was a nice discussion with plenty of good ideas.

I attended the CSS Regions and CSS Shapes talks, which were quite rich in terms of progress and roadmap announcements, and came with the usual awesome demos. The CSS Regions talk presented by Andrei Bucur got quite a lot of attention and generated an interesting discussion about its future.

I attended the session about Subpixel Layout as well, something I was involved in some months ago. I found out some CSS Regions bugs related to the Subpixel Layout feature, whose root cause was that the SATURATED_LAYOUT_ARITHMETIC flag was not enabled. Actually, we’ve recently enabled it in the WebKitGTK+ port with quite good results, although there are still a few regressions I’m still working on.

Categories: WebKit Tags:

New shorthand properties for CSS Grid Layout

April 11th, 2014 No comments

I’ve been working for a while now on the implementation of the CSS Grid Layout standard for the WebKit and Blink web engines. It’s a complex specification, indeed, like most of them, so I enjoyed a lot deciphering all the angles behind the language used to define the different CSS properties, their usage and limits, exceptions and so on. It’s fair to start by thanking the WebKit reviewers and Blink owners for their patience and support reviewing patches. It’s also worth mentioning that the Editor’s Draft is still a live document with frequent changes and active discussions in the www-style mailing list, which is very supportive in solving doubts and attending to the suggestions of the hackers working on the implementation.

Before continuing, I’d strongly recommend reading the previous posts of my colleagues Manuel and Sergio to understand the basic concepts of CSS Grid Layout and its main features and advantages for the web.

I had the chance to land several patches in WebKit and Blink that improved the current implementation of the standard, both fixing bugs and adapting it to the latest syntax changes introduced in the spec, but perhaps the most noticeable improvements are, so far, the new grid-template and grid shorthands added recently.

The “grid-template” shorthand

Quoting the CSS Grid Layout specs:

The grid-template property is a shorthand for setting grid-template-columns, grid-template-rows, and grid-template-areas in a single declaration. It has several distinct syntax forms:

none | subgrid | <‘grid-template-columns’> / <‘grid-template-rows’> | [ <‘track-list’> / ]? [ <‘line-names’>? <‘string’> <‘track-size’>? ]+

It’s always easier if we have some examples at hand:

grid-template: auto 1fr auto / auto 1fr;
grid-template: 10px / "a" 15px;
grid-template: 10px / (head1) "a" 15px (tail1)
                      (head2) "b" 20px (tail2);
grid-template: (first) 10px repeat(2, (nav nav2) 15px) /       "a b c" 100px (nav)
                                                        (nav2) "d e f" 25px  (nav)
                                                        (nav2) "g h i" 25px  (last);

It’s important to notice that the subgrid functionality is under discussion to be postponed to level 2 of the specification, hence it was not implemented in the shorthand either, for the time being. This decision had the support of the IE and Chromium browsers; Mozilla partially agrees, although with some doubts.

There was something special in the implementation of this shorthand property. Usually, CSS property parsing methods are implemented in a straightforward way, avoiding unnecessary or duplicated operations over the parsed value list. However, due to the ambiguity of the shorthand syntax, it’s not clear which form the expression belongs to until the <string> clause is reached. In order to reuse the <grid-template-{row, column}> parsing function, it was necessary to allow rewinding the parsed value list upon detecting that the wrong form was being processed, as the sketch below illustrates.
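
The idea can be sketched outside WebKit with a toy parser (plain JavaScript with made-up token shapes; the real code is C++ operating on a CSSParserValueList):

function parseGridTemplate(tokens) {
  var index = 0;
  var mark = index; // position to rewind to if this form fails

  // First attempt: the <track-list> form, which admits no <string> tokens.
  var trackList = [];
  while (index < tokens.length && tokens[index].type !== 'string') {
    trackList.push(tokens[index]);
    index++;
  }
  if (index === tokens.length) {
    return { form: 'track-list', tracks: trackList };
  }

  // A <string> token appeared: we were parsing the wrong form.
  // Rewind and re-parse everything as the areas-based syntax instead.
  index = mark;
  return { form: 'areas', tokens: tokens.slice(index) };
}

// The <string> token 'a' forces the rewind into the areas form.
console.log(parseGridTemplate([
  { type: 'dimension', value: '10px' },
  { type: 'string', value: 'a' },
  { type: 'dimension', value: '15px' }
]));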

Another remarkable implementation detail was the change in the gridLineName parsing function, required to join the adjoining line names defined at the end of one row and at the beginning of the next one (nav and nav2 in the example). See below the longhand equivalence of the last case in the previous example:

grid-template-columns: (first) 10px repeat(2, (nav nav2) 15px);
grid-template-rows: 100px (nav nav2) 25px (nav nav2) 25px (last);
grid-template-areas: "a b c" 
                     "d e f"
                     "g h i";


The “grid” shorthand

Quoting the CSS Grid Layout specs:

The grid property is a shorthand that sets all of the explicit grid properties (grid-template-rows, grid-template-columns, and grid-template-areas) as well as all the implicit grid properties (grid-auto-rows, grid-auto-columns, and grid-auto-flow) in a single declaration.

<‘grid-template’> | [ <‘grid-auto-flow’> [ <‘grid-auto-columns’> [ / <‘grid-auto-rows’> ]? ]? ]

Even though the shorthand sets both implicit and explicit grid properties, only either the implicit or the explicit ones can be specified in a single declaration; the missing properties will be set to their initial values. Now let’s see some examples:

grid: 15px / 10px;
grid: row 10px;
grid: column 10px / 20px;

The “grid” shorthand is the recommended mechanism even for defining just the explicit grid, unless web authors are interested in cascading the implicit grid properties separately, precisely because it also resets the implicit ones to their initial values.
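
As an illustration of that difference, consider defining only the explicit grid with each shorthand (a sketch based on the spec text quoted above):

div.template { grid-template: 15px / 10px; } /* grid-auto-flow, grid-auto-rows and
                                                grid-auto-columns keep their cascaded values */
div.grid     { grid: 15px / 10px; }          /* the implicit grid properties are also
                                                reset to their initial values */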

Current status and next steps

Both properties landed in Blink trunk recently (revisions 170552 and 171143) and are waiting for the final review in WebKit; hopefully they will land soon. There are enough layout tests to cover the most common cases, but perhaps some additional ones might be added in the future. As mentioned, there are certain ambiguities in both shorthands’ syntax, so it’s also important to watch the www-style mailing list for changes that might require modifying the implementation, and hence adding the proper test cases.

With the implementation of these two new shorthands, the property implementation tasks are almost complete. We are now working on fixing bugs and implementing the alignment features. There is quite an important gap between the Blink and WebKit implementations, but we are porting patches as soon as possible, since we think it’s important to keep both implementations in sync.

I’ll attend the WebKit Contributors Meeting next week, so perhaps I can speed up the landing of the patches for the shorthand properties. My main goal, though, will be to gather feedback from the WebKit community about the status of the CSS Grid Layout implementation, what features they miss the most and which bugs should have more priority, and to share our future plans at Igalia with them.

All this work was possible thanks to the collaboration between Igalia and Bloomberg. We are both working hard to help and promote the wide adoption of this standard, which will be shipped soon in IE and hopefully also in Chromium. We are also following the efforts Mozilla is making, which give us the impression that most browsers’ interest in this standard is quite high.

Categories: Blink, CSS Grid Layout, planet, WebKit Tags:

Improving selection in CSS Regions

January 22nd, 2014 No comments

In this post I would like to introduce the main problems we have detected in the Selection implementation of two of the most important web engines, Blink and WebKit. I’ve already described some of these issues, particularly for CSS Regions, but now I’ll go a bit further, analyzing them and also introducing one of the proposals we have been working on at Igalia as part of our collaboration with Adobe.

Selection is a DOM Tree matter

At Igalia, we have been investigating how to adapt selection to the new layout models, which provide more complex ways of visualizing the content defined in the DOM Tree. Let’s consider the following basic example using the CSS Regions layout:

base-case

Figure 1: base case

In the last post about this issue we identified four main problems with selection in CSS Regions:

  • Selection direction
  • Highlighting and content mismatch
  • Incorrect block gaps filling
  • Clear the selection

I’ll describe some examples where these issues are present, what their root causes are and how they can be solved or, at least, improved. I’ll also try to explain how Selection works, starting from the mouse events the end user generates to perform a new selection, how those are mapped into a DOM Tree range and, finally, how the rendering process produces the highlighting of the selected elements.

example-a1

Figure 2: Highlighting and selected content mismatch

I’ll use this first example (Figure 2) to briefly describe how Selection is implemented and how all the involved components interact to generate both the selected DOM Range and the corresponding highlighting in the Render Tree. Obviously the end user selects content from the visualized elements, in this case the content of two regular blocks (no regions involved). The mouse events are translated to VisiblePosition instances (Start and End) in the DOM Tree using the positionForPoint method. Such VisiblePositions are then mapped to the corresponding RenderObjects in the Render Tree; these objects are the ones used to traverse the tree in the RenderView::setSelection method and mark the appropriate elements with one of the following flags: SelectionNone, SelectionStart, SelectionInside, SelectionEnd, SelectionBoth. These flags are also very important in the block gaps filling algorithm, implemented basically in the RenderBlock::selectionGaps method.

The algorithm implemented in the RenderView::setSelection method can be described, very simplified, by the following steps:

  • gathering information (RenderSelectionInfo and RenderBlockSelectionInfo) about the old selection.
  • clearing the old selection (basically mark all the elements as SelectionNone)
  • updating the flags of the elements of the new selection.
  • gathering information of the new selection.
  • repainting old objects which might have changed.
  • painting the new selected objects.
  • repainting the old blocks which might have changed.
  • painting the new selected blocks.

The algorithm traverses the Render Tree from Start to End using the RenderObject::nextInPreOrder function. Here is where the clear operation issues can appear: if not all objects can be reached by the pre-order traversal, the clear operation does not work properly. That’s why we introduced a way to traverse the tree backwards (r155058), looking for elements which would be unreachable otherwise. One of the causes of this issue is the selection direction change.

This first example shows the highlighting and content mismatch issue, since the DOM Range considers the source (flow-into) element, while it is not highlighted by the Render Tree.

The next example considers now selection from both regions and regular blocks and introduces also an interesting Selection topic: selection direction.

example-b1

Figure 3: Incorrect block gaps filling

As you can see in the Figure 3 diagram, the user selected content upwards. In most cases the selection direction is not used at all, so Start must always be above the End VisiblePosition in the DOM Tree. The VisibleSelection ensures that, but in this case, because of how the Source (flow-into) is defined according to the CSS Regions specification and where it was placed in the HTML code, the original Start and End positions are not flipped. We will talk more about this in the next example. However, the RenderObject associated with a DOM element with a flow-into property is located in the render tree under the RenderFlowThread object, which is itself placed at the end of the normal render tree, thus causing the start render object to be below the end render object. This fact causes the highlighted content to be exactly the opposite of what the user actually selected.

This example also illustrates the issue of incorrect block gaps filling, since the element with the id content-1 is considered a block gap to be filled. This happens because of the changes introduced in the already mentioned revision r155058: since the element with id content-2 and the body are flagged as SelectionEnd, the intermediate elements are considered as block gaps to be filled.

At this point it’s quite clear that the way the Render Tree is traversed is very important to produce a coherent selection rendering; notice that in this case highlight and selected content match perfectly. The idea of using the Region DIV as container of the portion of the Source DIV content which is rendered inside such region looks like a promising approach. The next example will go further into this idea.

example-c1

Figure 4: Selection direction

In this example (Figure 4) the Start and End VisiblePosition instances have to be flipped by the generated VisibleSelection, since the DIV with the id content-1 is above the original Start element defined by the end user. Flipping both positions makes the corresponding Start and End RenderObject instances consistent, which is why there is no selection direction issue in this case. However, because of the position of the End element as a child of the RenderFlowThread, the RenderElement with the id content-2 is selected, which, while seamless from the user experience point of view, does not match the selected content retrieved from the DOM Range.

The solution: Regions as containers

At this point it’s clear that the selection direction issues are one of the most important sources of problems for Selection with CSS Regions. The current algorithms, RenderView::setSelection and RenderBlock::selectionGaps, require traversing the Render Tree downwards from start to end. In fact, this is especially important for the block gaps filling algorithm.

It’s important to notice that the divergence of the DOM and Render trees when using CSS Regions comes from how these two concepts, the flow-into DOM element and the RenderFlowThread object, are managed and placed in each tree. Why not just use the region elements in the selection algorithms and consider both flow-into and RenderFlowThread as “meta-elements” from which the selected content is extracted?

Considering the steps defined previously for the selection algorithm the regions as containers approach could be described as follows:

  • Case 1: Start and Stop elements, either both or neither, are children of the RenderFlowThread.
    • The current RenderView::setSelection algorithm works fine.
  • Case 2: Only Start is a child of the RenderFlowThread.
    • First, determine the RenderElements range [RegionStart, RegionEnd] in the RenderFlowThread associated with the RenderRegion the Start element is rendered by.
    • Then, apply the current algorithm to the range of elements [Start, RegionEnd].
    • Finally, apply the current algorithm from the nextInPreOrder element of the RenderRegion until the Stop, as usual.
  • Case 3: Only Stop is a child of the RenderFlowThread.
    • First, apply the current algorithm from the Start element to the RenderRegion the Stop element is rendered by.
    • Then, determine the RenderElements range [RegionStart, RegionEnd] in the RenderFlowThread associated with the RenderRegion the Stop element is rendered by.
    • Finally, apply the current algorithm to the range of elements [RegionStart, Stop].

Determining the selection direction, at the VisibleSelection level, is also affected by the structure of the Render Tree; even though the editing module in both WebKit and Blink already uses rendering info for certain operations, this is perhaps one of the weakest points of this approach. Let’s use one of the examples defined before to illustrate this situation.

example-b2

Figure 5: Block gaps filling issues solved

While traversing the Render Tree, once a RenderRegion is reached, its corresponding range of RenderObjects is determined in the RenderFlowThread. The entire range will be traversed for the blocks flagged as SelectionInside. For the ones flagged as SelectionStart or SelectionEnd, the steps previously defined are applied.

The key of this new approach is that traversing is always downwards, from the Start to the End, which solves also the block gaps related issues.

Let’s now consider a more complex example (Figure 6), with several regions between a number of regular blocks. Selection is more natural with this approach, coherent with what the user expects and also matching the DOM Tree range in most, but not all, cases. This is, however, the biggest drawback of this approach, since it does not completely follow the Editing/Selection specs. I’ll talk more about this in the last lines of this post.

example-d

Figure 6: Selection direction issues solved

The following video showcases our proposal in the WebKit MiniBrowser testing application, using a real HTML example based on the Figure 6 diagram.

Even though selection is more natural and coherent, as I already mentioned, it does not completely follow the Editing/Selection specs. As I stated at the beginning of this post, selection is a DOM matter, defined by a Range of elements in the DOM Tree. This very simple case (Figure 7) is enough to describe the issue:

example-a

Figure 7: Regions as containers NON specs compliant

The regions-as-containers approach does not consider the Source (flow-into) elements as actual DOM elements, so they will never be part of the selection. This breaks the Editing/Selection specification, since those are regular DOM elements as defined in the CSS Regions standard. This approach was our first try and perhaps too ambitious, aiming to provide a good user experience for selection with CSS Regions while being specs compliant at the same time. Even though it was a good experience, we can conclude that the problem is too complex and requires a different strategy.

We had the opportunity to introduce and discuss our proposal during the last WebKitGTK+ Hackfest, where Rego, Mihnea and I had the chance to work hand in hand, carefully analyzing and digesting this proposal. Even though we finally discarded it, we were able to design other approaches, some of which are really promising. Rego will introduce some of them shortly in a blog post. Hence, I can’t end this post without thanking Igalia and the GNOME Foundation for sponsoring such a productive and useful hackfest.

Categories: CSS Regions, planet, WebKit Tags:

CSS Regions and Selection

October 15th, 2013 No comments

Back in early June, Adobe and Igalia announced a collaboration to work on the CSS Regions and CSS Shapes W3C standards. Our first challenge has been to improve the Selection use cases when using complex layout models, like CSS Regions.

The CSS Regions model allows content to flow across multiple areas called Regions. This new approach offers web content designers a way to build richer and more complex layouts, mapping content with specific visual areas of a document. Defining different Flow Threads with multiple Regions, associating them to specific content, and applying different styles to a set of Regions is very powerful in terms of design and user experience. If you are interested, here you can find some examples of what is possible with CSS Regions.
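
As a quick illustration, the whole model boils down to two properties (shown here with the -webkit- prefix used by the WebKit implementation; the selectors are made up):

#source { -webkit-flow-into: article-thread; }  /* the content leaves the normal flow */

#region1,
#region2 { -webkit-flow-from: article-thread; } /* ...and is laid out across these
                                                   regions, in document order */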

But having this flexibility in web design requires overcoming quite a few technical challenges. The current implementation of CSS Regions in WebKit changes the way the Render Tree is created from the DOM Tree. This poses the challenge of making selection work with regions since selection is DOM based. Given its importance and frequent use, improving the interaction of selection and CSS Regions has been the main goal of our collaboration.

The W3C selection specification, which has not been updated since last year, does not address the complexities introduced by the new layout modules, like the CSS Regions, CSS FlexBox and CSS Grid Layout modules. We found out very quickly that selection had many issues, with respect to both visual appearance and selected content. We have created a test suite to evaluate the different use cases of selection with CSS Regions.

test2

Selected content does not match the highlighted area.

test1

Selection direction issues

Let’s start with a very simple description of the concept of Selection Direction, which consists of the following points:

  • The WebCore::VisibleSelection class has two attributes called base and extent declared as dom::Position instances
  • Such positions refer respectively to the anchor and focus nodes in the DOM Tree.
  • Additionally, WebCore::VisibleSelection has two dom::Position attributes, start and end, which are used later during the rendering phase.

Once the base and extent fields are set when instantiating a new VisibleSelection class, some adjustments and checks are performed to validate the selection. One of those checks is whether the base position is before the extent in the DOM Tree. Based on the result of this check, the start and end attributes will be set to either base or extent respectively.
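
The DOM Selection API exposes this very distinction, so it can be observed from any browser console (a quick sketch):

// After selecting some text backwards (bottom-up) on any page:
var sel = window.getSelection();

// anchorNode/focusNode keep the direction of the user's gesture,
// like base/extent in WebCore::VisibleSelection...
console.log(sel.anchorNode, sel.focusNode);

// ...while a Range is always normalized to document order,
// like the start/end attributes used later for rendering.
var range = sel.getRangeAt(0);
console.log(range.startContainer, range.endContainer);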

In the first test, the selected content includes the entire region block, even though it was not selected by the user. The cause, as we will see later, derives from the position of the source block in the DOM Tree, which in this case is defined between the two regular blocks.

The second example shows some selection direction-related issues; in this case, what the user selected is precisely the content between the two highlighted areas. The problem here is that the base node of the original selection is below the extent node in the DOM Tree, so they are swapped by the selection logic. In addition, the CSS Regions implementation builds the Render Tree in such a way that the source content defined by a RenderNamedFlowThread block is positioned below where it was defined in the DOM Tree. The consequence of this is that the start node is below the end node, so the highlighted area starts in the region block and continues from the root element (usually the body) until reaching the end node.

Our first approach was trying to provide a better user experience during selection with CSS Regions. We thought that adding multi-range capabilities to the DOM Selection API was the best way to go and we provided a patch. However, this approach was rejected by some Apple reviewers because multi-range selection introduces many problems, such as those detailed in the selection specification.

We have opened the debate again on the mailing list, though, because we think there might be some advantages to this approach, even without modifying the selection API. For instance, being able to handle independent Ranges and compose the expected selection will provide the flexibility needed to implement complex use cases. But, after some discussion with some of the Adobe Web Platform contributors, we have decided to focus on improving the selection following the current specification. While we feel this approach may lead to a non-optimal user experience for certain use cases, we expect implementing it will help us discover the problems inherent in the current selection specification. We have also been discussing these issues with some reviewers from Apple, Ryosuke Niwa and David Hyatt, and looking for alternatives to the multi-range approach.

We have posted additional patches, one to improve the selection behavior and another to revert the current limitation of selections related to including content from different Flow Threads. We think that this approach provides better integration of CSS Regions with HTML documents. Plus, it will allow us to properly evaluate the performance issues of selection with CSS Regions.

The other challenges we faced during this collaboration include:

  • changing how the selection rendering traverses the Render Tree in order to deal with the special RenderFlowThread blocks
  • adjusting the block gaps filling algorithm
  • clearing the selection
  • selection direction issues derived from the Render Tree and DOM Tree divergence

We detected a number of other issues, such as how LayoutPoints are positioned in the DOM Tree when pointing to Region blocks, leading to an incorrect Selection extent node. We are confident that we will ultimately have a fully-compliant selection specification implementation for CSS Regions, but the improvements using this approach are limited. Even after solving all the issues we have detected so far, selection might still seem weird and buggy from an end user perspective. Thus we think that the final solution, one which will provide the user with a more consistent experience, will be to complement the selection specification to consider not only the DOM Tree, but also how the Render Tree is built by the layout model.

Categories: CSS Regions, planet, WebKit Tags:

Node.js + Socket.io = Real-Time IO.

December 19th, 2012 4 comments

The use of JavaScript for implementing server-side logic is not breaking news these days, but Node.js is definitely gaining relevance as one of the hottest technologies in this area. There are multiple reasons that explain this trend, but clearly one of them is the asynchronous event-driven model and its advantages for dealing with BigData problems.

When dealing with real-time requirements, Socket.io can play an important role in the web application architecture, providing an abstraction layer for the communication between the browser and the server. The Node.js event-driven model combined with the Socket.io real-time capabilities offers a good solution to face BigData challenges in domains where real-time behavior is important.

I’ll try to show some examples of the combination of these two technologies for implementing real-time web operations.

Socket.io based client-server communication

Let’s consider the basic, default configuration of Socket.io, described as follows:

  • Client-side JavaScript
"use strict";
 
jQuery(document).ready(function() {
 
  var socket = io.connect();
 
  socket.on('connect', function() {
    console.log('connected');
  });
  socket.on('disconnect', function() {
    console.log('disconnected');
  });
  socket.on('error', function(err) {
    if(err === 'handshake error') {
      console.log('handshake error', err);
    } else {
      console.log('io error', err);
    }
  });
  socket.on('updates', function(newUpdates) {
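    // render the new updates in the page (e.g., DOM manipulation via jQuery)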
  });
  $("#target").click(function() {
    socket.emit('myEvent');
  });
 });

The script uses jQuery to support the UI operations that manipulate the DOM in the ‘updates’ event handler. This event is emitted by the server’s StreamAssembler, which I’ll describe later. Obviously, Socket.io does not require jQuery at all, and the script could even be defined inside a script tag in the HTML page.

The client script can also emit events through the socket, which will be handled by the server’s Node.js event loop. It’s a bidirectional communication channel.

  • Server-side JavaScript
'use strict';
 
var express = require('express');
var fs = require('fs');
var indexBuffer = fs.readFileSync('index.html').toString();
var app = express();
var io = require('socket.io');
var http = require('http');
var server = http.createServer(app);
 
app.use(express.bodyParser());
app.use('/scripts', express.static(__dirname + '/scripts'));
 
app.get('/',
  function(req, res) {
  console.log('Request to "/" ...');
  res.contentType('text/html');
  res.send(indexBuffer);
});
 
server.listen(8080);
io = io.listen(server);
 
io.configure(function (){
  io.set('log level', 1);
});
 
io.sockets.on('connection', function (socket) {
  console.log('got socket.io connection - id: %s', socket.id);
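  // 'keys' and 'redisClient' are assumed to be defined elsewhere;
  // the StreamAssembler constructor is shown in the next section.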
  var assembler = new StreamAssembler(keys, socket, redisClient);
 
  socket.on('myEvent', function() {
    console.log('"myEvent" event received');
  });
 
  socket.on('disconnect', function() {
    // needs to be stopped explicitly or it will continue
    // listening to redis for updates.
    if(assembler)
      assembler.stop();
  });
});

This code represents a minimalistic HTTP server with Socket.io support. It just creates the server using the express module and makes Socket.io listen on the HTTP server. The Socket.io configuration just sets the log level, but it might be used for other purposes, like authentication.

The server also sets up the StreamAssembler, which is responsible for collecting, aggregating and assembling the raw data retrieved from the database, and for emitting events to the client (browser) through the Socket.io communication channel.

Stateful processing and data handy in-memory

The Node.js event-driven model eases the development of client/server stateful logic, which is very important when implementing distributed systems devoted to online processing of stream data, and for assembling in one place all the context required to service a web request. It also helps to define async state-machine patterns thanks to the single-threaded approach, so the implementation turns out to be simpler and easier to debug than the typical multi-thread based solutions.
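
As a minimal sketch of such an async state machine (the protocol and port are invented for illustration):

var net = require('net');

// The single-threaded event loop guarantees 'state' is never mutated
// concurrently, so the machine needs no locking at all.
net.createServer(function(socket) {
  var state = 'waiting-handshake'; // -> 'streaming' -> 'closed'

  socket.on('data', function(chunk) {
    if (state === 'waiting-handshake') {
      state = 'streaming';   // the first chunk completes the handshake
      socket.write('ok\r\n');
    } else if (state === 'streaming') {
      // process the incoming stream data here
    }
  });

  socket.on('end', function() {
    state = 'closed';
  });
}).listen(9000);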

Also, and perhaps even more important when dealing with real-time requirements, in-memory data processing is mandatory to really provide a real-time user experience. Node.js provides a programming model fully aligned with this real-time approach.

So, let’s assume we have access to a large storage where huge data streams are stored and manipulated, and let’s consider Redis as a cache system on top of such large storage, to be used for real-time purposes.

The StreamAssembler component receives a constant raw data stream and produces structured data, aggregating data from different sources, always under the limit of the window size in order to ensure all the operations are executed in memory, taking into account the server’s hardware specifications.

It uses the redis-sync module to exploit the Redis replication interface and monitor the Redis storage, looking for commands that alter the database status of specific keys. It might also use the redis-sync module to replicate specific keys from the Redis (cache) database to the main storage (sediment), which is larger and usually offers better performance on write operations (Cassandra or HBase, for instance).

function StreamAssembler(keys, socket, redisClient) {
  var that = this;
 
  var updates = {};
  var monitor = null;
  var serverWindowLimit = 100; // assumed value: the window size used by moveServerWindow
 
  function moveServerWindow() {
    console.info('Moving server Window');
    var serverList = [];
    var k;
    for (k in updates) { serverList.push([k, updates[k].t]); }
    serverList.sort(function(a, b) {
      return a[1] < b[1] ? -1 : 0;
    });
    // trim the updates map down to the window limit
    while (serverList.length > serverWindowLimit) {
      var toDelete = serverList.pop();
      delete updates[toDelete[0]];
    }
  }
 
  function addUpdates(results) {
    var idList = [];
    var i, update, t, uk, u;
    for(i = 0; i < results.length; i += 2) {
      update = JSON.parse(results[i]);
      t = results[i + 1];
 
      uk = update.id;
      idList.push(uk);
 
      u = updates[uk];
 
      if(u === undefined) {
        //console.info(uk, 'not seen yet');
        u = {t:t, data:update};
        updates[uk] = u;
      }
    }
    return idList;
  }
 
  function getRawData(key, cb) {
    console.log('Getting raw data from: ' + key);
    redisClient.zrange(key, '-100', '-1', 'withscores',
                       function(err, results) {
      if(err) return cb(err);
      addUpdates(results);
      cb(null);
    });
  } 
 
  function initialUpdate() {
    console.log('initial update');
    moveServerWindow();
    socket.emit('updates', updates);
  } 
 
  that.addRawDataKeys = function addRawDataKeys(keys) {
    var rem = 0;
    for(var key in keys) {
      ++rem;
      getRawData(keys[key], function(err) {
        if(err) console.error(err);
        --rem;
        if(rem === 0) {
          initialUpdate();
          that.addMonitorKeys(keys);
        }
      });
    }
    if(rem === 0) {
      console.log('No update keys'); // no updates to retrieve
      initialUpdate();
      that.addMonitorKeys(keys);
    }
  } 
 
 
  that.addMonitorKeys = function addMonitorKeys(keys) {
    if (monitor) {
      monitor.addKeys(keys);
    } else {
      console.log('Creating new monitor');
      monitor = new m.Monitor(keys);
      monitor.on('changed', handleCommand);
      monitor.connect();
    }
  }

  that.addRawDataKeys(keys); // kick off the initial data retrieval
 
  that.stop = function() {
    if (monitor) {
      console.log('Stopping monitor');
      monitor.disconnect(handleCommand);
    }
  } 
 
  function handleCommand(key, command, args) {
    var i, t, u;
    var newUpdates = [];
    if(command === 'zadd') {
      var values = [];
      for(i = 0; i < args.length; i += 2) {
        t = Buffer.concat(args[i]).toString();
        u = Buffer.concat(args[i + 1]).toString();
        values.push(u);
        values.push(t);
        newUpdates.push(JSON.parse(u));
      }
      addUpdates(values);
      moveServerWindow();
      socket.emit('dUpdates', newUpdates);
    }
  }
}

The StreamAssembler uses the socket passed as an argument, created by the Node.js server through the socket.io module, to emit two different events: “updates”, for the initial updates retrieved from the Redis database, and “dUpdates”, for the incremental updates detected by the redis-sync monitor.
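
On the client side, both events could be consumed along these lines (a sketch completing the empty handler of the first snippet; the render helper is made up):

// 'updates' carries the initial snapshot: an object keyed by update id,
// where each value holds { t: timestamp, data: update }.
socket.on('updates', function(updates) {
  Object.keys(updates).forEach(function(id) {
    render(updates[id].data); // hypothetical rendering function
  });
});

// 'dUpdates' carries an array with just the new updates.
socket.on('dUpdates', function(newUpdates) {
  newUpdates.forEach(function(update) {
    render(update);
  });
});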

Some examples: system performance monitoring

With the diagram described above in mind, I’ve been playing with Node.js and Socket.io to implement some illustrative examples of how such an architecture works and how to implement real-time communication with browsers.

We could define a simple retriever of system performance data (e.g., top, netstat, …) and feed a Redis database with raw data from several hosts.
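
A minimal sketch of such a feeder could look like this (the key name, host id and sampled command are made up; it assumes the node_redis client):

var exec = require('child_process').exec;
var redis = require('redis').createClient();

// Every 5 seconds, sample the system load and push it into the sorted set
// the StreamAssembler reads with zrange, scored by timestamp.
setInterval(function() {
  exec('uptime', function(err, stdout) {
    if (err) return console.error(err);
    var now = Date.now();
    var sample = JSON.stringify({ id: 'host1:' + now, load: stdout.trim() });
    redis.zadd('perf:host1', now, sample);
  });
}, 5000);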

The StreamAssembler will transform the raw data into structured data, to be displayed by the Browser using the d3.js library.

There is a video of the code running available here; check also the source code here. It’s just a small example of the use of Node.js + Socket.io and the StreamAssembler pattern.

Categories: BigData Tags:

Exploiting the Redis replication interface with Node.js

November 22nd, 2012 No comments

The Redis replication interface can be easily exploited for other purposes by creating a new TCP connection and issuing the SYNC command. The Redis server will use that connection to stream every write command as soon as it is executed.

'use strict';
 
var net = require('net');
var client = net.connect(6379);
 
client.on('connect', function(a) {
  console.log('syncing ...');
  client.write('sync\r\n');
});
client.on('data', function(data) {
  console.log('Streaming commands ...');
});
client.on('error', function(err) {
  console.log(err);
});
client.on('end', function() {
  console.log('end');
});

At Igalia, we’ve been working on building smart and distributed components for real-time data stream analysis in collaboration with Perceptive Constructs. We are using several of their Redis components to face our BigData and real-time challenges, but perhaps one of the most useful ones has been the redis-sync module.

The redis-sync component is a Node.js module that exploits the Redis replication interface in order to monitor all the write commands executed by the server. It emits different signals providing Node.js data structures for the command arguments, which can be handled by top-level JavaScript applications.

The redis-sync component can take advantage of the rdb-parser component, which helps to parse generic Redis RDB dumps, in order to load all the changes made to the database prior to the sync call.

The rdb-parser

The rdb-parser module generates Node.js data structures from Redis RDB dumps or command replies, based on the new Redis unified protocol. The current development status offers an almost complete parser for all the Redis entities:

  • REDIS_STRING
  • REDIS_LIST
  • REDIS_SET
  • REDIS_ZSET
  • REDIS_HASH

The rdb-parser emits the ‘entity’ signal for every Bulk Reply detected in the buffer, with the following structure:

that.emit('entity', [REDIS_TYPE, key, data]);

The parsing process is triggered with the write(data) function, which assigns a new buffer to parse. The process is implemented using a simple state machine pattern and it ensures the data manipulation is binary safe, also according to the Redis unified protocol. The example provided in the repository is quite illustrative. Just type:

node ./test.js < ./dump.rdb
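
In code, that example boils down to something like the following (a sketch based only on the write()/end() and ‘entity’ interface described above):

var rdb = require('rdb-parser');

var parser = new rdb.Parser();
parser.on('entity', function(e) {
  // e is [REDIS_TYPE, key, data], as described above
  console.log('type:', e[0], 'key:', e[1].toString());
});

// write() hands new buffers to the parser's state machine
process.stdin.on('data', function(chunk) { parser.write(chunk); });
process.stdin.on('end', function() { parser.end(); });
process.stdin.resume(); // old-style (Node 0.8 era) streams need an explicit resume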

The redis-sync module

The redis-sync module uses the Redis replication interface to monitor and stream Redis commands which modify the database state. This component is very useful for implementing real-time capabilities, but also for data migration into larger databases, since Redis is a pure in-memory storage and memory is too precious.

Like the rdb-parser, the redis-sync module is implemented using a simple state machine pattern and supports both the unified protocol and inline commands; it provides binary safety as well. You can check the usage examples here.

We already commented that it might use the rdb-parser internally for dealing with the initial sync Bulk Reply:

case 'bulkReplyLenR':
  if(data[i] === 10) { // \n
    ++i;
    if((that.listeners('entity').length > 0 || that.listeners('rdb').length > 0) && !readRDB) {
      if(!rdbParser) {
        rdbParser = new rdb.Parser();
        rdbParser.on('entity', function(e) {
          that.emit('entity', e);
        });
        if(that.listeners('rdb').length === 0) {
          rdbParser.on('error', function(err) {
            // stream is used internally, error handling is done at the outer level
          });
        }
      }
      that.emit('rdb', rdbParser);
      startReadingBytes(bulkReplyLen, false,
           function(buf) { rdbParser.write(buf); },
           function() { rdbParser.end(); readRDB = true; rdbParser = undefined; connectedOK(); state = 'ready'; });
    } else {
      startReadingBytes(bulkReplyLen, false,
           function(buf) { that.emit('bulkReplyData', buf); },
           function() { that.emit('bulkReplyEnd'); readRDB = true; connectedOK(); state = 'ready'; });
    }
  }
  break;

Once the initial sync is done and the corresponding Bulk Reply parsed by the rdb-parser, the readRDB variable determines whether a new sync (reconnection) or a regular command reply is being processed.

In order to receive new commands just listen to the “command” or “inlineCommand” events:

  • that.emit('command', command, unifiedArgs.slice(1));
  • that.emit('inlineCommand', inlineCommandBuffers);

Monitoring specific keys

A very basic use case for the redis-sync module would be to monitor individual keys and trigger specific actions. The listeners of such actions will be notified by the Node.js event loop, providing a kind of real-time capability to the client.

'use strict';
 
var util = require('util');
var EventEmitter = require('events').EventEmitter;
var redisSync = require('redis-sync');
 
function Monitor(keys) {
  var that = this;
  var sync = new redisSync.Sync();
  sync.on('command', function(command, args) {
    var key = Buffer.concat(args[0]).toString();
    if (keys.indexOf(key) !== -1) {
      console.log('key %s changed: %s', key, command);
      that.emit('changed', key, command, args);
    }
  });
  sync.on('error', function(err) {
    console.error(err);
  });
  that.connect = function(p, h) {
    sync.connect(p, h);
  };
}
 
util.inherits(Monitor, EventEmitter);
exports.Monitor = Monitor;
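
A hedged usage sketch of this module (the file name and watched key are made up):

var Monitor = require('./monitor').Monitor;

var monitor = new Monitor(['mykey']);
monitor.on('changed', function(key, command, args) {
  console.log('%s was modified by a %s command', key, command);
});
monitor.connect(6379, 'localhost');
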
Categories: BigData, NoSQL Tags:

My first Strata Conference

October 24th, 2012 1 comment

This year was the first time the Strata Conference reached Europe and, thanks to Igalia, I could be there to evaluate the investment we have been making in BigData technologies.

This trip is part of the roadmap of the Distributed Computing team we created at Igalia with the aim of exploring a field where Open Source is key, and of seeing how our more than ten years of experience as Open Source consultants would fit in such a competitive area.

We have lately been collaborating with the company Perceptive Constructs to increase our data modelling and machine learning capabilities. Both companies were present at the Strata Conference to showcase our work on distributed and smart components for Social Media stream analysis. We will unveil our achievements in future posts, but first I’ll share my impressions about the conference and the future of the BigData area.

O’Reilly Strata Conferences

These conferences have usually been US events, with a presence on both coasts (New York, San Francisco and Santa Clara), but this was the first EU edition, so it was very important for us to attend. There is great activity in the UK regarding BigData, and the Open Data commitment is very strong in that area, which is allowing a lot of start-ups to grow there.

The conference is what you could expect from a big event like this: quite expensive, but very well organized and fully oriented to networking and business. Some events were very interesting, like the Start-up Competition, connecting young companies and independent developers with investors and entrepreneurs. The Office Hours gave us the possibility of face-to-face meetings with some of the speakers, which was a great thing in my opinion.

I’ll comment on the talks I considered most relevant, but let me first mention that I think the contents were very well structured, with a good mix of technical and inspiring material. The keynotes were a great warm-up for a conference which, I think, tries to show the social aspects behind the BigData field and how it could help to acquire better knowledge in an era of access to information like we have never seen before.

The Talks

First of all, I think it’s worth sharing all the keynotes videos, but I would like to comment on the most remarkable ones, in my opinion.

Good Data, Good Values

It was a really inspiring talk, describing how Big Data can help to make a better world, supporting not-so-big companies and organizations in making sense of the data they are generating: “No need to have big data for getting big insights”.

The Great Railway Caper: Big Data in 1955

The talk was interesting because it explained very well what BigData is and what the actual challenges are:

  • Partitioning.
  • Slow storage media.
  • Algorithms.
  • Lots of storage.

The situation hasn’t changed much since 1955:

  • Not enough RAM to solve the entire problem.
  • The algorithm doesn’t exist yet.
  • Machines are busy with other stuff.
  • Secondary storage is slow.
  • Having to deal with tight deadlines.

Keynote by Liam Maxwell

The UK Government is really pushing for BigData and is committed to the Open Data initiative. I would like to see the Spanish government continue the efforts to increase transparency and openness regarding public data.

They really want to work with SMEs, avoiding big players and vendor lock-in issues, which I personally think is the right approach. As was stated many times during the conference:

  • Open Source + Open Data = Democratization of BigData.

BigData in retail

This talk was an excellent example of a domain where BigData fits very well. On-line retail providers manage huge volumes of data from many countries, and statistical models apply pretty well to consumer habits and depot stock trends.

They basically use Matlab, so I guess real-time analysis is not crucial. They focus on different angles:

  • Predicting how weather affects sales.
  • Reducing depot stock holding.
  • Improving promotions.

Transparency transformed

They have developed a very cool system, a kind of expert system for detecting, classifying and generating new knowledge on different topics: news and media channels, sports, real estate, financial services, … They are now approaching regular companies to analyse their business processes.

  • Scheme: data – facts – angles – structure – narrative.
  • Fully automated process: meta-journalism.

There are some case studies: financial analysis and on-line education.

  • Generating financial reports from isolated insights.
  • Interpretation of financial charts.
  • Giving advice to students and teachers.
  • Social networks are another example.
  • Challenge of dealing with unstructured data.

The core system is based on AI techniques (expert systems, probably) using pattern-matching rules.

  • They don’t predict, but it’s in the roadmap (long term).
  • They don’t expose API.
  • They don’t do machine-learning.

Categories: BigData Tags:

GeoClue and Meego: QtMobility

October 1st, 2010 1 comment

As you probably know, GeoClue is part of the Meego architecture as the Geolocation component. However, the current plan is to use the QtMobility API for UI applications, defining GeoClue as one of the available backends.

The QtMobility software implements a set of APIs to ease the development of UI software focused on mobile devices. It provides some interesting features and tools for a great variety of mobile-oriented development areas:

  • Contacts
  • Bearer (Network Management)
  • Location
  • Messaging
  • Multimedia
  • Sensors
  • Service Framework
  • System Information

All those software pieces are a kind of abstraction layer exposing easy and comprehensive APIs to be used in UI application development. With regard to Geolocation, let’s describe the Location component in detail.

The first public implementation of a GeoClue-based backend for the QtMobility Location API was recently announced. The starting point for implementing the GeoClue backend, as described in the QtMobility documentation, is the QGeoPositionInfoSource abstract class. The implementation of this abstract class using GeoClue seems not too hard; however, the current GeoClue architecture has some limitations in fulfilling the QtMobility specifications:

  • The QGeoPositionInfo class, defined for storing the Geolocation data retrieved by the selected backend (GeoClue in this case), manages global location, direction and velocity together.
  • The GeoClue API has separate methods and classes for location, address and velocity. Independent signals are emitted whenever those parameters change.
  • The GeoClue Velocity interface is not implemented in the GeoClue Master provider.
  • Even though it is not too hard to implement the abstract methods of the QGeoPositionInfoSource class, the start/stop updating methods are not very efficient in terms of battery and memory consumption. There is no easy or direct way to remove a provider when it is not being used.

As part of Igalia’s plans for Meego, I’ve been working on the implementation of such a GeoClue-based backend for the Meego QtMobility framework. Now that part of my work has been done, it’s time to share efforts and contribute to the public repository some of the patches and performance reports I’ve produced during the last months. Some work is still needed before releasing it, but I hope to be able to send something in the following weeks, so stay tuned.

Even though the code is not ready to be public, I can show a snapshot of the test application I implemented for the Meego Handset platform using the Meego Touch framework:

GeoClue test application for Meego Handset

The purpose of this application would be to monitor the DBus communication between the different location providers, create some performance tests and evaluate the impact on a mobile platform.


QGeoPositionInfo Class Reference

Categories: Geolocation, MeeGo, Mobile, planet Tags:

GeoClue and Meego: Connman support

September 28th, 2010 No comments

As promised, GeoClue now supports Connman as the connectivity manager module for acquiring network-based location data. This step has been essential to complete the integration of GeoClue into the Meego architecture.

Check the patch if you want to know the details.

Thanks to Bastien Nocera for reviewing and pushing the commit, which is now part of the master branch of GeoClue. Let’s see if it passes the appropriate tests before becoming part of an official release.

Network-based positioning is one of the advantages of using GeoClue as a Location provider. That’s obvious for desktop implementations, where GPS and Cell-Id based methods are not the most common use cases. On the other hand, mobile environments could also benefit from network-based positioning, assisting the GPS-based methods to improve the fix acquisition process; perhaps indicating where the closest satellites are, or showing a less accurate location while the GPS fix is being established.

Finally, I would like to remark that my work is part of Igalia’s bet on the Meego platform. I think the GeoClue project will be an important technology to invest in for the future, since it’s also relevant for GNOME and desktop technologies. In fact, GeoClue is also Ubuntu’s default Geolocation component.

Categories: Geolocation, GNOME, MeeGo, Mobile, planet Tags:

GeoClue and Meego

August 6th, 2010 No comments

As most of you probably know, GeoClue is the default component of the Meego architecture for supporting Geolocation services.

GeoClue on Meego

The geoclue packages are installed by default in both the Netbook and Handset Meego SDK environments. I’ve been playing a bit with the Meego simulator; GeoClue seems to be perfectly configured and the examples can be executed without any problem.

But here is the bad news :) Some work is needed to adapt the GeoClue Connectivity module to the Meego connection manager component: connman.

I think I’m going to spend some time figuring out how much work is required and trying to propose a feasible approach. Another interesting task I have in mind is implementing some Meego-specific examples for GeoClue using the Meego Touch framework.

Categories: Geolocation, MeeGo, Mobile Tags: