Sharing SpiderMonkey tests with the world
At Igalia, we believe in building open and interoperable platforms, and we’ve found that for such platforms to succeed, it is essential to combine a clear standard with an extensive test suite. We’ve worked on a number of such test suites, ranging from the official JavaScript conformance test suite test262 and web-platform-tests, which covers the rest of the web platform, to the Khronos Vulkan and OpenGL CTS. Our experience has consistently shown that a thorough test suite that is easy for implementers to run helps significantly in getting different products to interoperate in the real world.
An important way to maximize the coverage of such a test suite is to encourage implementers to share the tests they write as part of their implementation work. The most immediate opportunity is sharing new tests – a great success story here is the web-platform-tests project, whose two-way synchronization tooling for the major browsers allows developers to write tests directly within their own project and have them shared more or less automatically. However, for mature platforms, especially when the platform and implementations are older than the centralised test suite, there is often a large backlog of existing tests. We would love to see more of these tests made available.
During 2024, we looked in particular at contributing such backlog tests to test262. We identified SpiderMonkey’s non262 suite as a potential source of interesting tests for this purpose; it appealed to us for several reasons.
First, it is part of the larger jstests suite, which also contains the upstream test262 tests (hence the name), so we did not expect architectural issues to crop up when upstreaming its tests. Second, as a test suite built by JavaScript engine implementers, it seemed likely to contain tests for edge cases and for requirements that changed over time. It is not uncommon for several implementers to be caught out by the same issue, so such tests are more likely to find bugs in other engines as well.
During our investigation, we discovered that our friends at Bocoup had a similar idea back in 2017: they had created a Python script to transform parts of the jstests suite to work within the test262 test suite, which we gratefully reused. However, some issues quickly came to light: it had been written for Python 2, and its unit tests had not been enabled in continuous integration, so it needed some work to be of use in 2024. Once that was done, we discovered that the script had been used as part of a mostly manual process to submit specifically curated tests, and that it could not cope with the diversity and complexity of the whole test suite without significant up-front work to put the tests into a shape it could deal with. We suspect this is part of the reason their project did not bear much fruit in the end, and we decided that our approach needed to maximize the number of shared tests for the effort expended.
After getting the script into shape, we set to work on a batch export of the non262 test suite. To verify the quality of the exported tests, we ran them against both SpiderMonkey and V8 using the upstream tooling. This process revealed several issues – some anticipated, others more unexpected. For example, a large number of tests used helper functions that were only available in SpiderMonkey, which we either needed to provide in test262 or automatically translate to an existing helper function. Other APIs, for example those testing specific details of the garbage collection implementation, could not be reproduced at all.
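As a hedged illustration (the helpers shown are examples, not a complete mapping): non262 provides assertion helpers such as assertThrowsInstanceOf in its shell.js, which translate naturally to the upstream harness, whereas a shell builtin like gczeal(), which drives SpiderMonkey’s garbage collector directly, has no portable counterpart.

```js
// SpiderMonkey-style, using a helper from non262's shell.js:
//
//   assertThrowsInstanceOf(() => null.foo, TypeError);
//
// test262 equivalent, using the assert.throws helper that the
// upstream harness makes available to every test:
assert.throws(TypeError, () => null.foo);

// Deeper comparisons require opting into a helper library through
// frontmatter (includes: [compareArray.js]), which then provides:
assert.compareArray([1, 2].reverse(), [2, 1]);
```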
Besides that, a fair number of tests relied on specifics of the SpiderMonkey implementation that are not guaranteed by the standard – such as the exact value of the error.message property or the handling of particular bit patterns in typed arrays. Some tests also turned out not to test what they were meant to test, or covered a previous version of the specification or a not-yet-finalized proposal. Depending on the situation, we either improved the tests and added APIs to make it possible to share them, or skipped exporting the offending tests.
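The error.message case makes the problem concrete (a sketch; the message text shown is what current engines happen to produce, not something the specification requires):

```js
// Non-portable: asserts the exact message wording, which the
// specification leaves entirely up to the implementation.
//
//   try {
//     x;
//   } catch (e) {
//     assertEq(e.message, "x is not defined");
//   }
//
// Portable test262 version: only the type of the error is guaranteed.
assert.throws(ReferenceError, () => x);
```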
We also discovered some issues in the test262-harness tool that we used to run the tests upstream, notably around tests using JavaScript modules and the [[IsHTMLDDA]] internal slot (used to specify the web compatibility requirements around document.all). It also turned out that the mechanism for including helper libraries in tests was not fully specified, which caused some combinations of helpers to work in some test runners but not in others. We have started the process of clarifying the documentation on this point.
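For background on the former (a sketch based on test262’s documented host API; real tests exercise many more operations): tests for [[IsHTMLDDA]] semantics opt in through a features entry and use the host-provided $262.IsHTMLDDA value, which must behave the way document.all does on the web.

```js
/*---
description: A value with the [[IsHTMLDDA]] internal slot masquerades as undefined
features: [IsHTMLDDA]
---*/
var ddaLike = $262.IsHTMLDDA;

// Per the specification, typeof yields "undefined" for such a value,
// and it compares loosely equal to both undefined and null.
assert.sameValue(typeof ddaLike, "undefined");
assert.sameValue(ddaLike == undefined, true);
assert.sameValue(ddaLike == null, true);
```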
As part of this project so far, we have landed about 1600 new tests in test262 and filed 10 bugs (some covering failures in multiple tests), half of which have been fixed by the engine maintainers. Several other failures were the result of bugs that had been filed earlier but had not yet been fixed. Mozilla has also decided to remove the exported tests from its own repository and to use the centralised copies instead.
In terms of future work for this particular project, we’re planning to investigate whether we can share more of the tests that are currently skipped. Separately, we’re interested in looking into a fully automated two-way synchronization system; this would significantly ease cooperation between engine developers and project maintainers on a unified test suite, though it would require a commensurate engineering effort. We’ll also continue investigating whether other test suites could benefit from a similar treatment.
We would like to thank Bloomberg for sponsoring this work, and the Mozilla community – in particular Dan Minor – for their indispensable help and code reviews.