MessageFormat 2.0: A domain-specific language?
I recommend reading the prequel to this post, "MessageFormat 2.0: a new standard for translatable messages"; otherwise what follows won't make much sense!
What we've achieved #
In the previous post, I described the work done by the members of the MF2 Working Group to develop a specification for MF2, a process that spanned several years. When I joined the project in 2023, my task was to implement the spec and produce a back-end for the JavaScript Intl.MessageFormat proposal.
|-------------------------------|
| JavaScript code using the API |
|-------------------------------|
| browser engine                |
|-------------------------------|
| MF2 in ICU                    |
|-------------------------------|
For code running in a browser, the stack looks like this: web developers write JavaScript code that uses the API; their code runs in a JavaScript engine that's built into a Web browser; and that browser makes API calls to the ICU library in order to execute the JS code.
The bottom layer of the stack is what I've implemented as a tech preview. Since the major JavaScript engines are implemented in C++, I worked on ICU4C, the C++ version of ICU.
The middle layer is not yet implemented, but a polyfill makes it possible to try out the API in JavaScript now.
Understanding MF2 #
To understand the remainder of this post, it's useful to have read over a few more examples from the MF2 documentation. See:
- The spec
- Video: MessageFormat 2 open house talk by Addison Phillips and Elango Cheran, February 2024
- Video: Unicode Technology Workshop talk by Addison Phillips, November 2023
Is MF2 a programming language? (And does it matter?) #
Different members of the working group have expressed different opinions on whether MF2 is a programming language:
- Addison Phillips: "I think we made a choice (we can reconsider it if necessary, although I don't think we need to) that MF2 is not a resource format. I'm not sure if 'templating language' is the right characterization, but let's go with it for now." (comment on pull request 474, September 2023)
- Stanisław Małolepszy: "...we’re neither resource format nor templating language. I tend to think we’re a storage format for variants." (comment on the same PR)
- Mihai Niță: "For me, this is a DSL designed for i18n / l10n." (comment on PR 507)
MF2 lacks block structure and complex control flow: the .match construct can't be nested, and .local variable declarations are sequential rather than nested. Some people might expect a general-purpose programming language to include these features.
But in implementing MF2, I found that several issues arose that were familiar from my experience studying and implementing compilers and interpreters for more conventional programming languages.
I wrote in part 1 that when using either MF1 or MF2, messages can be separated from code -- but it turns out that maybe, messages are code! But they constitute code in a different programming language, which can be cleanly separated from the host programming language, just as with uninterpreted strings.
Abstract syntax #
As with a programming language, the syntax of MF2 is specified using a formal grammar, in this case using Augmented Backus-Naur Form. In most compilers and interpreters, a parser (that's either generated from a specification of the formal grammar, or written by hand to closely follow that grammar) reads the text of the program and generates an abstract syntax tree (AST), which is an internal representation of the program.
MF2 has a data model, which is like an AST. Users can either write messages in the text-based syntax (probably easiest for most people) or construct a data model directly using part of the API (useful for things like building external tools).
Parsers and ASTs are familiar concepts for programming language implementors, and writing the "front-end" for MF2 (the parser, and classes that define the data model) was not that different from writing a front-end for a programming language. Because it works on ASTs, the formatter itself (the part that does all the work once everything is parsed) also looks a lot like an interpreter. The "back-end", producing a formatted result, is currently quite simple since the end result is a flat string -- however, in the future, "formatting to parts" (necessary to easily support features like markup) will be supported.
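To make the split between data model and formatter concrete, here is a minimal sketch in Python (chosen only for brevity; this is not the ICU4C or Intl.MessageFormat API). The names Text, Placeholder, and format_pattern are hypothetical, and the real MF2 data model is considerably richer than this.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Text:
    """Literal text inside a pattern."""
    value: str


@dataclass(frozen=True)
class Placeholder:
    """A placeholder such as {$count :number}; the annotation is optional."""
    variable: str
    annotation: Optional[str] = None


# A pattern is a sequence of parts, much like a (very flat) AST.
Pattern = list[Union[Text, Placeholder]]


def format_pattern(pattern: Pattern, arguments: dict[str, object]) -> str:
    """Walk the data model and produce a flat string, interpreter-style."""
    out = []
    for part in pattern:
        if isinstance(part, Text):
            out.append(part.value)
        else:
            # Annotations are ignored in this sketch; a real formatter would
            # look the function up in a registry and call it.
            out.append(str(arguments[part.variable]))
    return "".join(out)


# Data model built directly, corresponding to the message text
# "Hello, {$name}! You have {$count :number} new messages."
hello = [
    Text("Hello, "),
    Placeholder("name"),
    Text("! You have "),
    Placeholder("count", "number"),
    Text(" new messages."),
]

greeting = format_pattern(hello, {"name": "Ana", "count": 3})
```

A parser for the text syntax would produce the same kind of structure, which is why a tool can skip the parser and build the data model by hand.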
For more details, see Eemeli Aro's FOSDEM 2024 talk on the data model.
Naming #
MF2 has local variable declarations, like the let construct in JavaScript and in some functional programming languages. That means that in implementing MF2, we've had to think about evaluation strategies (lazy versus eager evaluation), scope, and mutability. MF2 settled on a spec in which all local variables are immutable and no name shadowing is permitted except in a very limited way, with .input declarations. Free variables are permitted, and don't need to be declared explicitly: these correspond to runtime arguments provided to a message. Eager and lazy implementations can both conform to the spec, since variables are immutable, and lazy implementations are free to use call-by-need semantics (memoization).
These design choices weren't obvious, and in some cases, inspired quite a lot of discussion.
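A rough sketch of why immutability lets eager and lazy strategies coexist, written in Python for brevity. Everything here is hypothetical: declarations are modeled as (name, callable) pairs rather than MF2 expressions, only .local-style declarations are covered, and the limited .input form of shadowing is left out.

```python
from typing import Callable

# A lookup resolves a variable name to its value.
Lookup = Callable[[str], object]
# Each declaration binds a name to an expression over other names,
# modeled here as a Python callable that receives a lookup function.
Declaration = tuple[str, Callable[[Lookup], object]]


def check_no_redeclaration(declarations: list[Declaration]) -> None:
    """Reject a second declaration of the same name (no shadowing)."""
    seen = set()
    for name, _ in declarations:
        if name in seen:
            raise ValueError(f"duplicate declaration of ${name}")
        seen.add(name)


def eval_eager(declarations: list[Declaration], arguments: dict[str, object]) -> Lookup:
    """Evaluate every declaration up front, in source order."""
    env = dict(arguments)
    for name, expr in declarations:
        env[name] = expr(env.__getitem__)
    return env.__getitem__


def eval_lazy(declarations: list[Declaration], arguments: dict[str, object]) -> Lookup:
    """Evaluate a declaration on first use and cache the result (call-by-need)."""
    exprs = dict(declarations)
    cache = dict(arguments)

    def lookup(name: str) -> object:
        if name not in cache:
            cache[name] = exprs[name](lookup)
        return cache[name]

    return lookup


calls = []


def double(lookup: Lookup) -> object:
    calls.append("double")  # record each evaluation
    return lookup("n") * 2


declarations = [
    ("a", double),
    ("b", lambda lookup: lookup("a") + 1),
]
check_no_redeclaration(declarations)
eager = eval_eager(declarations, {"n": 5})
lazy = eval_lazy(declarations, {"n": 5})
```

Because nothing can be reassigned, both lookups return the same values; the lazy one simply defers the work and never repeats it.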
Naming can have subtle consequences. In at least one case, the presence of .local declarations necessitates a dataflow analysis (albeit a simple one). The analysis is to check for "missing selector annotation" errors; the details are outside the scope of this post, but the point is just that error checking requires "looking through" a variable reference, possibly through a chain of several references, to the variable's definition.
Sounds like a programming language!
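The "looking through" step might be sketched as follows, in Python for brevity. This is a simplified illustration rather than the check as the spec or ICU4C defines it: the Decl class and has_annotation function are hypothetical, and each declaration is assumed to refer to at most one variable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decl:
    """A declaration such as `.local $y = {$x :number}` (simplified)."""
    operand: Optional[str]     # variable the expression refers to, if any
    annotation: Optional[str]  # function name, e.g. "number", if any


def has_annotation(name: str, decls: dict[str, Decl]) -> bool:
    """Follow references from `name` until an annotated declaration is found."""
    seen = set()
    current: Optional[str] = name
    while current is not None and current in decls and current not in seen:
        seen.add(current)  # guard against malformed, cyclic input
        decl = decls[current]
        if decl.annotation is not None:
            return True
        current = decl.operand
    return False


# Models these declarations:
#   .local $a = {$count :number}
#   .local $b = {$a}
#   .local $c = {$b}
#   .local $d = {$name}
decls = {
    "a": Decl("count", "number"),
    "b": Decl("a", None),
    "c": Decl("b", None),
    "d": Decl("name", None),
}

c_is_annotated = has_annotation("c", decls)  # reaches $a through $b
d_is_annotated = has_annotation("d", decls)  # chain ends at an undeclared argument
```

Under these assumptions, using $c as a selector would pass the check and using $d would be reported as an error.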
Functions #
Placeholders in MF2 may be expressions with annotations. These expressions resemble function calls in a more conventional language. Unlike in most languages, functions cannot be implemented in MF2 itself, but must be implemented in a host language. For example, in the ICU4C implementation, the functions would either be written in C++ or (less likely) another language that has a foreign function interface with C++.
Functions can be built-in (required in the spec to be provided), or custom. Programmers using an implementation of the MF2 API can write their own custom functions that format data ("formatters"), that customize the behavior of the .match construct ("selectors"), or do both. Formatters and selectors have different interfaces.
The term "annotation" in the MF2 spec is suggestive. For example, a placeholder like {$x :number} can be read: "x should be formatted as a number." But custom formatters and selectors may also do much more complicated things, as long as they conform to their interfaces. An implementor could define a custom :squareroot function, for example, such that {$x :squareroot} is replaced with a formatted number whose value is the square root of the runtime value of $x. It's unclear how common use cases like this will be, but the generality of the custom function mechanism makes them possible.
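As an illustration only, a custom :squareroot formatter could look something like this in a host language. The sketch uses Python for brevity; the Formatter signature, the registry dictionary, and format_placeholder are all hypothetical and do not correspond to the interfaces in ICU4C or the JavaScript proposal.

```python
import math
from typing import Callable

# Hypothetical formatter interface: (runtime value, options) -> string.
Formatter = Callable[[float, dict[str, str]], str]


def format_number(value: float, options: dict[str, str]) -> str:
    """Stand-in for the built-in :number; real implementations are locale-aware."""
    return f"{value:g}"


def format_squareroot(value: float, options: dict[str, str]) -> str:
    """Custom function: format the square root of the operand as a number."""
    return format_number(math.sqrt(value), options)


# The registry maps function names used in annotations to host-language code.
registry: dict[str, Formatter] = {
    "number": format_number,
    "squareroot": format_squareroot,
}


def format_placeholder(variable: str, function: str, arguments: dict[str, float]) -> str:
    """Format a placeholder like {$x :squareroot} against runtime arguments."""
    return registry[function](arguments[variable], {})


result = format_placeholder("x", "squareroot", {"x": 16})
```

The message itself only names the function; all of the computation lives on the host-language side of the registry.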
Wherever there are functions, functional programmers think about types, explicit or implicit. MF2 is designed to use a single runtime type: it's unityped. The spec uses the term "resolved value" to refer to the type of its runtime values. Moreover, the spec tries to avoid constraining the structure of "resolved values", to allow the implementation as much freedom as possible. However, the implementation has to call functions written in a host language that may have a different type system. The challenge comes in defining the "function registry", which is the part of the MF2 spec that defines both the names and behavior of built-in functions, and how to specify custom functions.
Different host languages have different type systems. Specifying the interface between MF2 and externally-defined functions is tricky, since the goal is to be able to implement MF2 in any programming language.
While a language that can call foreign functions but not define its own functions is unusual, defining foreign function interfaces that bridge the gaps between disparate programming languages is a common task when designing a language. This also sounds like a programming language.
Pattern matching #
The .match construct in MF2 looks a lot like a case expression in Haskell, or perhaps a switch statement in C/C++, depending on your perspective. Unlike switch, .match allows multiple expressions to be examined in order to choose an alternative. And unlike case, .match only compares data against specific string literals or a wildcard symbol, rather than destructuring the expressions being scrutinized.
The ability provided by MF2 to customize pattern-matching is unusual. An implementation of a selector function takes a list of keys, one per variant, and returns a sorted list of keys in order of preference. The list is then used by the pattern matching algorithm in the formatter to pick the best-matching variant given that there are multiple selectors. An abstract example is:
.match {$x :X} {$y :Y}
A1 A2 {{1}}
B1 B2 {{2}}
C1 * {{3}}
* C2 {{4}}
* * {{5}}
In this example, the implementation of X would be called with the list of keys [A1, B1, C1] (the * key is not passed to the selector implementation) and would return a list drawn from those keys, arranged in whatever order of preference it chooses. Likewise, the implementation of Y would be called with [A2, B2, C2].
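The selection process for the example above can be sketched roughly as follows, in Python for brevity. This is a simplified reading of the matching steps, not the spec's algorithm verbatim. The Selector signature, exact_match, and select_variant are hypothetical names, and the sketch assumes a selector may return only the subset of keys it considers a match.

```python
from typing import Callable

# Hypothetical selector interface: given the runtime value and the candidate
# keys, return the keys that match, most preferred first.
Selector = Callable[[object, list[str]], list[str]]
Variant = tuple[list[str], str]  # (keys, pattern)


def exact_match(value: object, keys: list[str]) -> list[str]:
    """A stand-in selector: a key matches only if it equals the value."""
    return [k for k in keys if k == str(value)]


def select_variant(values: list[object], selectors: list[Selector],
                   variants: list[Variant]) -> str:
    # 1. Ask each selector for its preference list ("*" is never passed in).
    prefs = []
    for i, (value, selector) in enumerate(zip(values, selectors)):
        keys = [v[0][i] for v in variants if v[0][i] != "*"]
        prefs.append(selector(value, keys))
    # 2. Keep variants whose every key is "*" or was returned by its selector.
    candidates = [
        v for v in variants
        if all(k == "*" or k in prefs[i] for i, k in enumerate(v[0]))
    ]
    # 3. Stable-sort by each selector, last one first, so that the first
    #    selector ends up with the highest priority. "*" ranks last.
    for i in reversed(range(len(selectors))):
        def rank(v: Variant, i: int = i) -> int:
            k = v[0][i]
            return len(prefs[i]) if k == "*" else prefs[i].index(k)
        candidates.sort(key=rank)
    # Assumes a catch-all variant exists, so candidates is never empty.
    return candidates[0][1]


variants = [
    (["A1", "A2"], "1"),
    (["B1", "B2"], "2"),
    (["C1", "*"], "3"),
    (["*", "C2"], "4"),
    (["*", "*"], "5"),
]
selectors = [exact_match, exact_match]  # stand-ins for :X and :Y
chosen = select_variant(["C1", "C2"], selectors, variants)
```

With $x = C1 and $y = C2, both the third and fourth variants survive filtering; the third is chosen because the first selector takes priority. Swapping exact_match for a different function changes what "matching" means without touching select_variant.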
I don't know of any existing programming languages with a pattern-matching construct like this one; usually, the comparison between values and patterns is based on structural equality and can't be abstracted over. But the ability to factor out the workings of pattern matching and swap in a new kind of matching (by defining a custom selector function) is a kind of abstraction that would be found in a general-purpose programming language. Of course, the goal here is not to match patterns in general but to select a grammatically correct variant depending on data that flows in at runtime.
But is it Turing-complete? #
MF2 has no explicit looping constructs or recursion, but it does have function calls. The details of how custom functions are realized are implementation-specific; typically, they are written in a general-purpose programming language. That means that MF2 can invoke code that does arbitrary computation, but not express it.
I think it would be fair to say that the combination of MF2 and a suitable registry of custom functions is Turing-complete, but MF2 itself is not Turing-complete. For example, imagine a custom function named eval that accepts an arbitrary JavaScript program as a string, and returns its output as a string. This is not how MF2 is intended to be used; the spec notes that "execution time SHOULD be limited" for invocations of custom functions.
I'm not aware of any other languages whose Turing-completeness depends on their execution environment. (Though there is at least one lengthy discussion of whether CSS is Turing-complete.)
If custom functions were removed altogether and the registry of functions were limited to a small built-in set, MF2 would look much less like a programming language; its underlying "instruction set" would be much more limited.
Code versus data #
There is an old Saturday Night Live routine: “It’s a floor wax! It’s a dessert topping! It’s both!” XML is similar. “It’s a database! It’s a document! It’s both!” -- Philip Wadler, "XQuery: a typed functional language for querying XML" (2002)
The line between code and data isn't always clear, and the MF2 data model can be seen as a representation of the input data, rather than as an AST representing a program. Likewise, the syntax can be seen as a serialized format for representing the data, rather than as the syntax of a program.
There is also a "floor wax / dessert topping" dichotomy when it comes to functions. Is MF2 an active agent that calls functions, or does it define data passed to an API, whose implementation is what calls functions?
"Language" has multiple meanings in software: it can refer to a programming language, but the "L" in "HTML" and "XML" stands for "language". We would usually say that HTML and XML are markup languages, not programming languages, but even that point is debatable. After all, HTML embeds JavaScript code; the relationship between HTML and JavaScript resembles the relationship between MF2 and function implementations. Languages differ in how much computation they can directly express, but there are common aspects that unite different languages, like having a formal grammar.
I view MF2 as a domain-specific language for formatting messages, but another perspective is that it's a representation of data passed to an API. An API itself can be viewed as a domain-specific language: it provides verbs (functions or methods that can be called) and nouns (data structures that the functions can accept).
Summing up #
As someone with a background in programming language design and implementation, I'm naturally inclined to see everything as a language design problem. A general-purpose programming language is complex, or at least can be used to solve complex problems. In contrast, those who have worked on the MF2 spec over the years have put in a lot of effort to make it as simple and focused on a narrow domain as possible.
One of the ways in which MF2 has veered towards programming language territory is the custom function mechanism, which was added to provide extensibility. The mechanism is general, but if there were a less general solution that still supported the desired range of use cases, these years of effort would probably have unearthed it. The presence of name binding (with all the complexities that brings up) and an unusual form of pattern matching also suggest to me that it's appropriate to consider MF2 a programming language, and to apply known programming language techniques to its design and implementation. This shows that programming language theory has interesting applications in internationalization, which is a new finding as far as I know.
What's left to do? #
The MessageFormat spec working group welcomes feedback during the tech preview period. User feedback on the MF2 spec and implementations will influence its future design. The current tech preview spec is part of LDML 45; the beta version, to be included in LDML 46, may include improvements suggested by users and implementors.
Igalia plans to continue collaboration to advance the Intl.MessageFormat proposal in TC39. Implementation in browser engines will be part of that process.
Acknowledgments #
Thanks to my colleagues Ben Allen, Philip Chimento, Brian Kardell, Eric Meyer, Asumu Takikawa, and Andy Wingo; and MF2 working group members Eemeli Aro, Elango Cheran, Richard Gibson, Mihai Niță, and Addison Phillips for their comments and suggestions on this post and its predecessor. Special thanks to my colleague Ujjwal Sharma, whose work I borrowed from in parts of the first post.