My first time attending Igalia Summit

With employee distributed in over 20 countries globally and most are working remotely, Igalia holds summits twice a year to give the employees opportunities to meet up face-to-face. The summit normally runs in a period of a week with code camps, team building and recreational activities etc.. This year’s summer summit was held between the 15th-19th of June in A Coruña, Galicia.

I joined Igalia in the beginning of 2020. Because of COVID-19 pandemic, this was my first time attending an Igalia Summit. Due to personal time schedule I only managed to stay at A Coruña for three nights. The overall experience was great and I thoroughly enjoyed it!

Getting to know A Coruña

Igalia’s HQ is based in A Coruña in Galicia, Spain. A beautiful white-sanded beach is just a few yards away from the hotel we stayed. I did a barefoot stroll along the beach in one morning. The beach itself was reasonably quiet. There were a few people swimming in the shallow part of the sea. Weather that morning was perfect with warm sunshine and occasional cool gentle breezes.

The smell of sea in the air and people wandering around relaxingly in the evenings, somehow brought a familiar feeling to me and reminded of my home town.

On Wednesday I joined guided visit to A Coruña set in 1906. It was a very interesting walk around the historic part of the city. The tour guide Suso Martín presented an amazing one man show by walking us through the A Coruña‘s history back to the 12th century, historic buildings, religion and romances associated with the city.

On Thursday evening we went for the guided tour of MEGA, Mundo Estrella Galicia. En route we passed the office that Igalia used to locate at in early days. According to some Igalians who worked there before, it was a small office and there were about just over 10 Igalians in those days. Today Igalia has over 100 employees and last year we celebrated our 20th year anniversary in Open Source.

The highlights of the MEGA tour for me were watching the product line working and tasting the beers. We were spoiled by beers and local cheeses.

Meeting other igalians

Since I joined Igalian, all the meetings happened online due to pandemic. I was glad that I could finally meet other Igalians “physically” . During the summit I had chances to chat other Igalians at code camp, during meals and guided tours. It was pleasant. Playing board games together after an evening meal (I’d call it “night meal”) was a great fun. Some Igalians can be very funny and witty. I had quite a few “tearful laughter” moments. It was very enjoyable.

During the team code camp, I had chances to spending more time with my teammates. The technical meetings in both days were very engaging and effective. I was very happy to see Javi Fernández in person. Last year I was involved in CSS grid compatibility work (one of the 5 key areas for the Compat2021 effort that Igalia was involved in). I personally had never touched CSS Grid layout at the beginning of the assignment. The web platform team in Igalia though has very good knowledge and working experiences in this area. My teammates Manuel Rego, Javi Fernández and Sergio Villar were among the pioneer developers in this field. To assure the success of the task, the team formed a safe environment by providing me constant guidance and supports. Specifically, I had Javi’s helps throughout the whole task. Javi has been an amazing technical mentor with great expertise and patience. We had numerous calls for technical discussions. The team also had a couple of debugging sessions with me for some very tricky bugs.

The team meal on Wednesday evening was very nice. Delicious food, great companions and nice fruity beers – what could go wrong? Well, we did walk back to the hotel in the rain…

Thanks

The event was very well organized and productive. I really appreciate the hard work and great effort those igalians put in to make it happen. Just want to say – a big THANK YOU! I’m glad that I managed the trip. Traveling from the UK is not straightforward but I’d say it’s well worthy it.

WPT Python 3 Migration

In 2020, Igalia was involved in the Python 3 migration work for the web-platform-tests (WPT) project with sponsorships from Google. After a year-long effort, in December 2020 the flag for python 3 was switched on in WPT. Now over a year on, I only just manage to write about this migration work.  Better late than never, I hope :).

Why migrate?

Python 2 came to the end of life (EOL) on the 1st of January 2020. It marks the end of bugfix support or even security patches for Python 2 from Python maintainers. Code for the final Python2 release  2.7.18 ( happened in April 2020) was also frozen in January 2020. As a well used cross-browser test suite for the Web-platform stack, the web-platform-tests (WPT) Project uses python in many places, from infrastructure to test scripts. From maintenance and support for active development points of view, It’s imperative for WPT to make its code PY3 compatible sooner than later.

Challenges

Both the dynamic quality of the Python language and the complexity of the WPT present significant challenges to the upgrade.

Language challenges

Python is a dynamically typed language. There are no formal semantics for Python. As its de facto reference implementation, CPython maintains high coding standards but is not written with legibility as its primary focus. This means that code paths in Python can contain illegal semantics that are hard to detect even with non-static analyzers. Python 3 is a new version of Python, but it’s not backwards compatible with code written for Python 2. The nature of the changes between Python 2 and Python 3 are not just syntactical, rather, many of the changes are in the semantics. In particular, string literals are fundamentally different types in Python 2 and Python 3. Along with the change in the nature of the language, library support has also shifted. Many older libraries created for Python 2 are not forward-compatible. A lot of recent developers are creating libraries that can only be used with Python 3. We can run tools such as  caniusepython3 to take in a set of dependencies and then figure out which of them are holding us up from porting to Python 3. The tricky part though, is find and port the new libraries that will work.

Project challenges

WPT is a massive suite of tests (over one million in total), and serves many auxiliary functions. It uses Python in many places including but not limited to:
  • The majority of the infrastructure code. This is the code underlying the major wpt command, such as ‘wpt runner’ etc..
  • WPT file handlers, which test authors can define to run custom code in response to them making a particular request to the WPT server.
  • WebDriver tests, which use pytest structured tests.
  • Linting
  • Interacting with the docker, CI systems
  • Rebasing expectations, …
The complexity of the code base requires us to take a step back and have a good overview of the relations of the components that are involved and make a good plan on porting principles, pathways and methodologies.

The Porting Plan

The WPT community was well aware of the challenges of moving to Python 3 for the project. It set principles, suggested possible approaches and planned timelines before and during the major practical work took place.

Principles

  • The migration work should happen in the background since the project is quite active. 
  • The pathway to Python 3 was to make code dual Python 2 and Python 3 compatible and gradually switch over the runtime to Python 3. 
  • The porting should not reduce test coverage without explicit agreement from test authors.

Approaches

To make the porting tractable, it was decided to start with two very specific goals, each approaching the problem from different angles. One was to get the actual runner utility up running in Python 3, by starting to get a basic ‘wpt run‘ command to execute under Python 3. The other was to target wider test coverage via tests by running all relevant unit tests under Python 3.

TimeLines

For a project of non-trivial size like WPT, flag day transitions from Python 2 to Python 3 were simply not viable at the early stage of the project. Before 2020, there were already a few in-depth discussions and work going on within the community for the migration work. The major work, though, happened in 2020.  As the porting progressed, the timelines had got clearer. A concrete timeline of dropping Python 2 support in WPT was set in September 2020:
  • Py3-first” targeting 2021-01-01 : switch test runs to Python 3 on CI, but keep running unit tests and infrastructure tests in Python 2 and 3.
  • “Py3-only” on 2021-02-01: drop all Python 2 tests from CI, and start accepting Python 3-only changes.
WPT successfully moved to the “Py3-first” stage before the targeted date. The minimum python 3 version supported for this move is 3.6 with main focus on 3.8+. 

Implementations

Porting test runner utility

As we mentioned earlier, one of the starting points was to have the actual runner utility, ‘wpt run’  command to execute under Python 3. This porting was pretty straightforward. We came across some typical python 2 to python 3 migration issues such as
  • absolute imports. Absolute imports have become the default in Python 3 and relative imports should be explicit. For example, “from conftest import product, flatten” in Python 2 needs to be declared as “from .conftest import product, flatten” in Python 3.
  • built-in types comparison. In Python 3 most objects of built-in types compare unequal unless they are the same object. The choice of whether one object is smaller or larger than another one is made arbitrarily but consistently within one execution of a program. In Python 2 in the case of ‘mismatched’ types, the types are listed lexicographical by type name, e.g. a “list” comes after an “int” in alphabetical ordering, so is greater. For example, in Python 2, we have

latest_release = 0
version = [int(item) for item in m.groups()]if version > latest_release:

This is not valid in Python 3. Rather, we need to declare latest_release as latest_release = (0,0,0)
  • API changes. There are some API changes between the two versions. For example, the changes of the optional parameter strict in HTTPConnection(). In Python 2 we have httplib.HTTPConnection(self.host, self.port, strict=True, **conn_kwargs). In Python 3 it has become HTTPConnection(self.host, self.port, **conn_kwargs)
  • order of dict. In Python 2, dict is organized via a hash-table and puts the keys into buckets according to their hash() value. in Python 3.6+, dict retains insertion order. One solution to make code work for both versions is to use the  alternative type OrderedDict instead of the original Dict in Python 3.
  • iteration. Python 3 changes the return values of several basic functions from list to iterator. The main reason for this change is that using iterators usually causes better memory consumption than lists. This change has little impact on common use cases. Furthermore, the iter* counterparts (which return iterators in Python 2) have been removed. To make code work for both version, we can call six library APIs and replace them with six.iter* to avoid memory regression in Python 2. This corresponds to dictionary.iteritems() in Python 2 and dictionary.items() in Python 3. six is a Python 2 and 3 compatibility library. It provides utility functions for smoothing over the differences between the Python versions with the goal of writing Python code that is compatible on both Python versions. We called the six  library APIs at a few places during the dual Python 2/3 compatible stage. These API calls were removed after WPT transferred to python 3 only.
  • Bytes vs. str. In python2, binary is basically an alias of str. In python3 the binary data is different to a string. We had to convert some binary data to string type in order to be compatible for both Python 2 and Python 3. This issue, at the utility script level, presented different challenges from that in the core level we are discussing in the next section. Most cases in the utility script can be resolved by adding prefix to quoted string literals. Quoted string literals can be prefixed with “b” or “u” to get bytes or Unicode, respectively. In another word, prefix a native string with “u” in Python 2 to get a Unicode object while prefix with “b” in Python 3 to get bytes. It is also noted that in Python 3, the “u” prefix does nothing. Likewise, the “b” prefix does nothing in Python 2. In the context of this blog, we are talking about prefixing a native string with “b” to get bytes in Python 3 in most cases. 
There were also a few other issues such as Integer division, use of exceptions and call of print but they were generally very minor and easy to resolve.

Handling string types in core

One of the biggest hurdles in our porting effort was how to overcome the string literals type mismatch between Python 2 and 3 in core, specifically in infrastructure and file handlers. As we discussed earlier, in Python 2, a string literal is a sequence of bytes. In Python 3, a string literal is a sequence of Unicode code points. The rationale behind the change was to move to a Unicode-by-default world. Web Platform Test Server (wptserve) often intends to use byte sequences. To overcome this mismatch hurdle, we need  to either always use byte sequences or always use str[RFC49] has illustrated pros and cons for both approaches. It was decided within the community to go the byte sequence path in order to keep a consistent and semantically correct encoding model. That is to always use byte sequences: str in Python 2 and bytes in Python 3. This had incurred some noticeable changes in WPT core. In wptserve
  • It introduced a pair of ISO-8859-1 encode and decode helper functions. Both of them can accept either binary or text strings, but always return binary/text strings respectively regardless of the Python version.
  • Most public APIs for custom handlers can only accept and return binary with notable exception of the response body.
In python file handlers, it has specified string types for Headers on both requests and responses, Request URL/form parameters and response bodies etc.. After the necessary changes in the core part were done, Robert Ma (@robertma) and Stephen Mcgruer (@smcgruer) from Google created  the porting guidelines. Based on the guideline, we re-examined pretty much every line of the  test scripts in the existing handlers to add prefixes to string literals when necessary. Here we’d like to walk through some examples on porting handler related tests following the guideline and hope to share some tips.

Writing Python 3 compatible tests

According to the guideline, rule of thumb for porting is to make sure all strings are either always text or always bytes; all string literals in handlers should be prefixed with "b" or "u".

Headers of request and response

Header data should always be binary strings for both keys and values. Prefer adding "b" prefixes to encoding/decoding.
  • The Request.headers dictionary-like interface (accessed via […], get, items).
headers = [(b"Content-Type", b"text/html")]
if b"allow_csp_from" in request.GET:
headers.append((b"Allow-CSP-From", request.GET[b"allow_csp_from"]))
  • The Request.headers.get_list method example:
assert isinstance(headers.get_list(b'x-bar')[0], bytes)
  • Response.headers.{get,set,append,update,items} examples:
response.headers.set(b'Access-Control-Allow-Origin', request.headers.get(b"origin"))
response.headers.append(b"Access-Control-Allow-Origin", b"*")

HTTP Basic Authentication

Request.auth.{username,password} are binary strings. For example, response.headers.set(b'Access-Control-Allow-Origin', request.headers.get(b"origin"))
response.headers.append(b"Access-Control-Allow-Origin", b"*")
response.headers.set(b'Content-type', b'text/plain')
content = b""

Cookies

  • Request.cookies (similar to Request.headers; it’s a MultiDict with all APIs of dict plus first, last, get_list). For example,
response.content = request.cookies[b"foo"].value
  • Response.{set,unset,delete}_cookie.
response.set_cookie(b"name", b"value")
response.unset_cookie(b"name")

Request URL/form parameters

  • Both the keys and values of URL/form parameters for the request (accessible via request.GET or request.POST) are all binary strings. Prefer adding “b” prefixes to encoding/decoding.
b"realm" in request.POST
request.GET.first(b"type", None) == b"value"

Response Status Message

  • Response status message is binary string as follows.
response.status = 401
response.headers.set(b'Status', b'401 Authorization required')
response.headers.set(b'WWW-Authenticate', b'Basic realm="test"')

Response body

The data put into the response body can be either text or binary strings, but the two types should never be mixed and string literals must be prefixed. response.writer.write(b"This is a body!")
return u”Hello, 世界!”

Status

WPT successfully moved to the “Py3-first” stage in December 2020. In February 2021 it dropped all Python 2 tests from CI, and started accepting Python 3-only changes.