{"id":4,"date":"2022-01-21T10:05:18","date_gmt":"2022-01-21T10:05:18","guid":{"rendered":"http:\/\/blogs.igalia.com\/zsun\/?p=4"},"modified":"2023-11-10T15:36:54","modified_gmt":"2023-11-10T15:36:54","slug":"wpt-python-3-migration","status":"publish","type":"post","link":"https:\/\/blogs.igalia.com\/zsun\/2022\/01\/21\/wpt-python-3-migration\/","title":{"rendered":"WPT Python 3 Migration"},"content":{"rendered":"\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/blogs.igalia.com\/zsun\/files\/2022\/01\/images_birds.jpg\" alt=\"\" class=\"wp-image-21\" width=\"618\" height=\"346\" srcset=\"https:\/\/blogs.igalia.com\/zsun\/files\/2022\/01\/images_birds.jpg 600w, https:\/\/blogs.igalia.com\/zsun\/files\/2022\/01\/images_birds-300x168.jpg 300w\" sizes=\"auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px\" \/><\/figure><\/div>\n\n\n\n\n\nIn 2020, Igalia was involved in the Python 3 migration work for the <a href=\"https:\/\/github.com\/web-platform-tests\/wpt\">web-platform-tests (WPT)<\/a> project, with sponsorship from Google. After a year-long effort, the flag for Python 3 was switched on in WPT in December 2020. Now, over a year on, I have only just managed to write about this migration work. Better late than never, I hope :).\n\n\n\n\n\n<h3 class=\"wp-block-heading\">Why migrate?<\/h3>\n\n\n\n\n\nPython 2 reached its <a href=\"https:\/\/github.com\/python\/devguide\/pull\/344\">end of life (EOL)<\/a> on the 1st of January 2020. This marked the end of bugfix support, and even of security patches, for Python 2 from the Python maintainers. 
The code for the final Python 2 release, 2.7.18 (released in April 2020), was also frozen in January 2020.\n\n\n\n\n\nAs a widely used cross-browser test suite for the web-platform stack, the <a href=\"https:\/\/github.com\/web-platform-tests\/wpt\">web-platform-tests (WPT)<\/a> project uses Python in many places, from infrastructure to test scripts. For both maintenance and active development, it was imperative for WPT to make its code Python 3 compatible sooner rather than later.\n\n\n\n\n\n<h3 class=\"wp-block-heading\">Challenges<\/h3>\n\n\n\n\n\nBoth the dynamic nature of the Python language and the complexity of WPT presented significant challenges to the upgrade.\n\n\n\n\n\n<h4 class=\"wp-block-heading\">Language challenges<\/h4>\n\n\n\n\n\nPython is a dynamically typed language, and there is no formal specification of its semantics. As its de facto reference implementation, CPython maintains high coding standards but is not written with legibility as its primary focus. This means that code paths in Python can contain errors that are hard to detect before they run, even with static analyzers. Python 3 is a new version of Python, but it\u2019s not backwards compatible with code written for Python 2. The changes between Python 2 and Python 3 are not just syntactic; many of them are semantic. In particular, string literals are fundamentally different types in Python 2 and Python 3.\n\n\n\n\n\nAlong with the change in the nature of the language, library support has also shifted. Many older libraries created for Python 2 are not forward-compatible, and many newer libraries can only be used with Python 3. We can run tools such as <a href=\"https:\/\/pypi.org\/project\/caniusepython3\/\">caniusepython3<\/a> to take in a set of dependencies and then figure out which of them are holding us up from porting to Python 3. 
The tricky part, though, is to find replacement libraries that work and to port the code to them.\n\n\n\n\n\n<h4 class=\"wp-block-heading\">Project challenges<\/h4>\n\n\n\n\n\nWPT is a massive suite of tests (over one million in total) and serves many auxiliary functions. It uses Python in many places, including but not limited to:\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>The majority of the infrastructure code. This is the code underlying the major wpt commands, such as &#8216;<em>wpt runner<\/em>&#8217;, etc.<\/li><li>WPT <a href=\"https:\/\/web-platform-tests.org\/writing-tests\/python-handlers\/index.html\">file handlers<\/a>, which test authors can define to run custom code in response to a particular request made to the WPT server.<\/li><li><a href=\"https:\/\/web-platform-tests.org\/writing-tests\/wdspec.html\">WebDriver tests<\/a>, which are structured as pytest tests.<\/li><li>Linting<\/li><li>Interacting with Docker and CI systems<\/li><li>Rebasing expectations, \u2026<\/li><\/ul>\n\n\n\n\n\nThe complexity of the code base required us to take a step back, get a good overview of how the components involved relate to each other, and make a good plan covering porting principles, pathways and methodologies.\n\n\n\n\n\n<h3 class=\"wp-block-heading\">The Porting Plan<\/h3>\n\n\n\n\n\nThe WPT community was well aware of the challenges of moving to Python 3 for the project. 
It set principles, suggested possible approaches and planned timelines before and while the major practical work took place.\n\n\n\n\n\n<h3 class=\"wp-block-heading\">Principles<\/h3>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>The migration work should happen in the background, since the project is quite active.<\/li><li>The pathway to Python 3 was to make the code compatible with both Python 2 and Python 3, and then gradually switch the runtime over to Python 3.<\/li><li>The porting should not reduce test coverage without explicit agreement from test authors.<\/li><\/ul>\n\n\n\n\n\n<h3 class=\"wp-block-heading\">Approaches<\/h3>\n\n\n\n\n\nTo make the porting tractable, it was decided to start with two very specific goals, each approaching the problem from a different angle. One was to get the actual runner utility up and running in Python 3, starting with getting a basic &#8216;<em>wpt run<\/em>&#8217; command to execute under Python 3. The other was to target wider coverage by running all relevant unit tests under Python 3.\n\n\n\n\n\n<h3 class=\"wp-block-heading\">Timelines<\/h3>\n\n\n\n\n\nFor a project of non-trivial size like WPT, a flag-day transition from Python 2 to Python 3 was simply not viable at the early stage of the project. Before 2020, there had already been a few in-depth discussions and some work within the community on the migration. The major work, though, happened in 2020. As the porting progressed, the timelines became clearer. 
A concrete <a href=\"https:\/\/github.com\/web-platform-tests\/rfcs\/issues\/62\">timeline for dropping Python 2 support in WPT<\/a> was set in September 2020:\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>&#8220;<a href=\"https:\/\/github.com\/web-platform-tests\/rfcs\/pull\/65\">Py3-first<\/a>&#8221; targeting 2021-01-01: switch test runs to Python 3 on CI, but keep running unit tests and infrastructure tests in Python 2 and 3.<\/li><li>&#8220;Py3-only&#8221; on 2021-02-01: drop all Python 2 tests from CI, and start accepting Python 3-only changes.<\/li><\/ul>\n\n\n\n\n\nWPT successfully moved to the \u201cPy3-first\u201d stage before the targeted date. The minimum Python 3 version supported for this move was 3.6, with the main focus on 3.8+.\n\n\n\n\n\n<h3 class=\"wp-block-heading\">Implementations<\/h3>\n\n\n\n\n\n<h4 class=\"wp-block-heading\">Porting test runner utility<\/h4>\n\n\n\n\n\nAs we mentioned earlier, one of the starting points was to get the actual runner utility, the <em>&#8216;wpt run&#8217;<\/em> command, to execute under Python 3. This porting was pretty straightforward. We came across some typical Python 2 to Python 3 migration issues, such as:\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>Absolute imports. Absolute imports have become the default in Python 3, and relative imports must be explicit. For example, \u201c<code><em>from conftest import product, flatten<\/em><\/code>\u201d in Python 2 needs to be written as \u201c<code><em>from .conftest import product, flatten<\/em><\/code>\u201d in Python 3.<\/li><\/ul>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>Built-in types comparison. In Python 2, most objects of built-in types compare unequal unless they are the same object, and the choice of whether one object is smaller or larger than another is made arbitrarily but consistently within one execution of a program. In Python 3, ordering comparisons such as <code>&lt;<\/code> and <code>&gt;<\/code> between mismatched types raise a <code>TypeError<\/code> instead. In Python 2, in the case of &#8216;mismatched&#8217; types, the types are ordered lexicographically by type name, e.g. 
a &#8220;list&#8221; comes after an &#8220;int&#8221; in alphabetical ordering, so it is greater. For example, in Python 2, we have<\/li><\/ul>\n\n\n\n\n\n<p style=\"text-align:left\"><em><code>latest_release = 0<\/code><\/em><code><br><\/code><em><code>version = [int(item) for item in m.groups()]<\/code><\/em><code><br><\/code><em><code>if version &gt; latest_release:<\/code><\/em><\/p>\n\n\n\n\n\nThis is not valid in Python 3, where ordering a list against an int raises a <code>TypeError<\/code>. Rather, we need to declare <code><em>latest_release<\/em><\/code> as\n\n\n\n\n\n<em><code>latest_release = (0,0,0)<\/code><\/em>\n\n\n\n\n\nNote that <code><em>version<\/em><\/code> then needs to be a tuple as well, since Python 3 does not order a list against a tuple either.\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>API changes. There are some API changes between the two versions, for example the optional parameter <em><code>strict<\/code><\/em> of <em><code>HTTPConnection()<\/code><\/em>. In Python 2 we have <code><em>httplib.HTTPConnection(self.host, self.port, strict=True, **conn_kwargs)<\/em><\/code>. In Python 3, where the parameter no longer exists, this has become <em><code>HTTPConnection(self.host, self.port, **conn_kwargs)<\/code><\/em>.<\/li><\/ul>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>Order of <code><em>dict<\/em><\/code>. In Python 2, <code><em>dict<\/em><\/code> is organized via a hash table and puts the keys into buckets according to their <code><em>hash()<\/em><\/code> value, so iteration order is arbitrary. In Python 3.6+, <code><em>dict<\/em><\/code> retains insertion order. One solution to make code work for both versions is to use the alternative type <em><code>OrderedDict<\/code><\/em> instead of the plain <em><code>dict<\/code><\/em>.<\/li><li>Iteration. Python 3 changes the return values of several basic functions from <em>list<\/em> to <em>iterator<\/em>. The main reason for this change is that iterators usually use less memory than lists. This change has little impact on common use cases. Furthermore, the <em><code>iter*<\/code><\/em> counterparts (which return iterators in Python 2) have been removed. 
To make code work for both versions, we can call the <a href=\"https:\/\/pypi.org\/project\/six\/\"><em>six<\/em><\/a> library APIs, replacing these calls with <a href=\"https:\/\/docs.google.com\/document\/d\/1_wO8OJ7E1geTiEDyO2ttsuSRCmHDm3bazwDmT9jw04U\/edit#heading=h.1y7y55netxan\"><em><code>six<\/code><\/em><\/a><em><code>.iter*<\/code><\/em> to avoid a memory regression in Python 2. For example, <em><code>six.iteritems(dictionary)<\/code><\/em> corresponds to <em><code>dictionary.iteritems()<\/code><\/em> in Python 2 and<em> <code>dictionary.items()<\/code><\/em> in Python 3. <a href=\"https:\/\/pypi.org\/project\/six\/\"><em>six<\/em><\/a> is a Python 2 and 3 compatibility library. It provides utility functions for smoothing over the differences between the Python versions, with the goal of writing Python code that is compatible with both. We called the <a href=\"https:\/\/pypi.org\/project\/six\/\"><em>six<\/em><\/a> library APIs in a few places during the dual Python 2\/3 compatible stage. These API calls were removed after WPT became Python 3 only.<\/li><li><em><code>bytes<\/code> <\/em>vs.<em> <code>str<\/code>. <\/em>In Python 2, <code>bytes<\/code> is basically an alias of <code>str<\/code>. In Python 3, binary data is a different type from a string. We had to convert some binary data to string type in order to be compatible with both Python 2 and Python 3. This issue, at the utility script level, presented different challenges from those at the core level, which we discuss in the next section. Most cases in the utility scripts can be resolved by adding a prefix to quoted string literals. Quoted string literals can be prefixed with <code><em>\u201cb\u201d<\/em><\/code> or <code><em>\u201cu\u201d<\/em><\/code> to get bytes or Unicode, respectively. In other words, prefix a native string with <code><em>\u201cu\u201d<\/em><\/code> in Python 2 to get a Unicode object, and prefix it with <code><em>\u201cb\u201d<\/em><\/code> in Python 3 to get bytes. 
Note also that in Python 3 the <code><em>\u201cu\u201d<\/em><\/code> prefix does nothing. Likewise, the <code><em>\u201cb\u201d<\/em><\/code> prefix does nothing in Python 2. In the context of this blog post, we are in most cases talking about prefixing a native string with <code><em>\u201cb\u201d<\/em><\/code> to get bytes in Python 3.<\/li><\/ul>\n\n\n\n\n\nThere were also a few other issues, such as integer division, the use of exceptions and calls to print, but they were generally very minor and easy to resolve.\n\n\n\n\n\n<h4 class=\"wp-block-heading\">Handling string types in core<\/h4>\n\n\n\n\n\nOne of the biggest hurdles in our porting effort was how to overcome the string literal type mismatch between Python 2 and 3 in core, specifically in the infrastructure and file handlers. As we discussed earlier, in Python 2 a string literal is a sequence of bytes. In Python 3, a string literal is a sequence of Unicode code points. The rationale behind the change was to move to a Unicode-by-default world.\n\n\n\n\n\n<a href=\"https:\/\/web-platform-tests.org\/tools\/wptserve\/docs\/index.html\">Web Platform Test Server (wptserve)<\/a> often intends to use byte sequences. To overcome this mismatch hurdle, we needed to either always use <code><em>byte<\/em><\/code> sequences or always use <code><em>str<\/em><\/code>. <a href=\"https:\/\/github.com\/web-platform-tests\/rfcs\/blob\/master\/rfcs\/wptserve_py3.md\">[RFC49]<\/a> illustrates the pros and cons of both approaches. It was decided within the community to go down the byte sequence path in order to keep a consistent and semantically correct encoding model. That is, to <strong>always use byte sequences: <code><em>str<\/em><\/code> in Python 2 and <code><em>bytes<\/em><\/code> in Python 3<\/strong>. This incurred some noticeable changes in WPT core. 
In <code><em>wptserve<\/em><\/code>:\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>It introduced a pair of <code><em>ISO-8859-1<\/em><\/code> encode and decode helper functions. Both of them can accept either binary or text strings, but always return binary and text strings, respectively, regardless of the Python version.<\/li><li>Most public APIs for custom handlers can only accept and return binary strings, with the notable exception of the response body.<\/li><\/ul>\n\n\n\n\n\nFor <a href=\"https:\/\/web-platform-tests.org\/writing-tests\/python-handlers\/index.html?highlight=file%20handler\">Python file handlers<\/a>, string types were specified for the headers of both requests and responses, request URL\/form parameters, response bodies, etc.\n\n\n\n\n\nAfter the necessary changes in the core part were done, Robert Ma (@robertma) and Stephen Mcgruer (<em>@smcgruer<\/em>) from Google created <a href=\"https:\/\/docs.google.com\/document\/d\/1y22a4s6xmHNug5dmFxxHmyq7WbFht09c9QFgCPTiR_Q\/edit\">the porting guidelines<\/a>. Based on the guidelines, we re-examined pretty much every line of the test scripts in the existing handlers to add prefixes to string literals where necessary.\n\n\n\n\n\nHere we\u2019d like to walk through some examples of porting handler-related tests following the guidelines, and hopefully share some tips.\n\n\n\n\n\n<h3 class=\"wp-block-heading\">Writing Python 3 compatible tests<\/h3>\n\n\n\n\n\nAccording to the guidelines, the rule of thumb for porting is to make sure all strings are either always text or always bytes; all string literals in handlers should be prefixed with <code><em>\"b\"<\/em><\/code> or <code><em>\"u\"<\/em><\/code>.\n\n\n\n\n\n<h4 class=\"wp-block-heading\">Headers of request and response<\/h4>\n\n\n\n\n\nHeader data should always be binary strings for both keys and values. 
Prefer adding <code><em>\"b\"<\/em><\/code> prefixes to encoding\/decoding.\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>The <em>Request.headers<\/em> dictionary-like interface (accessed via <em>[&#8230;], get, items<\/em>).<\/li><\/ul>\n\n\n\n\n\n<code><em>headers = [(b\"Content-Type\", b\"text\/html\")]<br>if b\"allow_csp_from\" in request.GET:<br>&nbsp;&nbsp;&nbsp;&nbsp;headers.append((b\"Allow-CSP-From\", request.GET[b\"allow_csp_from\"]))<\/em><\/code>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>The <em>Request.headers.get_list<\/em> method example:<\/li><\/ul>\n\n\n\n\n\n<em><code>assert isinstance(headers.get_list(b'x-bar')[0], bytes)<\/code><\/em>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li><em>Response.headers.{get,set,append,update,items}<\/em> examples:<\/li><\/ul>\n\n\n\n\n\n<code><em>response.headers.set(b'Access-Control-Allow-Origin', request.headers.get(b\"origin\"))<br>response.headers.append(b\"Access-Control-Allow-Origin\", b\"*\")<\/em><\/code>\n\n\n\n\n\n<h4 class=\"wp-block-heading\">HTTP Basic Authentication<\/h4>\n\n\n\n\n\n<em>Request.auth.{username,password}<\/em> are binary strings. For example,\n\n\n\n\n\n<em><code>response.headers.set(b'Access-Control-Allow-Origin', request.headers.get(b\"origin\"))<br>response.headers.append(b\"Access-Control-Allow-Origin\", b\"*\")<br>response.headers.set(b'Content-type', b'text\/plain')<br>content = b\"\"<\/code><\/em>\n\n\n\n\n\n<h4 class=\"wp-block-heading\">Cookies<\/h4>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li><em>Request.cookies<\/em> (similar to <em>Request.headers<\/em>; it\u2019s a MultiDict with all APIs of dict plus first, last, get_list). 
For example,<\/li><\/ul>\n\n\n\n\n\n<code><em>response.content = request.cookies[b\"foo\"].value<\/em><\/code>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li><em>Response.{set,unset,delete}_cookie<\/em>.<\/li><\/ul>\n\n\n\n\n\n<code><em>response.set_cookie(b\"name\", b\"value\")<br>response.unset_cookie(b\"name\")<\/em><\/code>\n\n\n\n\n\n<h4 class=\"wp-block-heading\">Request URL\/form parameters<\/h4>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>Both the keys and values of URL\/form parameters for the request (accessible via <em>request.GET<\/em> or <em>request.POST<\/em>) are binary strings. Prefer adding <em>&#8220;b&#8221;<\/em> prefixes to encoding\/decoding.<\/li><\/ul>\n\n\n\n\n\n<code><em>b\"realm\" in request.POST<br>request.GET.first(b\"type\", None) == b\"value\"<\/em><\/code>\n\n\n\n\n\n<h4 class=\"wp-block-heading\">Response Status Message<\/h4>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li>The response status message is a binary string, as follows.<\/li><\/ul>\n\n\n\n\n\n<code><em>response.status = 401<br>response.headers.set(b'Status', b'401 Authorization required')<br>response.headers.set(b'WWW-Authenticate', b'Basic realm=\"test\"')<\/em><\/code>\n\n\n\n\n\n<h4 class=\"wp-block-heading\">Response body<\/h4>\n\n\n\n\n\nThe data put into the response body can be either text or binary strings, but the two types should never be mixed, and string literals must be prefixed.\n\n\n\n\n\n<em><code>response.writer.write(b\"This is a body!\")<br>return u\"Hello, \u4e16\u754c!\"<\/code><\/em>\n\n\n\n\n\n<h2 class=\"wp-block-heading\">Status<\/h2>\n\n\n\n\n\nWPT successfully moved to the \u201cPy3-first\u201d stage in December 2020. In February 2021, it dropped all Python 2 tests from CI and started accepting Python 3-only changes.\n\n","protected":false},"excerpt":{"rendered":"<p>In 2020, Igalia was involved in the Python 3 migration work for the web-platform-tests (WPT) project with sponsorships from Google. 
After a year-long effort, in December 2020 the flag for python 3 was switched on in WPT. Now over a year on, I only just manage to write about this migration work.&nbsp; Better late than &hellip; <a href=\"https:\/\/blogs.igalia.com\/zsun\/2022\/01\/21\/wpt-python-3-migration\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;WPT Python 3 Migration&#8221;<\/span><\/a><\/p>\n","protected":false},"author":64,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15],"tags":[4,3,2],"class_list":["post-4","post","type-post","status-publish","format-standard","hentry","category-wpt","tag-migration","tag-python","tag-wpt"],"_links":{"self":[{"href":"https:\/\/blogs.igalia.com\/zsun\/wp-json\/wp\/v2\/posts\/4","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.igalia.com\/zsun\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.igalia.com\/zsun\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.igalia.com\/zsun\/wp-json\/wp\/v2\/users\/64"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.igalia.com\/zsun\/wp-json\/wp\/v2\/comments?post=4"}],"version-history":[{"count":135,"href":"https:\/\/blogs.igalia.com\/zsun\/wp-json\/wp\/v2\/posts\/4\/revisions"}],"predecessor-version":[{"id":154,"href":"https:\/\/blogs.igalia.com\/zsun\/wp-json\/wp\/v2\/posts\/4\/revisions\/154"}],"wp:attachment":[{"href":"https:\/\/blogs.igalia.com\/zsun\/wp-json\/wp\/v2\/media?parent=4"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.igalia.com\/zsun\/wp-json\/wp\/v2\/categories?post=4"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.igalia.com\/zsun\/wp-json\/wp\/v2\/tags?post=4"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}