{"id":7,"date":"2019-08-26T07:59:48","date_gmt":"2019-08-26T07:59:48","guid":{"rendered":"http:\/\/blogs.igalia.com\/mshin\/?p=7"},"modified":"2019-08-26T07:59:48","modified_gmt":"2019-08-26T07:59:48","slug":"why-should-we-use-a-precise-width-integer-type","status":"publish","type":"post","link":"https:\/\/blogs.igalia.com\/mshin\/2019\/08\/26\/why-should-we-use-a-precise-width-integer-type\/","title":{"rendered":"Why should we use a precise-width integer type?"},"content":{"rendered":"<p><strong>Introduction<\/strong><br \/>\nA few months ago, I&#8217;ve finished the task of replacing imprecise-width integer types like <em>(unsigned) short, int, long, long long<\/em> by precise-width integer types like <em>(u)int16\/int32\/int64_t<\/em> in Blink.. I&#8217;ve been working on <a href=\"https:\/\/docs.google.com\/document\/d\/13XsbaBz7A2H0PZIdFcytHf5-fVOlAfkLlIUKhxKzs44\/\">the Onion Soup project<\/a> and it was my first task of this project. When I took this task, I thought it would be comparatively simple and quite mechanical work, but this was not the case. It was sometimes required to understand whole code flow in-depth even out of blink or to explain the benefit of such changes to reviewers.<\/p>\n<p>I&#8217;d like to introduce why we should prefer using the precise-width integer type and what I did for this task.<\/p>\n<p><strong>Why should we use the precise-width integer type?<\/strong><br \/>\nWe should keep in mind the fact that the c++ standards don\u2019t specify the specific size of each integer type.<\/p>\n<p>Data Model Table<\/p>\n<table>\n<thead>\n<tr>\n<th>Data model<\/th>\n<th>sizeof(int)<\/th>\n<th>sizeof(long)<\/th>\n<th>sizeof(long long)<\/th>\n<th>example<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>LP32<\/td>\n<td>16b<\/td>\n<td>32b<\/td>\n<td>64b<\/td>\n<td>Win16<\/td>\n<\/tr>\n<tr>\n<td>ILP32<\/td>\n<td>32b<\/td>\n<td>32b<\/td>\n<td>64b<\/td>\n<td>Win32, i386 OSX &amp; Linux<\/td>\n<\/tr>\n<tr>\n<td>LP64<\/td>\n<td>32b<\/td>\n<td>64b<\/td>\n<td>64b<\/td>\n<td>x86-64 OSX &amp; Linux<\/td>\n<\/tr>\n<tr>\n<td>LLP64<\/td>\n<td>32b<\/td>\n<td>32b<\/td>\n<td>64b<\/td>\n<td>Win64<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Chromium supports various platforms with one repository and shows definitely the variable of different sizes with <em>(unsigned) long<\/em> between Android-Kitkat\/MacOS\/Win7 buildbots and other buildbots for try-bots. It means that it would have potential security issues like the stack overflow if we treat that <em>long<\/em> has 64 bits precise since the size of <em>long<\/em> is 64 bits like <em>long long<\/em> on almost all of build systems. And we should avoid mixing the use of the precise-width integer type and the imprecise-width integer type to store the same value.<\/p>\n<p>Actually, after working on this I got a report mentioning that my changes to move from imprecise-width to precise-width types fixed a security issue. This made me realize that more existing issues could still be fixed this way, and I started fixing other ones reported via ClusterFuzz after that.<\/p>\n<p>Google also recommends this in its own coding styleguide, see <a href=\"https:\/\/google.github.io\/styleguide\/cppguide.html#Integer_Types\">here<\/a>.<\/p>\n<p><strong>Summary of my contributions in Blink<\/strong><br \/>\n1) Decided the order of the changing type.<\/p>\n<pre>\nunsigned short -&gt; uint16_t \/ short -&gt; int16_t\nunsigned long long -&gt; uint64_t \/ long long -&gt; int64_t\nunsigned long -&gt; uint32_t or uint64_t, long -&gt; int32_t or int64_t<\/pre>\n<p>2) Wrote the patch including only related codes.<\/p>\n<p>3)  Found what is the proper type.<br \/>\nWhich type is proper between <em>uint32_t<\/em> and <em>uint64_t<\/em> when I should change <em>unsigned long<\/em>?<br \/>\nEvery time I asked myself the question since I had to consider the fact that <em>unsigned long<\/em> was 32 bit on Mac, Android-kitkat and Window7 buildbot and was 64 bit on others so far. So I needed to understand its code flow and the use cases not only in Blink but also out of Blink. In general, it would be best to avoid integer-width conversions where possible and to only convert to larger types when needed.<\/p>\n<p>4) Utilized the template instead of the functions for each type<br \/>\nI reduced the code quantity with the template and avoided the build error by duplicated definition.<\/p>\n<p>5) Used <em>base::Optional<\/em> when the initial value is ambiguous.<br \/>\nIn general the variable&#8217;s initial value would be <em>0<\/em> or <em>-1<\/em>, but <em>base::Optional<\/em> is more readable and simple than comparing <em>0<\/em> or <em>-1<\/em> to check the initial value. Actually, I needed to reach a consensus with the review since it could change the code logic.<\/p>\n<pre><code>bool HasValidSnapshotMetadata() const { return snapshot_size_ &gt;= 0; }\nlong long snapshot_size_;\n<\/code><\/pre>\n<p>Instead,<\/p>\n<pre><code>bool HasValidSnapshotMetadata() const { return snapshot_size_.has_value(); }\nbase::Optional&lt;uint64_t&gt; snapshot_size_;\n<\/code><\/pre>\n<p>6) Used <em>CheckedNumeric<\/em> to check the overflow of the variable.<\/p>\n<pre><code>static bool SizeCalculationMayOverflow(unsigned width,\n                                         unsigned height,\n                                         unsigned decoded_bytes_per_pixel) {\n    unsigned long long total_size = static_cast&lt;unsigned long long&gt;(width) *\n                                    static_cast&lt;unsigned long long&gt;(height);\n    if (decoded_bytes_per_pixel == 4)\n      return total_size &gt; ((1 &lt;&lt; 29) - 1);\n    return total_size &gt; ((1 &lt;&lt; 28) - 1);\n}\n<\/code><\/pre>\n<p>The above code was the old one before I changed and it was for checking if the variable has the overflow, I took some time to fully understand it.<br \/>\nI&#8217;ve used <em>CheckedNumeric<\/em> since the reviewer let me know this utility.  As you can see from the code below,  it has been changed to be simple and readable.<\/p>\n<pre><code>  inline bool SizeCalculationMayOverflow(unsigned width,\n                                         unsigned height,\n                                         unsigned decoded_bytes_per_pixel) {\n    base::CheckedNumeric&lt;int32_t&gt; total_size = width;\n    total_size *= height;\n    total_size *= decoded_bytes_per_pixel;\n    return !total_size.IsValid();\n  }\n<\/code><\/pre>\n<p>7) Oops! MSVC!<br \/>\nThe below code shows the different results due to compilers.<\/p>\n<pre><code>class Foo {\n    uint16_t a;\n    unsigned b : 16;\n};\n<\/code><\/pre>\n<pre><code>std::cout &lt;&lt; sizeof(Foo);\n<\/code><\/pre>\n<p>The results are from <strong>clang<\/strong> : 4 bytes, <strong>gcc<\/strong> : 4 bytes and <strong>msvc<\/strong> : 8 bytes<\/p>\n<p>So, it caused the build break with MSVC toolchain on Windows buildbots as the class size was increased. It took a long time to analyze the cause and solve this issue.<\/p>\n<p>In conclusion, it was very simple to fix. See the code below.<\/p>\n<pre><code>class Foo {\n    unsigned a : 16;\n    unsigned b : 16;\n};\n<\/code><\/pre>\n<p>8) Brought out the potential issue with  with the overflow.<br \/>\nBlink can handle the width and height within 4 bytes range. The old code showed the abnormal layout result with overflowed width\/height but it did not crash. I added the assertion code to check if there is the overflow when changing the type. Finally, we prevent the overflow and solve to layout the object&#8217;s position and size within the maximum value of 4 bytes.<\/p>\n<p>9) Added PRESUBMIT check.<br \/>\nNow that we cleaned up the Blink code from the imprecise-width types, a PRESUBMIT hook was added to prevent them from being reintroduced to the sourcebase. So I added the PRESUBMIT check to ensure new uses do not come in again.<\/p>\n<p>10) Why do we still use <em>int<\/em> and <em>unsigned<\/em>?<br \/>\nThis is the comment of the reviewer and I also agree with it.<\/p>\n<blockquote><p>\n  <a href=\"https:\/\/google.github.io\/styleguide\/cppguide.html#Integer_Types\">Thee C++ style guide<\/a> states &quot;We use int very often, for integers we know are not going to be too big, e.g., loop counters. Use plain old int for such things. You should assume that an int is at least 32 bits, but don&#039;t assume that it has more than 32 bits. If you need a 64-bit integer type, use <em>int64_t<\/em> or <em>uint64_t<\/em>.&quot;<br \/>\n  The style guide does state not to use the imprecise-width integers like long long\/long\/short, but int is the imprecise-width type that is explicitly preferred. Among other reasons, this is likely for readability and the fact that ints are by default the &quot;native&quot;\/fastest type,  and potentially large concerns over code churn. More there is not any benefit to replace int to <em>int32t<\/em> and potentially large concerns over code churn.\n<\/p><\/blockquote>\n<p>I&#8217;m happy to contribute to Chromium &amp; Blink with more stable code through changing to the precise-width integer type and to learn more about this particular domain.<\/p>\n<p><strong>References<\/strong><br \/>\n[1] https:\/\/en.cppreference.com\/w\/cpp\/language\/types<br \/>\n[2] http:\/\/nickdesaulniers.github.io\/blog\/2016\/05\/30\/data-models-and-word-size\/<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction A few months ago, I&#8217;ve finished the task of replacing imprecise-width integer types like (unsigned) short, int, long, long long by precise-width integer types like (u)int16\/int32\/int64_t in Blink.. I&#8217;ve been working on the Onion Soup project and it was my first task of this project. When I took this task, I thought it would &hellip; <a href=\"https:\/\/blogs.igalia.com\/mshin\/2019\/08\/26\/why-should-we-use-a-precise-width-integer-type\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Why should we use a precise-width integer type?<\/span><\/a><\/p>\n","protected":false},"author":58,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,3],"tags":[5,8,4,6,7],"class_list":["post-7","post","type-post","status-publish","format-standard","hentry","category-chromium","category-igalia","tag-blink","tag-c","tag-chromium","tag-igalia","tag-precise-width-integer"],"_links":{"self":[{"href":"https:\/\/blogs.igalia.com\/mshin\/wp-json\/wp\/v2\/posts\/7","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.igalia.com\/mshin\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.igalia.com\/mshin\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.igalia.com\/mshin\/wp-json\/wp\/v2\/users\/58"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.igalia.com\/mshin\/wp-json\/wp\/v2\/comments?post=7"}],"version-history":[{"count":40,"href":"https:\/\/blogs.igalia.com\/mshin\/wp-json\/wp\/v2\/posts\/7\/revisions"}],"predecessor-version":[{"id":56,"href":"https:\/\/blogs.igalia.com\/mshin\/wp-json\/wp\/v2\/posts\/7\/revisions\/56"}],"wp:attachment":[{"href":"https:\/\/blogs.igalia.com\/mshin\/wp-json\/wp\/v2\/media?parent=7"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.igalia.com\/mshin\/wp-json\/wp\/v2\/categories?post=7"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.igalia.com\/mshin\/wp-json\/wp\/v2\/tags?post=7"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}