Maintaining Chromium downstream: keeping it small
Maintaining a downstream of Chromium is hard, because of the speed at which upstream moves and how hard it is to keep our downstream up to date.
A critical aspect is the size of what we build on top of Chromium: in other words, the size of our downstream. In this blog post I will review how to measure it, and the impact it has on the costs of maintaining a downstream.
Maintaining Chromium downstream series #
Last year, I started a series of blog posts about the challenges, the organization and the implementation details of maintaining a project that is a downstream of Chromium. This is the third blog post in the series.
The previous posts were:
- Why downstream?: why is it necessary to create downstream forks of Chromium? And why use Chromium in particular?
- Update strategies: when to update? Is it better to merge or rebase? How can automation help?
Measuring the size of a downstream #
But, first… What do I mean by the size of the downstream? I am interested in a definition that can be used as a metric, something we can measure and track: a number that lets us know whether the downstream is growing or shrinking, and measure whether a change has an impact on it.
The rough idea is: the bigger the downstream is, the more complex it is to maintain it. I will provide a few metrics that can be used for this purpose.
Delta #
The most obvious metric is the delta, the difference between upstream Chromium and the downstream. For this, and assuming the downstream uses Git, the definition I use is essentially the result of this command:
git diff --shortstat BASELINE..DOWNSTREAM
BASELINE is a commit reference that represents the pure upstream repository status our downstream is based on (our baseline). DOWNSTREAM is the commit reference we want to compare the baseline to.
As a recommendation, it is useful to maintain tags or branches in our downstream repository that strictly represent the baseline. This way we can use diff tools to represent our delta more easily.
This command is going to return 3 values:
- The number of files that have changed.
- The number of lines that were added.
- The number of lines that were removed.
We will be mostly interested in tracking the number of lines added and removed.
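To track these values, it helps to turn the summary line into plain numbers. The following is only a minimal sketch: parse_shortstat is a hypothetical helper name of mine, and it assumes the usual English summary format printed by Git (for example "3 files changed, 10 insertions(+), 2 deletions(-)"), where the insertions or deletions part may be missing.

```shell
# Hypothetical helper (not part of Git): turn one summary line from
# `git diff --shortstat` into "files insertions deletions".
# Fields missing from the summary are reported as 0.
parse_shortstat() {
    printf '%s\n' "$1" | awk '
    {
        files = 0; ins = 0; del = 0
        # Each number comes right before the word that describes it.
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^file/)      files = $(i - 1)
            if ($i ~ /^insertion/) ins   = $(i - 1)
            if ($i ~ /^deletion/)  del   = $(i - 1)
        }
        print files, ins, del
    }'
}
```

It could be used as parse_shortstat "$(git diff --shortstat BASELINE..DOWNSTREAM)", which prints three numbers separated by spaces that are easy to store or compare.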
This definition is interesting as it gives an idea of the number of lines of code that we need to maintain. It may not reflect the full amount to maintain, as some files are maintained out of the Chromium repository. Aggregating these with other repositories changed or added to the build could be useful.
One interesting thing with this approach is also that we can measure the delta of specific paths in the repository. For example, if we want to measure the delta of the content/ path, it is just as easy as doing:
git diff --shortstat BASELINE..DOWNSTREAM content/
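Instead of running the command once per path, the per-file output of git diff --numstat (added lines, removed lines and path, separated by tabs) can be grouped by top-level directory. This is a rough sketch under my own assumptions: delta_by_top_dir is a hypothetical helper name, binary files (reported by Git as "-") count as 0 lines, and files in the repository root are listed under their own name.

```shell
# Hypothetical helper: reads `git diff --numstat` output on stdin and
# prints "directory added removed" for each top-level directory.
delta_by_top_dir() {
    awk -F '\t' '
    {
        # Git reports "-" instead of line counts for binary files.
        added   = ($1 == "-") ? 0 : $1
        removed = ($2 == "-") ? 0 : $2
        # The first path component is the top-level directory.
        split($3, parts, "/")
        top = parts[1]
        a[top] += added
        r[top] += removed
    }
    END { for (d in a) print d, a[d], r[d] }' | sort
}
```

A possible invocation would be git diff --numstat BASELINE..DOWNSTREAM | delta_by_top_dir, which gives a quick view of where the delta concentrates.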
Modifying delta #
The regular delta definition we considered has a problem. All the line changes have the same weight. But, when we update our baseline, a big part of the complexity comes from the conflicts found when rebasing or merging.
So, I am introducing a new definition. Modifying delta: the changes between the baseline and the downstream that affect upstream lines. In this case, we ignore completely any file added only by the downstream, as that is not going to create conflicts.
In Git, we can use filters for that purpose:
git diff --diff-filter=CDMR --shortstat BASELINE..DOWNSTREAM
This will only account for these changes:
- M: changes affecting existing files.
- R: files that were renamed.
- C: files that were copied.
- D: files that were deleted.
So, these numbers are going to more accurately represent which parts of our delta can conflict with the changes coming from upstream when we rebase or merge.
Tracking the modifying delta, and reorganizing the project to reduce it, is usually a good strategy for reducing maintenance costs.
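To see how the files of a delta split between the two definitions, the status letters printed by git diff --name-status can be counted. This is just an illustrative sketch: count_by_status is a hypothetical helper name, and it assumes the tab-separated output of --name-status, where renames and copies carry a similarity score after the letter (such as R100).

```shell
# Hypothetical helper: reads `git diff --name-status` output on stdin and
# counts files that touch upstream content (M, R, C, D) versus files that
# only exist in the downstream (A).
count_by_status() {
    awk -F '\t' '
    {
        # Keep only the status letter: "R100" becomes "R".
        s = substr($1, 1, 1)
        n[s]++
    }
    END {
        modifying = n["M"] + n["R"] + n["C"] + n["D"]
        print "modifying=" modifying " added_only=" (n["A"] + 0)
    }'
}
```

For instance, git diff --name-status BASELINE..DOWNSTREAM | count_by_status prints both counts on one line. It counts files, not lines, so it complements the --shortstat numbers rather than replacing them.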
diffstat #
An issue we have with the Git diff stats is that they represent modified lines as a block of lines removed and another of lines added.
Fortunately, we can use another tool. Diffstat will make a best effort to identify which lines are actually modified. It can be easily installed in your distribution of choice (e.g. the package diffstat in Debian/Ubuntu/Red Hat).
This behavior is enabled with the parameter -m:
git diff ...parameters... | diffstat -m
This is the kind of output that is generated. On top of the typical + and - we see the ! for the lines that have been detected to be modified.
$ git show | diffstat -m
paint/timing/container_timing.cc | 5 ++++!
paint/timing/container_timing.h | 1 +
timing/performance_container_timing.cc | 20 ++++++++++++++++++!!
timing/performance_container_timing.h | 5 +++++
timing/performance_container_timing.idl | 1 +
timing/window_performance.cc | 4 ++!!
timing/window_performance.h | 1 +
7 files changed, 32 insertions(+), 5 modifications(!)
Coloring is also available, with the parameter -C.
Using diffstat gives a more accurate insight into both the total delta and the modifying delta.
Tracking deltas #
Now that we have the tools to produce numbers, we can track them over time to know whether our downstream is growing or shrinking.
That can also be used to measure the impact of different strategies or changes on the downstream maintenance complexity.
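As an illustration of tracking, suppose the numbers are stored in a small CSV history with one row per measurement. Everything here is an assumption of mine: the row format (date,insertions,deletions, without a header), the delta_trend helper name, and the choice of using insertions plus deletions as the size.

```shell
# Hypothetical helper: reads rows "date,insertions,deletions" on stdin
# (oldest first, no header) and prints, for every row after the first,
# how much the total delta changed compared to the previous row.
delta_trend() {
    awk -F ',' '
    NR > 1 { printf "%s %+d\n", $1, ($2 + $3) - prev }
    { prev = $2 + $3 }'
}
```

A positive number means the downstream grew since the previous measurement, and a negative one means it shrank.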
Other metric ideas #
But deltas are not the only tool to measure the complexity, especially regarding the effort of maintaining a downstream.
I can enumerate just a few ideas that provide insight into different problems:
- Frequency of rebase/merge conflicts per path.
- Frequency of undetected build issues.
- Frequency and complexity of the regressions, weighted by the size of the patches fixing them.
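The first idea can be sketched with very little tooling. During a merge or rebase that stops on conflicts, git diff --name-only --diff-filter=U lists the unmerged paths. Assuming those paths are appended to a log on every update (my assumption, as is the conflicts_per_dir helper name), they can be grouped by top-level directory:

```shell
# Hypothetical helper: reads conflicted file paths on stdin (one per line)
# and prints "count directory", most conflicted top-level directory first.
conflicts_per_dir() {
    awk -F '/' '{ n[$1]++ } END { for (d in n) print n[d], d }' |
        sort -k1,1nr -k2
}
```

Over several updates, this points at the paths where reducing the modifying delta would pay off the most.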
Relevant changes for tracking a downstream #
Let’s focus now on other factors, not always easy to measure, that matter when we maintain a downstream project.
What we build on top of Chromium #
The complexity of a downstream, especially the one measured by regular delta, is impacted heavily by what is built on top of Chromium.
A full web browser is usually bigger, because it includes the required user experience, and many components that make up what we nowadays consider a browser: history, bookmarks, user profiles, secrets management…
An application runtime for hybrid applications may just have minimal wrappers for integrating a web view, but then maybe a complex set of components for easing the integration with a native toolkit or a specific programming language.
How much do you build on top of Chrome?
- Browsers are usually bigger pieces than runtimes.
- Hybrid application runtimes may have a big part related to toolkit or other components.
What we depend on #
For maintenance complexity, the set of boundaries and dependencies is as important as what we build on top:
- How many upstream components are we using?
- What kind of APIs are provided?
- Are they stable or changing often?
These questions are especially relevant, as Chromium does not really provide any guarantee about the stability, or even availability, of existing components.
That said, some layers provided by Chromium change less often than others. Some examples:
- The Content API provides the basics of the web platform and Chromium process model, so it is quite useful for hybrid application runtimes. It has been changing in recent years, in part because of the [Onion Soup refactorings](https://mariospr.org/category/blink-onion-soup/). Though, there are always examples of how to adapt to those changes in //content/shell and //chrome.
- Chromium provides a number of reusable components at //components that may be useful for different downstreams.
- Then, for building a full web browser, it may be tempting to directly use //chrome, and modify it for the specific downstream user experience. This means a higher modifying delta. But, as the upstream Chrome browser UI also often changes heavily, the frequency of conflicts also increases.
Wrapping up #
In this post I reviewed different ways to measure the downstream size, and how what we build impacts the complexity of maintenance.
Understanding and tracking our downstream allows us to implement strategies to keep things under control. It also allows us to better understand the cost of a specific feature or an implementation approach.
In the next post in this series, I will write about how the upstream Chromium community helps the downstreams.