{"id":748,"date":"2026-03-11T10:02:28","date_gmt":"2026-03-11T09:02:28","guid":{"rendered":"https:\/\/blogs.igalia.com\/xrcalvar\/?p=748"},"modified":"2026-03-11T10:02:28","modified_gmt":"2026-03-11T09:02:28","slug":"reworking-string-management-in-webkits-gstreamer-code","status":"publish","type":"post","link":"https:\/\/blogs.igalia.com\/xrcalvar\/2026\/03\/11\/reworking-string-management-in-webkits-gstreamer-code\/","title":{"rendered":"Reworking string management in WebKit&#8217;s GStreamer code"},"content":{"rendered":"<p>When you work on WebKit&#8217;s GStreamer code, you end up dealing with strings a lot. GStreamer and GLib APIs use C strings everywhere: element names, property values, caps descriptions, SDP fields&#8230; The list is long. The problem was that we were handling all those strings in a somewhat inconsistent way, and the interaction between C strings and WebKit&#8217;s own string types was not as clean as it should be.<\/p>\n<p>This is why I started a meta-effort (<a href=\"https:\/\/bugs.webkit.org\/show_bug.cgi?id=289787\">Bug 289787<\/a>) to rework how we manage strings in the GStreamer code of WebKit. The outcome is mainly two new classes: <code>CStringView<\/code> and <code>GMallocString<\/code>. In this post I want to explain what they are and why they were needed.<\/p>\n<h2>The problem<\/h2>\n<p>WebKit has its own string types, like <code>WTF::String<\/code> and <code>WTF::CString<\/code>, which are designed around WebKit&#8217;s internal needs. They handle reference counting, different encodings, and all kinds of operations. But when you interface with GLib and GStreamer, you receive <code>char*<\/code> pointers from C APIs that are typically UTF-8 encoded and null-terminated. Converting back and forth between these and WebKit strings was often done in ad-hoc ways, sometimes creating unnecessary copies and sometimes not being very explicit about ownership.<\/p>\n<p>Another issue was encoding. 
WebKit internally works with different encodings (Latin1, UTF-16), but all the strings coming from GLib and GStreamer are UTF-8. There was no compile-time enforcement of this distinction, so it was easy to mix things up.<\/p>\n<p>At the beginning, using <code>StringView<\/code> to handle these C strings was kind of enough. It had a <code>rawCharacters<\/code> method that allowed us to just wrap the C string pointer and work with it, even though there was no way to enforce encoding correctness. It was not ideal, but it worked. Then WebKit began to move towards using spans and that is when the real problems popped up: spans carry a pointer and a size, but the size was not accounting for the null terminator. So when you wrapped a null-terminated C string into a span, the null terminator was left out, and if any code down the line relied on it being there, you had memory issues. This was the moment it became clear that we needed a proper type that understood null termination as a first-class concept.<\/p>\n<h2>CStringView: a non-owning view of null-terminated UTF-8 strings<\/h2>\n<p>My first version of <code>CStringView<\/code> was actually a bigger class. I had added many string operations as member functions: <code>startsWith<\/code>, <code>endsWith<\/code>, <code>contains<\/code>, <code>find<\/code>, case-insensitive comparisons&#8230; It felt natural to me as a developer \u2014 you go to the header of a class and you see everything you can do with it. But during the review of <a href=\"https:\/\/github.com\/WebKit\/WebKit\/pull\/51619\">PR #51619<\/a>, Darin Adler made a very good point: <code>CStringView<\/code> should only handle what makes it special, which is the null termination guarantee. All the other string operations should work on spans, because they do not depend on null termination and should not be duplicated for every string type. 
If you have a <code>CStringView<\/code> named <code>view<\/code>, you just write <code>contains(view.span(), ...)<\/code> instead of <code>view.contains(...)<\/code>. This way, the same functions work for any span of characters, and <code>CStringView<\/code> stays small and focused. I was initially skeptical, but after writing the code I have to say he was right.<\/p>\n<p>So the final <code>CStringView<\/code> is a lightweight, non-owning view over a null-terminated UTF-8 string. Think of it like <code>std::string_view<\/code> but with two key differences: it guarantees null termination and it works with <code>char8_t<\/code> instead of <code>char<\/code>.<\/p>\n<pre><code>class CStringView final {\n    \/\/ ...\n    static CStringView unsafeFromUTF8(const char* string);\n    static CStringView fromUTF8(std::span&lt;const char8_t&gt; spanWithNullTerminator);\n\n    const char* utf8() const;\n    size_t lengthInBytes() const;\n    std::span&lt;const char8_t&gt; span() const;\n    std::span&lt;const char8_t&gt; spanIncludingNullTerminator() const;\n    bool isEmpty() const;\n    bool isNull() const;\n};<\/code><\/pre>\n<p>Using <code>char8_t<\/code> is important because it prevents mixing encodings at compile time. If you try to pass a <code>char*<\/code> to something expecting <code>char8_t<\/code>, the compiler will complain. This is exactly what we wanted: making encoding mismatches a compilation error rather than a runtime bug. This requirement came from Geoffrey Garen during earlier reviews to ensure we were not mixing Latin1 or UTF-16 with this class. You might have noticed the <code>unsafeFromUTF8<\/code> factory method: the <code>unsafe<\/code> prefix is a WebKit convention for APIs that deal with raw pointers without a known size. Since a bare <code>const char*<\/code> has no size information, we have to trust the caller and compute the length with <code>strlen<\/code>, which is inherently unsafe. 
The safe counterpart, <code>fromUTF8<\/code>, takes a <code>std::span<\/code> where the size is already known.<\/p>\n<p><code>CStringView<\/code> also has a <code>utf8()<\/code> method that returns a plain <code>const char*<\/code> for interfacing with C APIs that expect it. The class is designed to sit on the stack (heap allocation is forbidden) and it carries no overhead since it is essentially just a span.<\/p>\n<p>The work on making the span-based operations available in <code>StringCommon<\/code> (<a href=\"https:\/\/bugs.webkit.org\/show_bug.cgi?id=299946\">Bug 299946<\/a>) involved improving the templates to support <code>char8_t<\/code> spans, so that operations like <code>startsWith<\/code>, <code>endsWith<\/code>, <code>contains<\/code>, <code>find<\/code> and case-insensitive comparisons work seamlessly. Many of these methods were only templated for <code>Latin1Character<\/code> and needed some tweaking to also accept <code>char8_t<\/code>. On top of that, some of these functions accepted different character types for their different parameters, but we wanted to ensure that <code>char8_t<\/code> was checked at compile time so that it could not be mixed with any other encoding-incompatible type. For example, comparing a UTF-8 code unit with a Latin1 character byte by byte is simply incorrect for non-ASCII characters, so the compiler should prevent you from doing it in the first place. These utility functions can be found in <a href=\"https:\/\/github.com\/WebKit\/WebKit\/blob\/main\/Source\/WTF\/wtf\/text\/StringCommon.h\">StringCommon.h<\/a> and <a href=\"https:\/\/github.com\/WebKit\/WebKit\/blob\/main\/Source\/WTF\/wtf\/StdLibExtras.h\">StdLibExtras.h<\/a>.<\/p>\n<p>After the class was ready, I went through the GStreamer code and increased its use (<a href=\"https:\/\/bugs.webkit.org\/show_bug.cgi?id=299443\">Bug 299443<\/a>). 
This touched a significant number of files across the WebRTC, media player, and media stream code, replacing raw <code>const char*<\/code> juggling with proper <code>CStringView<\/code> usage.<\/p>\n<p>If you ever need to convert a <code>CStringView<\/code> into a <code>String<\/code>, for example to supply it to a WebCore API, the proper way is straightforward: <code>String newString = cStringView.span();<\/code>. The <code>String<\/code> constructor that takes a <code>std::span&lt;const char8_t&gt;<\/code> handles the UTF-8 conversion correctly without needing any intermediate wrappers.<\/p>\n<p>For the opposite direction, converting a <code>String<\/code> into a <code>CStringView<\/code>, the path goes through <code>CString<\/code>: you first call <code>.utf8()<\/code> on the <code>String<\/code> to get a <code>CString<\/code> (which performs the encoding conversion), and then wrap it with <code>CStringView::unsafeFromUTF8(cString.data())<\/code>. It is important to keep the <code>CString<\/code> alive as long as the <code>CStringView<\/code> is in use, since <code>CStringView<\/code> does not own the data. That said, this kind of conversion is not common in practice. If you already have a <code>String<\/code>, the logical step is to keep using it all the way to the end of the WebKit API onion and, when you finally need a <code>const char*<\/code> for a C API, just call <code>myString.utf8().data()<\/code>.<\/p>\n<h2>GMallocString: an owning wrapper for GLib-allocated strings<\/h2>\n<p>The second piece was <code>GMallocString<\/code> (<a href=\"https:\/\/bugs.webkit.org\/show_bug.cgi?id=303909\">Bug 303909<\/a>).<\/p>\n<p>Many GLib and GStreamer APIs return newly allocated strings that you must free with <code>g_free()<\/code>. Before <code>GMallocString<\/code>, we had <code>GUniquePtr&lt;char&gt;<\/code> for this, but it was just a raw pointer wrapper with no string-specific functionality. 
You could not easily compare it, convert it to a WebKit string, or even get its length without calling <code>strlen<\/code> yourself.<\/p>\n<p><code>GMallocString<\/code> solves this by wrapping an owned, g_malloc-allocated, null-terminated UTF-8 string. It can adopt strings in several ways:<\/p>\n<pre><code>\/\/ From a raw char* (takes ownership):\nauto str = GMallocString::unsafeAdoptFromUTF8(g_strdup(\"hello\"));\n\n\/\/ From a GUniquePtr&lt;char&gt;:\nGUniquePtr&lt;char&gt; gstr(g_strdup(\"world\"));\nauto str2 = GMallocString::unsafeAdoptFromUTF8(WTFMove(gstr));\n\n\/\/ Copy from a CStringView:\nGMallocString str3(myCStringView);<\/code><\/pre>\n<p>Once you have a <code>GMallocString<\/code>, you get the same <code>span()<\/code>, <code>utf8()<\/code>, <code>lengthInBytes()<\/code> interface as <code>CStringView<\/code>. You can compare them with <code>==<\/code>, convert to <code>CStringView<\/code> with <code>toCStringView()<\/code>, and use them with <code>safePrintfType<\/code> for safe logging. The class is move-only (no copies), which is the right semantics for an owning wrapper.<\/p>\n<p>The nice thing is that <code>GMallocString<\/code> preserves the original null-terminated C string without performing any copy when adopting, and it frees it with <code>g_free()<\/code> when it goes out of scope. This makes it a perfect fit for the GLib\/GStreamer interop pattern where you receive an allocated string and need to use it for a while before discarding it.<\/p>\n<h2>The bigger picture<\/h2>\n<p>These two classes together cover the two main use cases for C string handling in our code:<\/p>\n<ul>\n<li><strong>You do not own the string<\/strong>: use <code>CStringView<\/code>. No allocations, no copies, just a view. Ideal for parameters, string literals, and temporary references.<\/li>\n<li><strong>You own a GLib-allocated string<\/strong>: use <code>GMallocString<\/code>. 
It adopts the allocation, provides the same interface, and cleans up when done.<\/li>\n<\/ul>\n<p>Both enforce UTF-8 encoding through <code>char8_t<\/code>, both provide span-based access for interacting with the rest of WTF&#8217;s string infrastructure, and both support equality comparisons with each other and with <code>ASCIILiteral<\/code>.<\/p>\n<p>The end result is that the GStreamer code in WebKit is now more explicit about string ownership and encoding, and there are fewer raw <code>char*<\/code> pointers floating around without clear semantics. I also found and fixed a bug in <code>CString::isEmpty()<\/code> (<a href=\"https:\/\/bugs.webkit.org\/show_bug.cgi?id=303428\">Bug 303428<\/a>) along the way, which was returning <code>size_t<\/code> instead of <code>bool<\/code>.<\/p>\n<p>I think the codebase is in much better shape now in this regard. There is still work to do, but the foundations are there.<\/p>\n<p>Thanks go to my reviewers Darin Adler, Philippe Normand and Adrian Perez de Castro for their patience and thorough reviews, and to <a href=\"https:\/\/www.igalia.com\">Igalia<\/a> for sponsoring this work.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When you work on WebKit&#8217;s GStreamer code, you end up dealing with strings a lot. GStreamer and GLib APIs use C strings everywhere: element names, property values, caps descriptions, SDP fields&#8230; The list is long. 
The problem was that we &hellip; <a href=\"https:\/\/blogs.igalia.com\/xrcalvar\/2026\/03\/11\/reworking-string-management-in-webkits-gstreamer-code\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":35,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-748","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/blogs.igalia.com\/xrcalvar\/wp-json\/wp\/v2\/posts\/748","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.igalia.com\/xrcalvar\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.igalia.com\/xrcalvar\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.igalia.com\/xrcalvar\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.igalia.com\/xrcalvar\/wp-json\/wp\/v2\/comments?post=748"}],"version-history":[{"count":4,"href":"https:\/\/blogs.igalia.com\/xrcalvar\/wp-json\/wp\/v2\/posts\/748\/revisions"}],"predecessor-version":[{"id":752,"href":"https:\/\/blogs.igalia.com\/xrcalvar\/wp-json\/wp\/v2\/posts\/748\/revisions\/752"}],"wp:attachment":[{"href":"https:\/\/blogs.igalia.com\/xrcalvar\/wp-json\/wp\/v2\/media?parent=748"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.igalia.com\/xrcalvar\/wp-json\/wp\/v2\/categories?post=748"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.igalia.com\/xrcalvar\/wp-json\/wp\/v2\/tags?post=748"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}