Web Languages Team

Est. 2026

Counting Total Allocation for a V8 Isolate

Towards the end of 2025, I had the opportunity to work on a project sponsored by Jane Street to extend V8’s memory profiler API with a new metric: the total number of bytes allocated by an Isolate since its creation.

The requirement sounds simple: count mutator allocations (i.e., allocations caused by JavaScript execution) while ignoring allocations performed by internal machinery, like GC threads. The motivation was to allow building regression tests that monitor allocation behavior over time. If you want more background on the motivation and initial proposal, I recommend the patch’s design document.

What initially looked like a small change turned into a surprisingly deep dive into V8’s allocation fast-paths, LAB management, and safepointing infrastructure.

Starting with AllocationObservers #

My first instinct was: “V8 already has a lot of allocation profiling infrastructure; surely there's something close to what I need, right?”

That question led me to the SamplingHeapProfiler, which internally uses AllocationObserver. This already tracks allocations and, importantly, it does not observe GC allocations, thanks to PauseAllocationObserverScope inside Heap::PerformGarbageCollection.

So I implemented a prototype: a new AllocationObserver with step_size = 1 that increments a counter on every allocation, then exposed that counter through an added total_allocated_bytes field in HeapStatistics.

This worked, produced correct numbers, and required minimal changes. However, there was one major problem: the performance overhead was catastrophic. The main reason is that using allocation observers to watch all allocations forces every allocation in the program through the C++ observer infrastructure, which disables every fast-path optimization. I soon understood why SamplingHeapProfiler is opt-in and must be explicitly started (see StartSamplingHeapProfiler): its observer is attached only at that point. We needed something much lighter than this approach.
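To make the cost concrete, here is a toy model of the observer-based prototype (purely illustrative; V8's real AllocationObserver API and wiring differ). With step_size = 1, the observer's callback fires on every single allocation, so no allocation can stay on a fast path:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Toy stand-in for an AllocationObserver with step_size = 1: the Step
// callback runs for every allocation and accumulates a byte counter.
struct CountingObserver {
  uint64_t total_allocated_bytes = 0;
  void Step(size_t bytes) { total_allocated_bytes += bytes; }
};

// Toy allocator: when an observer is attached, every allocation pays a
// C++ call before any memory is handed out, modeling the slow path.
struct ToyAllocator {
  CountingObserver* observer = nullptr;
  void* Allocate(size_t bytes) {
    if (observer) observer->Step(bytes);  // fires on *every* allocation
    return ::operator new(bytes);
  }
};
```

In real V8 the cost is far worse than a single call, because taking the observer path also means leaving the bump-pointer fast path described below.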

Linear Allocation Buffers (LABs) #

One of the fast-path optimizations is LABs. A LAB (Linear Allocation Buffer) is a contiguous region where the mutator can allocate memory using a simple bump pointer. It is created when the mutator requests an allocation: instead of reserving only the requested bytes, V8 reserves a larger region to serve subsequent mutator allocations. Allocations from this buffer are extremely fast, and a huge amount of JavaScript code that triggers allocation benefits from this fast path: function declarations, object literals, arrays, closures, strings, and so on.
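A minimal, self-contained sketch of a bump-pointer LAB (not V8's actual MainAllocator; the names and layout here are illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal bump-pointer linear allocation buffer: allocation is a single
// bounds check plus a pointer increment.
class LinearAllocationBuffer {
 public:
  LinearAllocationBuffer(uint8_t* start, size_t size)
      : start_(start), top_(start), limit_(start + size) {}

  // Fast path: compare against the limit, then bump the top pointer.
  void* Allocate(size_t bytes) {
    if (top_ + bytes > limit_) return nullptr;  // slow path: refill the LAB
    void* result = top_;
    top_ += bytes;
    return result;
  }

  // Bytes used so far; this is what the patch later counts for a LAB
  // that is still active when the API is queried.
  size_t used_bytes() const { return static_cast<size_t>(top_ - start_); }

 private:
  uint8_t* start_;
  uint8_t* top_;
  uint8_t* limit_;
};
```

The whole point of the fast path is that Allocate compiles down to a handful of instructions; interposing a C++ observer call destroys that property.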

If an AllocationObserver must run for every allocation, every allocation takes the slow path and performance tanks.

This meant that profiling using observers was not acceptable for something that must always be enabled. Still, that prototype was valuable: it served as an oracle to validate later implementations.

The Path to Negligible Overhead #

While discussing alternatives with my colleague Andy Wingo, he suggested a promising idea: "Instead of counting allocations as they happen, count the LAB size when the LAB is freed. For active LABs, we get their size when the API is called."

The intuition for this idea is as follows:

- To ignore GC allocations, we used MainAllocator::in_gc.

- LAB freeing happens in FreeLinearAllocationAreaUnsynchronized [1] [2].

This approach solved the performance issue, but we needed to account for partially filled LABs when the API is queried. That led to adding a helper like:

uint64_t Space::GetTotalAllocatedBytesInLAA() const {
  uint64_t total_bytes = 0;
  HeapAllocator* allocator = heap_->allocator();
  // Here we check spaces that might have allocations in their current
  // LinearAllocationArea that hasn't been freed yet, which means that they
  // weren't added to the total allocation counter.
  MainAllocator* space_allocator = nullptr;
  switch (identity()) {
    case NEW_SPACE: {
      space_allocator = allocator->new_space_allocator();
      break;
    }
    case OLD_SPACE: {
      space_allocator = allocator->old_space_allocator();
      break;
    }
    case TRUSTED_SPACE: {
      space_allocator = allocator->trusted_space_allocator();
      break;
    }
    case CODE_SPACE: {
      space_allocator = allocator->code_space_allocator();
      break;
    }
    case SHARED_SPACE: {
      space_allocator = allocator->shared_space_allocator();
      break;
    }
    case SHARED_TRUSTED_SPACE: {
      space_allocator = allocator->shared_trusted_space_allocator();
      break;
    }
    default:
      break;
  }
  if (space_allocator && space_allocator->top() > space_allocator->start()) {
    total_bytes += space_allocator->top() - space_allocator->start();
  }
  return total_bytes;
}

…which scanned each allocator’s active LAB and counted the remaining used bytes. This version passed correctness checks against the observer-based oracle.

Reaching the Final Version #

Before approval, reviewers suggested a couple of changes to simplify the logic:

1. Count allocations when a LAB is created, not freed. #

This lets us place the logic in a single point: MainAllocator::ResetLab.

Counting on LAB creation slightly overestimates total allocation because LABs are not always fully used before being freed. In my experiments, the overestimation was around +1%, which we considered acceptable for this API.
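In spirit, the landed accounting moves to the moment a new LAB is handed out. A self-contained sketch of that shape (ResetLab and in_gc are real V8 identifiers, but the class and surrounding code here are my own illustration, not the patch):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative model: the counter is bumped by the full LAB size when a
// new LAB is created. This slightly overestimates total allocation,
// since the previous LAB may not have been fully used before it was
// replaced.
class ToyMainAllocator {
 public:
  void ResetLab(uint64_t lab_size_bytes, bool in_gc) {
    // GC-internal allocations are excluded, mirroring the in_gc check.
    if (!in_gc) total_allocated_bytes_ += lab_size_bytes;
  }
  uint64_t total_allocated_bytes() const { return total_allocated_bytes_; }

 private:
  uint64_t total_allocated_bytes_ = 0;
};
```

The appeal of this placement is that every LAB, for every space, passes through a single choke point, so no per-space scanning is needed later.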

2. Remove Space::GetTotalAllocatedBytesInLAA. #

Another suggestion was to avoid scanning active LABs during GetHeapStatistics. Instead, the reviewers suggested freeing the LABs of all threads before reading the counter, similar to what is already done for the main thread's LABs.

This required thread synchronization, which led me to investigate...

Safepoints #

V8 uses safepoints to coordinate all threads so the runtime can safely examine the heap while avoiding race conditions.

I experimented with a version using a safepoint inside GetHeapStatistics and implementing the following method:

uint64_t Heap::GetTotalAllocatedBytes() {
  uint64_t total_allocated_bytes = 0;
  safepoint()->IterateLocalHeaps([&](LocalHeap* local_heap) {
    total_allocated_bytes += local_heap->allocator()->GetTotalAllocatedBytes();
  });
  return total_allocated_bytes;
}

This removed the need for atomic counters, since each thread had its own local counter.

The experiment worked and was interesting, but...

Returning to Atomic Counters #

Despite the safepoint-based approach being correct, reviewers ultimately preferred the atomic counter solution. I'm not entirely clear on why they made this decision, but I'd guess that the simplicity of an atomic counter was the major reason.
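For illustration, an atomic global counter has roughly this shape (a minimal sketch; the names, signatures, and the choice of relaxed memory ordering are my assumptions, not the landed code):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative atomic counter: every thread adds its new LAB size;
// the API reads the current total. Relaxed ordering suffices here
// because only the accumulated sum matters, not ordering with respect
// to other memory operations.
std::atomic<uint64_t> g_total_allocated_bytes{0};

void OnNewLab(uint64_t lab_size_bytes) {
  g_total_allocated_bytes.fetch_add(lab_size_bytes,
                                    std::memory_order_relaxed);
}

uint64_t GetTotalAllocatedBytes() {
  return g_total_allocated_bytes.load(std::memory_order_relaxed);
}
```

Compared with the safepoint version, this trades a per-thread local counter for one shared cache line, but it keeps the read path trivial: no thread coordination is needed to answer the API call.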

Performance evaluation using Speedometer, JetStream, and internal V8 benchmarks (SunSpider, Kraken, Octane) showed no detectable regressions.

This is the version that ultimately landed.

Conclusion #

Even though the initial requirement looked straightforward, implementing this feature required deep knowledge of several parts of V8’s memory subsystem—LABs, fast-path allocation, GC interactions, and safepoints. It was a delightful opportunity to gain a much deeper understanding of how V8 handles allocation at scale.

After the patch landed in V8, I worked on backporting it to Node.js’s HeapStatistics. The feature should already be in Node.js v25.

Acknowledgements #

Thanks to Jane Street for sponsoring this work; to my Igalia colleagues Andy Wingo, Joyee Cheung, and Romulo Cintra for internal discussions and reviews; and to the V8 and Node.js reviewers who devoted their time to improving the patch.