If you evaluate the architecture of next-generation, high-throughput networks — Sui, Aptos, or advanced modular sequencers — you will notice a fundamental shift in how they handle state. They have entirely abandoned the traditional monolithic block-building process.
Instead of a single leader proposing a block of transactions over a synchronous gossip network, these systems utilize protocols inspired by Narwhal, Bullshark, and more recently, Mysticeti. They separate data dissemination from consensus ordering. Nodes continuously stream transaction batches to each other, building a Directed Acyclic Graph (DAG) of causal history entirely in system memory. Consensus is then run purely on the structural metadata (the references within the graph) rather than the payload itself.
From a theoretical standpoint, this solves the leader bottleneck and allows throughput to scale linearly with bandwidth. But as an infrastructure advisor, I don’t look at the theory. I look at the implementation.
When you shift the entire transaction dissemination layer into a continuous, ever-expanding graph structure, you introduce a massive systems engineering hurdle: Memory Management and Concurrent State Traversal. If you misconfigure the garbage collection of a DAG mempool, your 256GB RAM validator will hit an Out-Of-Memory (OOM) panic and crash in minutes.
Here is the engineering teardown of how high-performance DAG mempools manage memory boundaries, the shift toward uncertified graphs, and exactly where the implementations bottleneck under load.
1. The DAG Paradigm: Causal History in RAM
In a traditional blockchain, consensus and data availability are bundled into one linear process. In a DAG-based system, validators do not wait for a block time. They constantly ingest user transactions, package them into “vertices” (or batches), and broadcast them.
When a validator creates a new vertex for the current Round (let’s say, Round $R$), that vertex must include cryptographic pointers to a quorum of vertices from the previous round (Round $R-1$).
In older models like Narwhal, these were Certified DAGs, meaning every vertex required $2f+1$ signatures (a certificate) before it could be referenced, heavily taxing the CPU with signature verification. Bleeding-edge protocols like Mysticeti utilize Uncertified DAGs, completely dropping the reliable broadcast phase. Vertices are simply linked via “strong edges” (references to $f+1$ previous vertices).
This creates a dense, interlocking, and highly complex web of causal history resident entirely in the node’s system RAM. Because the network is asynchronous, different nodes will see different parts of the graph at different times. The node must hold this entire structure in memory so that when the consensus engine “commits” a specific leader vertex, the node can deterministically traverse the graph backward to sequence all the transactions that led to it.
2. The Unbounded Graph and the OOM Threat
Herein lies the architectural danger: a DAG naturally grows unboundedly.
If a network is processing 100,000 transactions per second, it is generating thousands of vertices and tens of thousands of edge connections in memory every few seconds. In the actual node codebase (usually written in Rust), this is represented across three parallel memory structures:
The DAG Map: A massive concurrent hash map keyed by {validator_id, round} pointing to the vertex metadata.The Batch Store: The actual raw transaction payloads referenced by the DAG.The Pending Transactions Map: User transactions waiting to be packed into a vertex.
If the node does not aggressively and safely prune these three structures, the memory heap balloons. A malicious actor can execute an Equivocation DoS Attack. By intentionally broadcasting conflicting uncertified vertices, the attacker forces honest nodes to store multiple branching histories in RAM. Because the protocol cannot immediately discard equivocations without violating liveness, the graph bloats exponentially, triggering the OS OOM killer and crashing the validator.
3. Watermark Garbage Collection (The Implementation Subtlety)
You cannot simply delete a vertex from memory just because the consensus thread successfully ordered it. Because the network is asynchronous, a slower, honest validator might still be constructing a path that references that exact vertex. If you drop it from RAM prematurely, you break the causal history for your peers, effectively acting as a Byzantine (faulty) node.
To safely prune the graph without breaking the network, elite implementations use Watermark Garbage Collection.
The memory manager must track the state across the graph using two strict boundaries:
The High Watermark: The latest round that the consensus engine has successfully committed locally.The Low Watermark: The oldest round that is still needed by the slowest honest validator to maintain liveness.
The node must continuously calculate the progress of $f+1$ honest validators. The mathematical invariant is strict: Never GC below the $f+1$ honest validators’ watermarks. Only when the protocol mathematically guarantees that a vertex falls definitively below the Low Watermark can the memory manager safely unlink the pointers, deallocate the structs from the DAG Map, and hand the memory back to the OS.
4. The Concurrency Nightmare: Traversal vs. Pruning
Even with a perfect watermark algorithm, the physical act of freeing memory introduces a severe multithreading bottleneck.
A DAG mempool is a highly concurrent environment. It is constantly being mutated by ingress threads (adding new vertices from peers), traversed by the consensus thread (running topological sorts to commit transactions), and purged by the GC thread.
If the implementation relies on standard Read-Write locks (RwLock)—for example, locking the entire DAG Map during a GC sweep to prevent data races—the node’s throughput will periodically plummet to zero. When the GC thread takes the write-lock to delete Round $R-50$, the consensus thread is blocked from reading Round $R$, stalling the entire sequencing pipeline.
To survive mainnet traffic, protocols must abandon standard mutexes and implement Lock-Free Concurrent Data Structures combined with advanced memory reclamation strategies like Epoch-Based Reclamation (EBR) or Hazard Pointers (commonly found in high-performance Rust crates like crossbeam-epoch).
These paradigms allow the consensus thread to traverse the graph using atomic pointers without ever blocking. When the GC thread identifies a vertex below the Low Watermark, it unlinks it atomically (Logical Unlink), but delays the actual physical memory deallocation until it can mathematically prove no active consensus thread is currently holding a reference to that specific memory address.
Follow me on X.
Disaggregating the Mempool: The Memory Architecture of DAG Consensus was originally published in Coinmonks on Medium, where people are continuing the conversation by highlighting and responding to this story.
