Eliminating redundant RLP encoding in trie (PR #2353)
Summary
This PR addresses a performance bottleneck in our Merkle-Patricia Trie implementation caused by redundant RLP (Recursive Length Prefix) encoding operations. By refactoring the encoding logic to avoid double-encoding, we achieved roughly 1.6x–1.7x faster trie insertions on workloads of 1k and 10k nodes (see the benchmark results below).
Context
The trie structure is critical for Ethereum state management, and its performance directly impacts node synchronization and block processing times. RLP encoding is used extensively to serialize trie nodes for hashing and storage. Profiling revealed that a significant portion of insertion time was spent on unnecessary RLP re-encoding.
Tools & Methodology
Profiling Tools
- Samply:
  - Sampled CPU usage during trie insertions, identifying RLP encoding as a hotspot (15% of runtime).
  - Generated a flamegraph showing `encode_to_vec` taking a considerable amount of time.
- Cargo Bench:
  - Provided reproducible benchmarks for trie insertions (1k/10k nodes).
Benchmark Setup
- Code Location: Benchmarks are defined in `crates/common/trie/benches/trie_bench.rs`.
- How to Run:

  ```bash
  # Navigate to the trie crate
  cd crates/common/trie
  # Run benchmarks (includes warmup and simple statistics with cargo-bench)
  make bench
  ```
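For orientation, the sketch below shows roughly what an insertion benchmark of this shape might look like. It assumes a Criterion-style harness and uses a `BTreeMap` as a stand-in for the trie so the snippet compiles on its own; the actual benchmark in `trie_bench.rs` drives the crate's own Merkle-Patricia Trie, so all names here are illustrative.

```rust
// Illustrative sketch only: the real benchmark lives in
// crates/common/trie/benches/trie_bench.rs and exercises the crate's own
// Trie type. A BTreeMap stands in here so the example is self-contained.
use std::collections::BTreeMap;
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, Criterion};

fn insert_n_nodes(n: u64) {
    // Stand-in for the trie under test.
    let mut trie = BTreeMap::new();
    for i in 0..n {
        // Deterministic key/value pairs keep runs reproducible across machines.
        let key = i.to_be_bytes().to_vec();
        let value = vec![0xAB_u8; 32];
        trie.insert(black_box(key), black_box(value));
    }
}

fn trie_insert_benchmarks(c: &mut Criterion) {
    c.bench_function("insert 1k nodes", |b| b.iter(|| insert_n_nodes(1_000)));
    c.bench_function("insert 10k nodes", |b| b.iter(|| insert_n_nodes(10_000)));
}

criterion_group!(trie_benches, trie_insert_benchmarks);
criterion_main!(trie_benches);
```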
Hardware
- Apple M3 Max: 14-core CPU, 36GB RAM, 1TB SSD.
Benchmark Results
| Workload | Baseline (ms) | After PR #2353 (ms) |
|---|---|---|
| Insert 1k Nodes | 9.96 – 10.02 | 5.78 – 5.82 |
| Insert 10k Nodes | 111.2 – 111.6 | 67.6 – 68.6 |
Technical Breakdown
Issue: Redundant RLP Encoding
The original code calculated the length of an RLP-encoded sequence by:
- Encoding the entire structure to a temporary buffer.
- Measuring the buffer's length.
- Re-encoding the structure with the length prefix.
This resulted in two full encoding passes every time a structure needed to be RLP-encoded.
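As a rough illustration of that pattern (not the crate's actual code), the sketch below double-encodes a list of byte strings. The helper names are made up, and only the short-string/short-list RLP cases are handled to keep it brief.

```rust
// Hypothetical helpers, simplified to the short-form RLP cases.
fn rlp_encode_bytes(bytes: &[u8], out: &mut Vec<u8>) {
    match bytes {
        // A single byte below 0x80 is its own encoding.
        [b] if *b < 0x80 => out.push(*b),
        _ => {
            assert!(bytes.len() < 56, "long-string case omitted for brevity");
            out.push(0x80 + bytes.len() as u8);
            out.extend_from_slice(bytes);
        }
    }
}

fn rlp_list_prefix(payload_len: usize, out: &mut Vec<u8>) {
    assert!(payload_len < 56, "long-list case omitted for brevity");
    out.push(0xc0 + payload_len as u8);
}

// The pre-PR shape: encode once to learn the length, then encode again.
fn encode_list_double_pass(items: &[Vec<u8>]) -> Vec<u8> {
    // Pass 1: encode the whole payload into a throwaway buffer...
    let mut tmp = Vec::new();
    for item in items {
        rlp_encode_bytes(item, &mut tmp);
    }
    // ...only to measure its length.
    let payload_len = tmp.len();

    // Pass 2: re-encode every item after the length prefix, repeating the work.
    let mut out = Vec::new();
    rlp_list_prefix(payload_len, &mut out);
    for item in items {
        rlp_encode_bytes(item, &mut out);
    }
    out
}
```

The wasted work is the second loop: every item is serialized again even though an identical byte sequence was already produced in the first pass.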
Fix: Single-Pass Encoding
We replaced the double-encoding logic with a single pass: the payload is encoded once, and the length prefix is then derived from the byte size of that encoded buffer rather than from a second encoding run.
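Under the same assumptions, and reusing the illustrative `rlp_encode_bytes` and `rlp_list_prefix` helpers from the previous sketch, the single-pass version might look like this:

```rust
// Encode the payload once, then derive the prefix from its byte length.
// (Uses rlp_encode_bytes / rlp_list_prefix from the sketch above.)
fn encode_list_single_pass(items: &[Vec<u8>]) -> Vec<u8> {
    // Single encoding pass over the items.
    let mut payload = Vec::new();
    for item in items {
        rlp_encode_bytes(item, &mut payload);
    }
    // The prefix is computed from the encoded payload's size in bytes,
    // and the already-encoded bytes are appended as-is.
    let mut out = Vec::new();
    rlp_list_prefix(payload.len(), &mut out);
    out.extend_from_slice(&payload);
    out
}
```

The payload is still staged in a temporary buffer, but its byte length now drives the length prefix directly, so nothing is encoded twice.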