Eliminating redundant RLP encoding in trie (PR #2353)
Summary
This PR addresses a performance bottleneck in our Merkle-Patricia Trie implementation caused by redundant RLP (Recursive Length Prefix) encoding operations. By refactoring the encoding logic to avoid double-encoding, we achieved roughly 1.6x–1.7x faster trie insertions on workloads of 1k and 10k nodes (see the benchmark results below).
Context
The trie structure is critical for Ethereum state management, and its performance directly impacts node synchronization and block processing times. RLP encoding is used extensively to serialize trie nodes for hashing and storage. Profiling revealed that a significant portion of insertion time was spent on unnecessary RLP re-encoding.
Tools & Methodology
Profiling Tools
- Samply:
  - Sampled CPU usage during trie insertions, identifying RLP encoding as a hotspot (15% of runtime).
  - Generated a flamegraph showing `encode_to_vec` taking a considerable amount of time.
- Cargo Bench:
  - Provided reproducible benchmarks for trie insertions (1k/10k nodes).
Benchmark Setup
- Code Location: Benchmarks are defined in `crates/common/trie/benches/trie_bench.rs`.
- How to Run:

  ```bash
  # Navigate to the trie crate
  cd crates/common/trie
  # Run benchmarks (includes warmup and simple statistics with cargo-bench)
  make bench
  ```
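For orientation, the sketch below shows roughly what an insertion benchmark of this shape might look like. It assumes a Criterion-style harness and uses a `BTreeMap` as a stand-in for the trie so the snippet compiles on its own; the actual benchmark in `trie_bench.rs` drives the crate's own Merkle-Patricia Trie, so all names here are illustrative.

```rust
// Illustrative sketch only: the real benchmark lives in
// crates/common/trie/benches/trie_bench.rs and exercises the crate's own
// Trie type. A BTreeMap stands in here so the example is self-contained.
use std::collections::BTreeMap;
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, Criterion};

fn insert_n_nodes(n: u64) {
    // Stand-in for the trie under test.
    let mut trie = BTreeMap::new();
    for i in 0..n {
        // Deterministic key/value pairs keep runs reproducible across machines.
        let key = i.to_be_bytes().to_vec();
        let value = vec![0xAB_u8; 32];
        trie.insert(black_box(key), black_box(value));
    }
}

fn trie_insert_benchmarks(c: &mut Criterion) {
    c.bench_function("insert 1k nodes", |b| b.iter(|| insert_n_nodes(1_000)));
    c.bench_function("insert 10k nodes", |b| b.iter(|| insert_n_nodes(10_000)));
}

criterion_group!(trie_benches, trie_insert_benchmarks);
criterion_main!(trie_benches);
```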
Hardware
- Apple M3 Max: 14-core CPU, 36GB RAM, 1TB SSD.
Benchmark Results
| Workload | Baseline (ms) | After PR #2353 (ms) |
|---|---|---|
| Insert 1k Nodes | 9.96 – 10.02 | 5.78 – 5.82 |
| Insert 10k Nodes | 111.2 – 111.6 | 67.6 – 68.6 |
Technical Breakdown
Issue: Redundant RLP Encoding
The original code calculated the length of an RLP-encoded sequence by:
- Encoding the entire structure to a temporary buffer.
- Measuring the buffer's length.
- Re-encoding the structure with the length prefix.
This resulted in two full encoding passes every time a structure needed to be RLP-encoded.
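As a rough illustration of that pattern (not the crate's actual code), the sketch below double-encodes a list of byte strings. The helper names are made up, and only the short-string/short-list RLP cases are handled to keep it brief.

```rust
// Hypothetical helpers, simplified to the short-form RLP cases.
fn rlp_encode_bytes(bytes: &[u8], out: &mut Vec<u8>) {
    match bytes {
        // A single byte below 0x80 is its own encoding.
        [b] if *b < 0x80 => out.push(*b),
        _ => {
            assert!(bytes.len() < 56, "long-string case omitted for brevity");
            out.push(0x80 + bytes.len() as u8);
            out.extend_from_slice(bytes);
        }
    }
}

fn rlp_list_prefix(payload_len: usize, out: &mut Vec<u8>) {
    assert!(payload_len < 56, "long-list case omitted for brevity");
    out.push(0xc0 + payload_len as u8);
}

// The pre-PR shape: encode once to learn the length, then encode again.
fn encode_list_double_pass(items: &[Vec<u8>]) -> Vec<u8> {
    // Pass 1: encode the whole payload into a throwaway buffer...
    let mut tmp = Vec::new();
    for item in items {
        rlp_encode_bytes(item, &mut tmp);
    }
    // ...only to measure its length.
    let payload_len = tmp.len();

    // Pass 2: re-encode every item after the length prefix, repeating the work.
    let mut out = Vec::new();
    rlp_list_prefix(payload_len, &mut out);
    for item in items {
        rlp_encode_bytes(item, &mut out);
    }
    out
}
```

The wasted work is the second loop: every item is serialized again even though an identical byte sequence was already produced in the first pass.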
Fix: Single-Pass Encoding
We replaced the double-encoding logic with a single pass: the payload is encoded once, and the length prefix is then derived from the byte size of that encoded buffer rather than from a second encoding run.
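Under the same assumptions, and reusing the illustrative `rlp_encode_bytes` and `rlp_list_prefix` helpers from the previous sketch, the single-pass version might look like this:

```rust
// Encode the payload once, then derive the prefix from its byte length.
// (Uses rlp_encode_bytes / rlp_list_prefix from the sketch above.)
fn encode_list_single_pass(items: &[Vec<u8>]) -> Vec<u8> {
    // Single encoding pass over the items.
    let mut payload = Vec::new();
    for item in items {
        rlp_encode_bytes(item, &mut payload);
    }
    // The prefix is computed from the encoded payload's size in bytes,
    // and the already-encoded bytes are appended as-is.
    let mut out = Vec::new();
    rlp_list_prefix(payload.len(), &mut out);
    out.extend_from_slice(&payload);
    out
}
```

The payload is still staged in a temporary buffer, but its byte length now drives the length prefix directly, so nothing is encoded twice.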