Conversation

@ljedrz ljedrz commented Dec 26, 2025

This PR is a proposal for chunked network messages, specifically the Message (for now - it could be extended to the Event as well).

The way this works is roughly as follows:

  • Router::send is called
  • the given Message is checked against chunking criteria based on its variant (this can be fine-tuned further, e.g. based on the number of transactions in a Block, or the number of block locators in a Ping) in order to maximize performance - the worst-case scenario is a single-chunk message, but there are no extra allocations just to determine the serialized size
  • if the criteria are met, the input Message is chunked, and each chunk gets its own call to Router::send

And when a chunk is received:

  • if it's a single-chunk Message, it immediately gets deserialized into its true form
  • otherwise, a new map of peers to their chunked messages (in the Router) is consulted in order to either store the chunk, or finalize the Message based on prior chunks plus the new one
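The send/receive flow above can be sketched in plain Rust; the names (`MAX_CHUNK_SIZE`, `chunk_message`, `reassemble`) are illustrative stand-ins, not the PR's actual API:

```rust
// Hypothetical chunk size; the PR leaves the real limit as a TODO.
const MAX_CHUNK_SIZE: usize = 64 * 1024;

/// Send side: split a serialized message into chunks of at most MAX_CHUNK_SIZE,
/// each of which would get its own call to Router::send.
fn chunk_message(bytes: &[u8]) -> Vec<Vec<u8>> {
    bytes.chunks(MAX_CHUNK_SIZE).map(|chunk| chunk.to_vec()).collect()
}

/// Receive side: once all chunks from a peer are present, concatenate them
/// back into the original serialized message for deserialization.
fn reassemble(chunks: &[Vec<u8>]) -> Vec<u8> {
    chunks.iter().flatten().copied().collect()
}
```

A single-chunk message round-trips through the same path, which is why the worst case adds no extra allocations beyond the serialized bytes themselves.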

There are 3 TODOs left:

  • pick a new MAXIMUM_MESSAGE_SIZE
  • decide which Message variants should undergo chunking
  • decide what constitutes a protocol violation with regard to some edge cases, e.g. how many chunked messages a peer could reasonably be sending at any given time, or how long chunks may remain pending (incomplete)

I've tested it in a local devnet and encountered no issues.

@ljedrz ljedrz requested a review from vicsn December 26, 2025 14:29
@vicsn vicsn left a comment

Really appreciate the work here. In hindsight, a design document first might have been more efficient. But then again seeing all the specific code is very helpful.

I'm not convinced by two of the suggested benefits of this chunking:

  • it may speed up transmission of large objects, but benchmarks we ran showed that requesting 5 blocks at a time was faster than requesting single blocks at a time. I guess these are trade-offs with a lot of factors and there's no general rule given our complex codebase.
  • we may make it harder to DoS. But actually for most messages/events we can just set a much lower message/event-specific size. The one and only exception is BlockResponse, for which the node is always in control of how much it is requesting, so there's no DoS vector if it's designed right?


/// The maximum size of a message that can be transmitted in the network.
pub(crate) const MAXIMUM_MESSAGE_SIZE: usize = 128 * 1024 * 1024; // 128 MiB
// TODO: with message chunking in place, it can be greatly reduced.

What we surely need is message/event-specific limits. For the ones with a compile-time sizeof you could already set a sensibly lower value. We could also define consts whose correctness is checked in a unit test...
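The per-variant consts plus unit test could look something like the following; the names and values are illustrative placeholders, not snarkOS's actual constants:

```rust
// Hypothetical per-variant size limits; the values are placeholders to be
// tuned per Message/Event variant.
pub const MAX_PING_SIZE: usize = 32 * 1024; // 32 KiB
pub const MAX_BLOCK_RESPONSE_SIZE: usize = 128 * 1024 * 1024; // 128 MiB

#[cfg(test)]
mod tests {
    use super::*;

    // A unit test pinning the relative ordering of the limits, so that an
    // accidental edit to one constant is caught at test time.
    #[test]
    fn message_size_limits_are_consistent() {
        assert!(MAX_PING_SIZE < MAX_BLOCK_RESPONSE_SIZE);
    }
}
```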


ljedrz commented Dec 29, 2025

we may make it harder to DoS. But actually for most messages/events we can just set a much lower message/event-specific size.

We can't really do that (at least not as quickly as we'd like - only when parsing) - the LengthDelimitedCodec must know in advance what the length of the entire Message is, and it isn't capable of distinguishing between them. What we could do instead is have multiple TCP streams per peer, with one codec for "control-style" messages, and the other for "data-style" messages. This would require some serious refactoring, but could be of interest in the long run (for validators it could even be 3 streams, with 1 dedicated to consensus messages only).
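The constraint can be illustrated with a minimal hand-rolled sketch of length-delimited framing (not the actual tokio codec): the decoder only sees a length prefix, so the sole early rejection it can make is against a single global cap, before it knows which variant the frame contains.

```rust
// Minimal sketch of length-delimited decoding. `decode_frame` is a
// hypothetical stand-in for what LengthDelimitedCodec does internally.
fn decode_frame(buf: &[u8], max_len: usize) -> Result<Option<&[u8]>, String> {
    // Need the 4-byte length prefix before anything else can be decided.
    if buf.len() < 4 {
        return Ok(None);
    }
    let declared = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    // The only check possible at this point: declared length vs. one global
    // maximum - the message variant is still opaque bytes.
    if declared > max_len {
        return Err("frame exceeds maximum size".into());
    }
    // Otherwise, buffer until the whole declared payload has arrived.
    if buf.len() < 4 + declared {
        return Ok(None);
    }
    Ok(Some(&buf[4..4 + declared]))
}
```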

The one and only exception is BlockResponse, for which the node is always in control of how much it is requesting, so there's no DoS vector if it's designed right?

The network message framing we're using is length-based, so as it stands now, someone might declare that they want to send a Message (or multiple) of MAXIMUM_MESSAGE_SIZE, and until they are parsed, they fully occupy the associated buffers. We can only determine whether we've requested the given BlockResponse afterwards, too. In addition, we process block responses concurrently, which slightly increases the viability of a DoS. That vector is still quite unlikely, as the node will immediately disconnect from a peer who forged a message or supplied an unsolicited block.

The true benefit of message chunking is that it can reduce (but not remove completely) head-of-line blocking, allowing smaller messages to be sent in between the chunks of a large message; this means that a node that is currently providing blocks to a syncing peer can be more responsive in the meantime (the same applies to the recipient, but they would be less useful in the network until fully synced). This solution is complementary to the aforementioned multi-stream approach (which is a greater mitigation for HoL blocking).
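The interleaving effect can be shown with a toy scheduler (purely illustrative, not the PR's code): once a large message is chunked, a small message can slot in between its chunks instead of waiting for the whole thing.

```rust
use std::collections::VecDeque;

// Toy outbound scheduler: after each chunk of a large message, give one
// queued small message a turn on the wire.
fn interleave<'a>(
    mut large_chunks: VecDeque<&'a str>,
    mut small: VecDeque<&'a str>,
) -> Vec<&'a str> {
    let mut wire = Vec::new();
    while let Some(chunk) = large_chunks.pop_front() {
        wire.push(chunk);
        if let Some(msg) = small.pop_front() {
            wire.push(msg); // a small message no longer waits for all chunks
        }
    }
    wire.extend(small);
    wire
}
```

Without chunking, the equivalent wire order would be the entire large message first, then everything else.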


vicsn commented Dec 29, 2025

Thank you for explaining.

To defend against the attack scenario of spam from large BlockResponses, which can be literally 1000x larger than any other message, maybe you can sketch out different approaches from least to most hacky. If we tackle that, we may be able to lower the message limit to ~100KB and we may in turn not need the chunking.


ljedrz commented Dec 30, 2025

I've given it some thought, and what I'd consider to be the best (fastest, least hacky, most flexible) approach is short-lived sync streams, which would roughly work as follows:

  1. node A realizes that it's behind its peers
  2. node A sends out requests for blocks to the relevant peers
  3. the peers open new TcpStreams dedicated to syncing to node A; these streams are wrapped in their own Framed instance with a codec which only handles blocks or their components (while the codecs and objects related to non-sync work no longer need those functionalities, and can have their maximum message size greatly reduced)
  4. node A, now expecting new streams from specific peers, accepts them without any handshake and immediately starts reading block responses, which no longer have to be chunked or bundled
  5. the peers close their streams as soon as they've written all the requested blocks to them
  6. node A cleans up its side of the streams once all the blocks have been read from them
  7. node A starts/resumes work as a fully-synced member of the network

Benefits:

  • clean separation of concerns - BlockSync can fully "own" the entire syncing process, without having to consult the Router or the Gateway
  • no head-of-line blocking for anything other than blocks - the node remains perfectly responsive while syncing
  • the low-level network buffers and queues for the "basic" streams can be a lot smaller, reducing the memory footprint
  • the low-level sent/received statistics for peers no longer have any meaningful outliers, and can be used for simple and trustworthy spam detection/responsiveness heuristics
  • the block-related codec can be fine-tuned to aggressively parse block components individually, so that a DoS via large forged blocks becomes basically impossible
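The stream lifecycle above (steps 3-6) can be sketched with plain std networking; the function names and the raw-bytes "blocks" are hypothetical, and the real design would wrap the stream in its own Framed instance with a block-only codec:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};

/// The serving peer: open a dedicated short-lived stream (step 3), write the
/// requested blocks (here just stand-in byte blobs), then close it by
/// dropping the stream (step 5).
fn serve_blocks(addr: &str, blocks: Vec<Vec<u8>>) -> std::io::Result<()> {
    let mut stream = TcpStream::connect(addr)?;
    for block in blocks {
        stream.write_all(&block)?;
    }
    Ok(()) // stream dropped here => peer sees EOF
}

/// The syncing node: accept the expected stream without a handshake (step 4)
/// and read block responses until the peer closes its side (step 6).
fn receive_blocks(listener: &TcpListener) -> std::io::Result<Vec<u8>> {
    let (mut stream, _peer_addr) = listener.accept()?;
    let mut bytes = Vec::new();
    stream.read_to_end(&mut bytes)?;
    Ok(bytes)
}
```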


vicsn commented Dec 30, 2025

Go forth and proof of concept in a new PR ljedrzGPT!
