Conversation

@ljedrz ljedrz commented Dec 26, 2025

This PR is a proposal for chunked network messages, specifically the Message (for now - it could be extended to the Event as well).

The way this works is roughly as follows:

  • Router::send is called
  • the given Message is checked against chunking criteria based on its variant (this can be fine-tuned further, e.g. based on the number of transactions in a Block, or the number of block locators in a Ping) in order to maximize performance - the worst-case scenario is a single-chunk message, but there are no extra allocations just to determine the serialized size
  • if the criteria are met, the input Message is chunked, and each chunk gets its own call to Router::send

And when a chunk is received:

  • if it's a single-chunk Message, it immediately gets deserialized into its true form
  • otherwise, a new map of peers to their chunked messages (in the Router) is consulted in order to either store the chunk, or finalize the Message based on prior chunks plus the new one
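The send/receive flow above can be sketched in plain Rust; the names (`MAX_CHUNK_SIZE`, `chunk_message`, `reassemble`) are illustrative stand-ins, not the PR's actual API:

```rust
// Hypothetical chunk size; the PR leaves the real limit as a TODO.
const MAX_CHUNK_SIZE: usize = 64 * 1024;

/// Send side: split a serialized message into chunks of at most MAX_CHUNK_SIZE,
/// each of which would get its own call to Router::send.
fn chunk_message(bytes: &[u8]) -> Vec<Vec<u8>> {
    bytes.chunks(MAX_CHUNK_SIZE).map(|chunk| chunk.to_vec()).collect()
}

/// Receive side: once all chunks from a peer are present, concatenate them
/// back into the original serialized message for deserialization.
fn reassemble(chunks: &[Vec<u8>]) -> Vec<u8> {
    chunks.iter().flatten().copied().collect()
}
```

A single-chunk message round-trips through the same path, which is why the worst case adds no extra allocations beyond the serialized bytes themselves.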

There are 3 TODOs left:

  • pick a new MAXIMUM_MESSAGE_SIZE
  • decide which Message variants should undergo chunking
  • decide what constitutes a protocol violation with regard to some edge cases, e.g. how many chunked messages a peer could reasonably be sending at any given time, or how long chunks may remain pending (incomplete)

I've tested it in a local devnet and encountered no issues.

@ljedrz ljedrz requested a review from vicsn December 26, 2025 14:29
@vicsn vicsn left a comment

Really appreciate the work here. In hindsight, a design document first might have been more efficient. But then again seeing all the specific code is very helpful.

I'm not convinced by two of the suggested benefits of this chunking:

  • it may speed up transmission of large objects, but benchmarks we ran showed that requesting 5 blocks at a time was faster than requesting single blocks at a time. I guess these are trade-offs with a lot of factors and there's no general rule given our complex codebase.
  • we may make it harder to DoS. But actually for most messages/events we can just set a much lower message/event-specific size. The one and only exception is BlockResponse, for which the node is always in control of how much it is requesting, so there's no DoS vector if it's designed right?


/// The maximum size of a message that can be transmitted in the network.
pub(crate) const MAXIMUM_MESSAGE_SIZE: usize = 128 * 1024 * 1024; // 128 MiB
// TODO: with message chunking in place, it can be greatly reduced.

What we surely need is message/event-specific limits. For the ones with a compile-time sizeof you could already set a sensibly lower value. We could also define consts whose correctness is checked in a unit test...
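The per-variant consts plus unit test could look something like the following; the names and values are illustrative placeholders, not snarkOS's actual constants:

```rust
// Hypothetical per-variant size limits; the values are placeholders to be
// tuned per Message/Event variant.
pub const MAX_PING_SIZE: usize = 32 * 1024; // 32 KiB
pub const MAX_BLOCK_RESPONSE_SIZE: usize = 128 * 1024 * 1024; // 128 MiB

#[cfg(test)]
mod tests {
    use super::*;

    // A unit test pinning the relative ordering of the limits, so that an
    // accidental edit to one constant is caught at test time.
    #[test]
    fn message_size_limits_are_consistent() {
        assert!(MAX_PING_SIZE < MAX_BLOCK_RESPONSE_SIZE);
    }
}
```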


ljedrz commented Dec 29, 2025

we may make it harder to DoS. But actually for most messages/events we can just set a much lower message/event-specific size.

We can't really do that (at least not as quickly as we'd like - only when parsing) - the LengthDelimitedCodec must know in advance what the length of the entire Message is, and it isn't capable of distinguishing between them. What we could do instead is have multiple TCP streams per peer, with one codec for "control-style" messages, and the other for "data-style" messages. This would require some serious refactoring, but could be of interest in the long run (for validators it could even be 3 streams, with 1 dedicated to consensus messages only).
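The constraint can be illustrated with a minimal hand-rolled sketch of length-delimited framing (not the actual tokio codec): the decoder only sees a length prefix, so the sole early rejection it can make is against a single global cap, before it knows which variant the frame contains.

```rust
// Minimal sketch of length-delimited decoding. `decode_frame` is a
// hypothetical stand-in for what LengthDelimitedCodec does internally.
fn decode_frame(buf: &[u8], max_len: usize) -> Result<Option<&[u8]>, String> {
    // Need the 4-byte length prefix before anything else can be decided.
    if buf.len() < 4 {
        return Ok(None);
    }
    let declared = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    // The only check possible at this point: declared length vs. one global
    // maximum - the message variant is still opaque bytes.
    if declared > max_len {
        return Err("frame exceeds maximum size".into());
    }
    // Otherwise, buffer until the whole declared payload has arrived.
    if buf.len() < 4 + declared {
        return Ok(None);
    }
    Ok(Some(&buf[4..4 + declared]))
}
```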

The one and only exception is BlockResponse, for which the node is always in control of how much it is requesting, so there's no DoS vector if it's designed right?

The network message framing we're using is length-based, so as it stands now, someone might declare that they want to send a Message (or multiple) of MAXIMUM_MESSAGE_SIZE, and until they are parsed, they fully occupy the associated buffers. We can only determine whether we've requested the given BlockResponse afterwards, too. In addition, we process block responses concurrently, which slightly increases the viability of a DoS. That vector is still quite unlikely, as the node will immediately disconnect from a peer who forged a message or supplied an unsolicited block.

The true benefit of message chunking is that it can reduce (but not remove completely) head-of-line blocking, allowing smaller messages to be sent in between the chunks of a large message; this means that a node that is currently providing blocks to a syncing peer can be more responsive in the meantime (the same applies to the recipient, but they would be less useful in the network until fully synced). This solution is complementary to the aforementioned multi-stream approach (which is a greater mitigation for HoL blocking).
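The interleaving effect can be shown with a toy scheduler (purely illustrative, not the PR's code): once a large message is chunked, a small message can slot in between its chunks instead of waiting for the whole thing.

```rust
use std::collections::VecDeque;

// Toy outbound scheduler: after each chunk of a large message, give one
// queued small message a turn on the wire.
fn interleave<'a>(
    mut large_chunks: VecDeque<&'a str>,
    mut small: VecDeque<&'a str>,
) -> Vec<&'a str> {
    let mut wire = Vec::new();
    while let Some(chunk) = large_chunks.pop_front() {
        wire.push(chunk);
        if let Some(msg) = small.pop_front() {
            wire.push(msg); // a small message no longer waits for all chunks
        }
    }
    wire.extend(small);
    wire
}
```

Without chunking, the equivalent wire order would be the entire large message first, then everything else.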


vicsn commented Dec 29, 2025

Thank you for explaining.

To defend against the attack scenario of spam from large BlockResponses, which can be literally 1000x larger than any other message, maybe you can sketch out different approaches from least to most hacky. If we tackle that, we may be able to lower the message limit to ~100KB and we may in turn not need the chunking.


ljedrz commented Dec 30, 2025

I've given it some thought, and what I'd consider to be the best (fastest, least hacky, most flexible) approach is short-lived sync streams, which would roughly work as follows:

  1. node A realizes that it's behind its peers
  2. node A sends out requests for blocks to the relevant peers
  3. the peers open new TcpStreams dedicated to syncing to node A; these streams are wrapped in their own Framed instance with a codec which only handles blocks or their components (while the codecs and objects related to non-sync work no longer need those functionalities, and can have their maximum message size greatly reduced)
  4. node A, now expecting new streams from specific peers, accepts them without any handshake and immediately starts reading block responses, which no longer have to be chunked or bundled
  5. the peers close their streams as soon as they've written all the requested blocks to them
  6. node A cleans up its side of the streams once all the blocks have been read from them
  7. node A starts/resumes work as a fully-synced member of the network

Benefits:

  • clean separation of concerns - BlockSync can fully "own" the entire syncing process, without having to consult the Router or the Gateway
  • no head-of-line blocking for anything other than blocks - the node remains perfectly responsive while syncing
  • the low-level network buffers and queues for the "basic" streams can be a lot smaller, reducing the memory footprint
  • the low-level sent/received statistics for peers no longer have any meaningful outliers, and can be used for simple and trustworthy spam detection/responsiveness heuristics
  • the block-related codec can be fine-tuned to aggressively parse block components individually, so that a DoS via large forged blocks becomes basically impossible
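The stream lifecycle above (steps 3-6) can be sketched with plain std networking; the function names and the raw-bytes "blocks" are hypothetical, and the real design would wrap the stream in its own Framed instance with a block-only codec:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};

/// The serving peer: open a dedicated short-lived stream (step 3), write the
/// requested blocks (here just stand-in byte blobs), then close it by
/// dropping the stream (step 5).
fn serve_blocks(addr: &str, blocks: Vec<Vec<u8>>) -> std::io::Result<()> {
    let mut stream = TcpStream::connect(addr)?;
    for block in blocks {
        stream.write_all(&block)?;
    }
    Ok(()) // stream dropped here => peer sees EOF
}

/// The syncing node: accept the expected stream without a handshake (step 4)
/// and read block responses until the peer closes its side (step 6).
fn receive_blocks(listener: &TcpListener) -> std::io::Result<Vec<u8>> {
    let (mut stream, _peer_addr) = listener.accept()?;
    let mut bytes = Vec::new();
    stream.read_to_end(&mut bytes)?;
    Ok(bytes)
}
```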


vicsn commented Dec 30, 2025

Go forth and proof of concept in a new PR ljedrzGPT!
