Skip to content

Conversation

@GabrielMartinezRodriguez
Copy link
Contributor

@GabrielMartinezRodriguez GabrielMartinezRodriguez commented Feb 25, 2025

Description

This PR aims to solve the issue that Aodhgan found in the randomness service, where every 3 blocks, for one of them, the random() function reverts. This occurs in the testnet but not locally, making it quite difficult to determine what is happening.

The error occurs because the drand value is not found. This happens every 3 blocks because, in every 3-block cycle, one of them has a lower margin of error. Specifically, the time we have to include the drand value is only about 2.3 seconds from its generation, whereas for the other drands, it is 3.3 and 4.3 seconds, respectively. This makes the 2.3-second case the most problematic.

2.3 seconds should be more than enough time to include the drand value. However, I have detected that when a drand value is requested immediately after being emitted, the drand network can take up to 1.6 seconds to return it. This would cause us to exceed the time limit, given that the current timeout is set to 1 second. For this reason, I have increased the timeout to 2 seconds.

  • Include all relevant context (but no need to repeat the issue's content).
  • Draw attention to new, noteworthy & unintuitive elements.
Toggle Checklist

Checklist

Basics

  • B1. I have applied the proper label & proper branch name (e.g. norswap/build-system-caching).
  • B2. This PR is not so big that it should be split & addresses only one concern.
  • B3. The PR targets the lowest branch it can (ideally master).

Reminder: PR review guidelines

Correctness

  • C1. Builds and passes tests.
  • C2. The code is properly parameterized & compatible with different environments (e.g. local,
    testnet, mainnet, standalone wallet, ...).
  • C3. I have manually tested my changes & connected features.

< INDICATE BROWSER, DEMO APP & OTHER ENV DETAILS USED FOR TESTING HERE >

< INDICATE TESTED SCENARIOS (USER INTERFACE INTERACTION, CODE FLOWS) HERE >

  • C4. I have performed a thorough self-review of my code after submitting the PR,
    and have updated the code & comments accordingly.

Architecture & Documentation

  • D1. I made it easy to reason locally about the code, by (1) using proper abstraction boundaries,
    (2) commenting these boundaries correctly, (3) adding inline comments for context when needed.
  • D2. All public-facing APIs & meaningful (non-local) internal APIs are properly documented in code
    comments.
  • D3. If appropriate, the general architecture of the code is documented in a code comment or
    in a Markdown document.
  • D4. An appropriate Changeset has been generated (and committed) for changes that touch npm published packages (currently pacakges/core and packages/react), see here for more info.

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Feb 25, 2025

Deploying happychain with  Cloudflare Pages  Cloudflare Pages

Latest commit: 4ffb47a
Status: ✅  Deploy successful!
Preview URL: https://8f0f84df.happychain.pages.dev
Branch Preview URL: https://gabriel-delay-post-drand.happychain.pages.dev

View logs

@GabrielMartinezRodriguez GabrielMartinezRodriguez marked this pull request as ready for review February 25, 2025 15:16
@GabrielMartinezRodriguez GabrielMartinezRodriguez mentioned this pull request Feb 25, 2025
11 tasks
Copy link
Contributor Author

GabrielMartinezRodriguez commented Feb 25, 2025

@GabrielMartinezRodriguez GabrielMartinezRodriguez marked this pull request as draft February 25, 2025 15:17
@GabrielMartinezRodriguez GabrielMartinezRodriguez added the no-merge For showcase, not to be merged label Feb 25, 2025
Base automatically changed from gabriel/transaction-reorder to master February 26, 2025 11:03
@GabrielMartinezRodriguez GabrielMartinezRodriguez added reviewing-1 Ready for, or undergoing first-line review and removed no-merge For showcase, not to be merged labels Mar 6, 2025
@GabrielMartinezRodriguez GabrielMartinezRodriguez changed the title fix(randomness): reduce sleep time when too early getting drand number Fix: random() reverts once every 3 blocks Mar 6, 2025
@GabrielMartinezRodriguez GabrielMartinezRodriguez added draft Not ready for review and removed draft Not ready for review reviewing-1 Ready for, or undergoing first-line review labels Mar 6, 2025
@GabrielMartinezRodriguez GabrielMartinezRodriguez marked this pull request as ready for review March 12, 2025 13:53
@GabrielMartinezRodriguez GabrielMartinezRodriguez added reviewing-1 Ready for, or undergoing first-line review and removed draft Not ready for review labels Mar 12, 2025
}
const maxRetries = 10
const retryIntervalMs = 100
let retryCount = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to retry? Since handleNewDrandBeacons is called every 3s, and the timeout is 2s, this will be called the next time, right? At most it could make sense to retry once with a lower timeout of 1s.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think you are right; retrying is not necessary since it will be retried in a few seconds. However, the timeout has to be 2000ms because sometimes when we request a drand, if we request it too early, the query gets delayed by around 1600ms. If we limit the timeout to 1 second, it will fail. That was the reason we were failing to obtain the drand on time in some blocks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also modified the fetch so that it doesn't stop an already fired fetch and has an absolute timeout. I think this is exactly what we want because, otherwise, for example, we were making two attempts with a 2000-millisecond timeout, meaning we could wait up to 4 seconds. But what happens if the request arrives after 2100 milliseconds? I think it's better to establish a short timeout period to trigger another fetch without canceling the previous one. So now, we retry after one second, but if the previous fetch completes after 1500 milliseconds, we handle it correctly as well

@norswap norswap added question Something has to be cleared up after review and removed reviewing-1 Ready for, or undergoing first-line review labels Mar 14, 2025
@GabrielMartinezRodriguez GabrielMartinezRodriguez added reviewing-2 Ready for, or undergoing final review and removed question Something has to be cleared up after review labels Mar 14, 2025
Copy link
Collaborator

@norswap norswap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is very neat!

@norswap norswap merged commit 89ecd5f into master Mar 14, 2025
3 checks passed
@norswap norswap deleted the gabriel/delay-post-drand branch March 14, 2025 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

reviewing-2 Ready for, or undergoing final review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants