Skip to content

Determine appropriate timeouts in DNS, HTTP, TLS #1738

@mxsasha

Description

@mxsasha

We have poor information on what current timeouts are, and no information on how they were chosen. Some recent refactorings have at least moved them into clear constants in the code, but those numbers are also just guessed.

This has come up e.g. when they have been accidentally changed in:
#1714 (comment)
#1218 (comment)

Considerations:

  • The timeout for the total test is currently 200 seconds (INTERNETNL_CACHE_TTL). Exceeding this will be reported as a test error, but it's not - the frontend just gave up and the job is still running. This is indistinguishable from an actual error (as in uncaught exception in code execution).
  • There are separate timeouts in resolving, TLS connections, HTTP requests, probably other unknown places.
  • Not every weird case needs to be testable, I think it's fine to just say no reply in 5 seconds means you don't run HTTP or TLS.
  • Multiple connections accumulate this, i.e. if we the HTTP request timeout is 20 seconds, 10 sequential requests will push the test over the global timeout, giving a worse user experience.
  • We should not hyper-optimise for minimal lookups/queries, just to be able to accommodate servers that respond ridiculously slow.

We should decide what appropriate timeouts are and apply them consistently in new code. This will impact some test results. If we consider the current changes a release blocker, that's a blocker for 1.10, as we already have some of this merged.

Metadata

Metadata

Labels

discussRequires further team discussion and decisions

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions