@kevinmingtarja
Collaborator

@kevinmingtarja kevinmingtarja commented Oct 25, 2025

In set_request_succeeded, I noticed we spend a good amount of time in json.dumps on the return value. That makes sense: the return value can reach tens of MB (see #7708) for a simple /status request when you have thousands of clusters.

Screenshot 2025-10-25 at 12 09 35 PM

We can also see that set_request_succeeded takes a non-trivial share of the time relative to the actual work:
Screenshot 2025-10-25 at 12 19 53 PM

This PR switches the encoding/decoding of these values to orjson, which is faster than the stdlib json.
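For illustration, a minimal sketch of what such a swap can look like. The helper names `encode_return_value` and `decode_return_value` are hypothetical and are not taken from this PR's diff; the fallback to stdlib json is only there so the snippet also runs where orjson is not installed.

```python
import json
from typing import Any

try:
    import orjson  # third-party; install with `pip install orjson`
except ImportError:
    orjson = None


def encode_return_value(value: Any) -> str:
    """Hypothetical helper: serialize a request's return value to a JSON string."""
    if orjson is not None:
        # orjson.dumps returns bytes, so decode to get a str like json.dumps does.
        return orjson.dumps(value).decode('utf-8')
    return json.dumps(value)


def decode_return_value(payload: str) -> Any:
    """Hypothetical helper: parse a stored JSON string back into Python objects."""
    if orjson is not None:
        # orjson.loads accepts both str and bytes.
        return orjson.loads(payload)
    return json.loads(payload)


# Illustrative payload only; real /status responses are much larger.
clusters = [{'name': f'cluster-{i}', 'status': 'UP'} for i in range(3)]
assert decode_return_value(encode_return_value(clusters)) == clusters
```

Note that orjson emits compact output (no spaces after separators), so the encoded string is not byte-for-byte identical to what json.dumps produces with default settings, even though it decodes to the same value.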

While we're at it, FastAPI also supports using orjson to encode the response body as a whole (ORJSONResponse), so let's do that too: https://fastapi.tiangolo.com/advanced/custom-response/#use-orjsonresponse

Benchmarks:

  • Remote API server, us-west1
  • GCP Cloud SQL Postgres, us-west1
  • 2k clusters
  • 1000 sky status calls, 20 concurrent client processes

Before:

            Mean        Std.Dev.    Min         Median      P90         P99         Max
real        9.422       2.290       4.164       9.082       12.002      18.770      20.773

After:

            Mean        Std.Dev.    Min         Median      P90         P99         Max
real        9.170       2.185       4.157       8.865       11.717      17.137      19.998

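A modest improvement overall, most visible in the tail latency: P99 drops from 18.770 to 17.137 and Max from 20.773 to 19.998, while the mean moves from 9.422 to 9.170.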
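To put the two tables side by side, the relative change per statistic can be computed directly from the numbers above (a quick sketch; the values are copied from the Before/After rows):

```python
# "real" timings copied from the Before/After tables above.
before = {'Mean': 9.422, 'Median': 9.082, 'P90': 12.002, 'P99': 18.770, 'Max': 20.773}
after = {'Mean': 9.170, 'Median': 8.865, 'P90': 11.717, 'P99': 17.137, 'Max': 19.998}


def relative_reduction(old: float, new: float) -> float:
    """Percentage by which `new` is lower than `old`."""
    return (old - new) / old * 100.0


reductions = {stat: relative_reduction(before[stat], after[stat]) for stat in before}

for stat, pct in reductions.items():
    print(f'{stat:<8}{pct:5.1f}%')
```

The P99 reduction works out to roughly 8.7%, compared with roughly 2.7% for the mean.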

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@kevinmingtarja
Collaborator Author

/smoke-test

@kevinmingtarja
Collaborator Author

/smoke-test

@kevinmingtarja
Collaborator Author

/smoke-test --kubernetes -k kubernetes

@kevinmingtarja
Collaborator Author

/smoke-test --kubernetes

@kevinmingtarja
Collaborator Author

kevinmingtarja commented Oct 27, 2025

/quicktest-core
/quicktest-core --kubernetes -> Failures unrelated, fixed by #7744

Collaborator

@aylei aylei left a comment

LGTM, thanks! @kevinmingtarja

@aylei
Collaborator

aylei commented Oct 27, 2025

Not part of this PR, but it looks like it would pay off to make /status without refresh a synchronous handler and to have a separate /refresh handler that runs the refresh in background workers.
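A rough, framework-free sketch of the split suggested above, using only the standard library. Everything here is hypothetical (the in-memory cache, `query_cloud_status`, and both handler names); it only illustrates the idea of serving status from stored state while refresh runs in a worker pool, not how the API server is actually structured.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, List

# Hypothetical stand-in for the cluster state stored in the database.
_cluster_cache: Dict[str, str] = {'cluster-a': 'INIT', 'cluster-b': 'STOPPED'}
_executor = ThreadPoolExecutor(max_workers=2)


def query_cloud_status(name: str) -> str:
    """Hypothetical stand-in for a slow cloud API call."""
    return 'UP'


def handle_status() -> List[dict]:
    """Synchronous handler: read stored state and return immediately."""
    return [{'name': n, 'status': s} for n, s in sorted(_cluster_cache.items())]


def _refresh_all() -> int:
    """Re-query every cluster and update the stored state."""
    for name in list(_cluster_cache):
        _cluster_cache[name] = query_cloud_status(name)
    return len(_cluster_cache)


def handle_refresh() -> Future:
    """Refresh handler: schedule the slow work on background workers."""
    return _executor.submit(_refresh_all)
```

With this shape, a plain status read never waits behind a refresh; callers that need fresh data trigger `handle_refresh()` and then read status again once the returned future completes.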

@kyuds
Collaborator

kyuds commented Oct 27, 2025

/quicktest-core --kubernetes

@kevinmingtarja
Collaborator Author

/quicktest-core --kubernetes

@kevinmingtarja kevinmingtarja merged commit a2f0669 into master Oct 27, 2025
21 checks passed
@kevinmingtarja kevinmingtarja deleted the orjson branch October 27, 2025 19:26