-
Notifications
You must be signed in to change notification settings - Fork 175
[CI] Increase unit test job timeout #8493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Occasionally, unit test job timing out because they time out after first attempt (test time out set to 1800 seconds) and it doesn't give enough time for second attempt to succeed. Issue: 470590965
🤖 Gemini Suggested Commit Message💡 Pro Tips for a Better Commit Message:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request increases the default job timeout for on-device tests to prevent CI failures. The change is simple, but I have two suggestions for improvement. First, the argument type for the timeout should be int instead of str for better type safety. Second, the new timeout value of 2700 seconds may still be insufficient to cover two full test attempts, which could take up to 3600 seconds plus startup time. A more robust timeout value would prevent this issue from recurring.
| default='2700', | ||
| help='Timeout in seconds for the job. Must be set higher and ' | ||
| 'start_timeout_sec and test_timeout_sec combined.', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new timeout of 2700 seconds might not be sufficient. According to the PR description, this change is to allow a second test attempt to succeed. With test_timeout_sec at 1800 seconds, two attempts could take up to 3600 seconds, plus any startup time (start_timeout_sec). The current value of 2700s doesn't cover this worst-case scenario and may lead to timeouts in the future. The job timeout should be set to a value that can reliably accommodate all attempts, for instance by following a formula like (number_of_attempts * test_timeout_sec) + start_timeout_sec.
| type=str, | ||
| default='2100', | ||
| default='2700', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a numeric timeout value, it's better practice to use type=int instead of type=str. This ensures the value is parsed correctly as an integer at the command-line level, preventing potential bugs and simplifying the code that uses this value, as it won't need to perform its own type conversion.
| type=str, | |
| default='2100', | |
| default='2700', | |
| type=int, | |
| default=2700, |
Unit test jobs in CI occasionally time out, even when tests
are configured to retry. The existing job timeout of 2100
seconds does not provide sufficient time for a second attempt
to complete after an initial test timeout of 1800 seconds.
Increase the default job timeout to 2700 seconds to give a
subsequent test attempts a fighting chance.
Issue: 470590965