Implemented Graphql requests for Github PR's, Issues and comments scanning #4431
Conversation
Force-pushed from 5890953 to e6fff98
Force-pushed from 451a1aa to 0b738a4
Force-pushed from 274a842 to 9b9761e
Force-pushed from 6c487cb to c3a4de9
I was initially skeptical of putting the new code in graphql.go, but I think I see why you did it and I can't think of a better option.
However, I did have some trouble following the rate limiting code - in particular, how it interacts with the rest of the GitHub rate limiting code. Specifically, it's doing a lot of its own stuff before it begins to use the rest of the rate limiting code, and I don't understand why it needs to do that beforehand. (Also, mentioned inline, 5 minutes is a huge interval, and Prometheus gets skipped in many cases.)
Can you try another pass over the rate limiting code to see if things can be clarified a bit?
pkg/sources/github/graphql.go (Outdated)

```go
ctx.Logger().Info("GraphQL RATE_LIMITED error (fallback)",
	"retry_after", retryAfter.String())
time.Sleep(retryAfter)
return true
```
This early return (and all of the later ones) happen before we report rate limiting to Prometheus, which means that we won't have much visibility when it happens. Relatedly, I find it a bit hard to follow this new rate limiting code - I suspect the difficulty was with integrating it with the existing rate limiting code. (For example: Why does this initial request have a 3x 5minute retry step before getting into the global rate limiting code?) You might be able to resolve both issues at once with another refactoring pass at this function.
I overlooked the reporting to Prometheus. Thanks for pointing that out!
The reason I initially set the request interval to 5 minutes was that the GraphQL API sometimes returns a rate limit error before the remaining value reaches 0. When this happens, the response includes no rate limit info, so we can't rely on the usual rate limit object, and the `resetAt` value defaults to zero, which can cause incorrect handling.
To deal with this, I first handled the error case directly, then moved on to process the rate limit if available.
Your comment gave me a new perspective, so I adjusted the approach a bit. Now:
- We first check if the global rate limit is already set.
  - If it's set, we use that value and delay accordingly.
- If it's not set and we receive an error:
  - We check if it's a rate limit issue.
  - In that case, we set the global rate limit (so other requests know to wait) and continue reporting to Prometheus.
- If there's no error, we check the actual rate limit object:
  - If `remaining < 3`, we wait until the `resetAt` time. Why 3? Because some GraphQL queries can cost more than one point, depending on complexity.
  - If
pkg/sources/github/graphql.go (Outdated)

```go
func (s *Source) handleGraphQLRateLimit(ctx context.Context, rl *rateLimit, errIn error, reporters ...errorReporter) bool {
	// if rate limit exceeded error happened, wait for 5 minute before trying again
	if errIn != nil && strings.Contains(errIn.Error(), "rate limit exceeded") {
		retryAfter := 5 * time.Minute
```
This is a huge retry interval. Does GitHub prescribe it?
Answered in the above comment.
Force-pushed from 24cd3cc to f6445af
Broadly looks good, just had a couple of notes. We should add tests for this--probably the integration tests should cover it (maybe a unit test or two for `chunkIDs`), so if we just run the integration tests with `UseGithubGraphqlAPI` both false and true, that'd give us even more confidence we're not breaking anything.
main.go (Outdated)

```go
feature.GitlabProjectsPerPage.Store(100)

// OSS Default using github graphql api for issues, pr's and comments
feature.UseGithubGraphqlAPI.Store(true)
```
Let's start this off as false so it's not automatically turned on for EE customers (we can/should make it true for OSS after we get the flag in EE)
Also, regretfully I think the initialism is GraphQL--if we're gonna do it for REST we should probably do it here too
Sure, I'll temporarily turn it off. I didn't quite understand your second comment - just to clarify, we don’t use any initialism for REST.
Oh I was just saying the "QL" is capitalized; Graphql -> GraphQL
Done
pkg/sources/github/github.go (Outdated)

```diff
 )

-func (s *Source) processRepoComments(ctx context.Context, repoInfo repoInfo, reporter sources.ChunkReporter, cutoffTime *time.Time) error {
+func (s *Source) processIssueandPRsWithCommentsREST(ctx context.Context, repoInfo repoInfo, reporter sources.ChunkReporter, cutoffTime *time.Time) error {
```
I'm an 80 columns person because I have a bunch of buffers open side-by-side, but I accept I'm a relic from an earlier time. This one (and at least one more below) are > 150 though--can we wrap somewhere between 90 and 120?
```diff
-	return s.processRepoComments(ctx, repoInfo, reporter, cutoffTime)
+	// if we need to use graphql api for repo issues, prs and comments
+	if feature.UseGithubGraphqlAPI.Load() {
+		return s.processRepoIssueandPRsWithCommentsGraphql(ctx, repoInfo, reporter, cutoffTime)
```
(No need to do anything here, just musing)
Hrm, w/ the old code passing cutoffTime just the once was kind of 🤷🏻 , but now that we're carrying it around everywhere, it makes me think it'd be nice if we processed s.commentsTimeframeDays up top in Init, that way we don't need to drill it down everywhere.
Sounds good, we can do it in a separate optimization PR.
```go
	return err
}

// if a pull request has more than 100 comments - some interns pull request might endup here :)
```
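The 100-comment case above comes from GraphQL's page-size cap: fetching the remainder means following the connection's `endCursor` until `hasNextPage` is false. A minimal sketch of that loop, using hypothetical local types in place of the real `githubv4` query structs:

```go
package main

import "fmt"

// commentPage stands in for one page of a GraphQL comments connection
// (hypothetical type; real queries use githubv4 struct tags and pageInfo).
type commentPage struct {
	Comments    []string
	EndCursor   string
	HasNextPage bool
}

// fetchPage simulates one GraphQL round trip; real code would issue a
// query with an `after: $cursor` variable.
type fetchPage func(cursor string) commentPage

// allComments drains a comments connection by following endCursor until
// hasNextPage is false -- the loop a >100-comment PR falls into.
func allComments(fetch fetchPage) []string {
	var out []string
	cursor := ""
	for {
		page := fetch(cursor)
		out = append(out, page.Comments...)
		if !page.HasNextPage {
			return out
		}
		cursor = page.EndCursor
	}
}

func main() {
	// Two fake pages of two comments each.
	pages := map[string]commentPage{
		"":   {Comments: []string{"c1", "c2"}, EndCursor: "p2", HasNextPage: true},
		"p2": {Comments: []string{"c3", "c4"}, HasNextPage: false},
	}
	got := allComments(func(c string) commentPage { return pages[c] })
	fmt.Println(len(got)) // prints 4
}
```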
😂
I added the test but even the existing integration tests are not working for me 😿
Left a couple of comments about rate limiting, and a strong opinion about tabular tests. (I've been drawing folks' attention to testify's test suite functionality to serve similar purposes while being more maintainable.)
In general I think PRs this size are too large to effectively review - nothing stands out to me as a problem though.
pkg/sources/github/graphql.go (Outdated)

```go
// request this issue more comments
var commentsQuery singleIssueComments
err := s.connector.GraphQLClient().Query(ctx, &commentsQuery, commentVars)
if s.handleGraphqlRateLimitWithChunkReporter(ctx, reporter, &query.RateLimit, err) {
```
```diff
-if s.handleGraphqlRateLimitWithChunkReporter(ctx, reporter, &query.RateLimit, err) {
+if s.handleGraphqlRateLimitWithChunkReporter(ctx, reporter, &commentsQuery.RateLimit, err) {
```
Not sure if this is what you want or not.
Good catch 👀
pkg/sources/github/graphql.go (Outdated)

```go
}

err := s.connector.GraphQLClient().Query(ctx, &commentQuery, singlePRVars)
if s.handleGraphqlRateLimitWithChunkReporter(ctx, reporter, &query.RateLimit, err) {
```
```diff
-if s.handleGraphqlRateLimitWithChunkReporter(ctx, reporter, &query.RateLimit, err) {
+if s.handleGraphqlRateLimitWithChunkReporter(ctx, reporter, &commentQuery.RateLimit, err) {
```
Similar to another comment I made. It's not obvious to me from looking here which rate limit we should be respecting. Some inline comments might help.
If we aren't interested in respecting separate rate limits for each query object separately, I don't understand why the rate limit object is attached to the query object. (maybe out of scope for this?).
We want to make sure we're respecting the rate limits for single-query objects, like the comments on a single PR. I mistakenly used the wrong rate limit in this case; thanks for catching that. While it's rare to hit the rate limit on these individual queries, it's still important to get it right.
pkg/sources/github/graphql.go (Outdated)

```go
	commentsPagination: (*githubv4.String)(nil),
}

if err := s.connector.GraphQLClient().Query(ctx, &query, reviewThreadVars); err != nil {
```
Does this need to be rate limited as well?
pkg/sources/github/graphql.go (Outdated)

```go
}

var retryAfter time.Duration
// if rate limit exceeded error happened, wait for 5 minute before trying again
```
I think this comment is out of date now.
```go
	numExpectedChunks int
}{
	{
		name: "unauthenticated, single repo, issue comments and pr comments",
```
Please avoid using tabular tests like this. They end up being hard to understand and even harder to maintain. Breaking these out into individual test functions (or even a new test suite) will make our lives easier in the future. imo, it's just as easy to copypasta a test function as it is to copypasta an item in a list of tests, but it's harder to manage as code paths diverge when they're not broken out.
The github_integration_test.go file uses tabular test cases throughout. If we want to update the test case structure, we should apply the change consistently across all test cases. It might be better to create a separate ticket or PR for that cleanup.
Done
Force-pushed from b9ce711 to 615cc9f
Force-pushed from 56f0ebb to 8a5b356
Description:
This pull request replaces the REST (HTTP) API requests for the GitHub source's PRs, issues, and their comments with GraphQL requests.
Scan Results with no verification:
Note: While results may vary slightly between runs, GraphQL consistently outperforms REST in scan duration.
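One intuition for the speedup is plain request-count arithmetic. The sketch below is illustrative only (it assumes both APIs page at 100 items and that most issues fit their first comment page; the real ratio depends on the repository):

```go
package main

import "fmt"

// requestsREST estimates REST calls: one listing call per 100 issues,
// plus one separate comments call per issue (comments live on their own endpoint).
func requestsREST(issues int) int {
	listCalls := (issues + 99) / 100 // ceiling division for paging
	return listCalls + issues
}

// requestsGraphQL estimates GraphQL calls: one query can return 100 issues
// together with the first page of each issue's comments, so issues whose
// comments fit in one page cost no extra round trips.
func requestsGraphQL(issues int) int {
	return (issues + 99) / 100
}

func main() {
	n := 500
	fmt.Println(requestsREST(n))    // 505
	fmt.Println(requestsGraphQL(n)) // 5
}
```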
Checklist:
- Tests passing (`make test-community`)?
- Lint passing (`make lint`; this requires golangci-lint)?