-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restrict fetching PRs and issues only to time range needed #103
Conversation
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
@@ -317,14 +322,28 @@ func fetchMergedPRs(user, repo string) ([]ghPullRequest, error) { | |||
} | |||
|
|||
// var limit rateLimit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this meant to be commented out? (I see it's not part of this diff, so not a blocking comment; just surprised.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is a partial implementation of #9
// limit = query.RateLimit | ||
|
||
for _, prEdge := range query.Repository.PullRequests.Edges { | ||
saw++ | ||
process, terminate = checkSearchTermination(since, &prEdge.Node.UpdatedAt, &prEdge.Node.MergedAt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to understand the logic here. What I think is happening:
- We've asked GH for relevant PRs, with more recently updated PRs first.
- For each PR, we check two things: Should we process the current PR (
process
) and should we break off running additional queries (terminate
). - If an PR was merged before the last release, we don't process it (because it was presumably part of the last release).
- If a PR was last updated before the last release, we break off searching, because we're ordering GH events by last update, and we got to one before the ones we care about.
Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup yup -- right on. The go case is that a PR isn't updated after it's merged, so merge-timestamp == update-timestamp
, but it is possible to update a PR that may or may not be part of this release after it's been merged (e.g. adding a label to the issue, adding a comment, etc)... we want to curtail the search to save on time but not include PRs that were merged before the last release.
swapped the label with the following reasoning: though this is an enhancement, it should have never really been querying all issues / PRs to begin with, so it's more of a performance fix |
Today we fetch all issues and PRs without curtailing the search based on known release information. This PR adjusts this behavior to consider timestamps from the last release, terminating PR and issue fetching when we've reached a non-candidate (with all requests ordered by the "last updated" field).
The performance gain depends on the repo in question, since it's sensitive to the total number of PRs and issues in that repo. In the syft repo as of this writing this cuts down the time to run chronicle from 20 seconds to 2 seconds 🎉 .