Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict fetching PRs and issues only to time range needed #103

Merged
merged 1 commit into from
Sep 12, 2023

Conversation

wagoodman
Copy link
Contributor

Today we fetch all issues and PRs without curtailing the search based on known release information. This PR adjusts this behavior to consider timestamps from the last release, terminating PR and issue fetching when we've reached a non-candidate (with all requests ordered by the "last updated" field).

The performance gain depends on the repo in question, since it's sensitive to the total number of PRs and issues in that repo. In the syft repo as of this writing this cuts down the time to run chronicle from 20 seconds to 2 seconds 🎉 .

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
@wagoodman wagoodman added the enhancement New feature or request label Sep 12, 2023
@wagoodman wagoodman requested a review from a team September 12, 2023 12:39
@@ -317,14 +322,28 @@ func fetchMergedPRs(user, repo string) ([]ghPullRequest, error) {
}

// var limit rateLimit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to be commented out? (I see it's not part of this diff, so not a blocking comment; just surprised.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is a partial implementation of #9

// limit = query.RateLimit

for _, prEdge := range query.Repository.PullRequests.Edges {
saw++
process, terminate = checkSearchTermination(since, &prEdge.Node.UpdatedAt, &prEdge.Node.MergedAt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand the logic here. What I think is happening:

  1. We've asked GH for relevant PRs, with more recently updated PRs first.
  2. For each PR, we check two things: Should we process the current PR (process) and should we break off running additional queries (terminate).
  3. If an PR was merged before the last release, we don't process it (because it was presumably part of the last release).
  4. If a PR was last updated before the last release, we break off searching, because we're ordering GH events by last update, and we got to one before the ones we care about.

Is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup yup -- right on. The go case is that a PR isn't updated after it's merged, so merge-timestamp == update-timestamp, but it is possible to update a PR that may or may not be part of this release after it's been merged (e.g. adding a label to the issue, adding a comment, etc)... we want to curtail the search to save on time but not include PRs that were merged before the last release.

@wagoodman wagoodman merged commit 693a5f9 into main Sep 12, 2023
@wagoodman wagoodman deleted the fetch-only-what-you-need branch September 12, 2023 13:25
@wagoodman wagoodman added bug Something isn't working and removed enhancement New feature or request labels Sep 13, 2023
@wagoodman
Copy link
Contributor Author

swapped the label with the following reasoning: though this is an enhancement, it should have never really been querying all issues / PRs to begin with, so it's more of a performance fix

@wagoodman wagoodman self-assigned this Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants