-
Notifications
You must be signed in to change notification settings - Fork 487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better Link Copybara-generated PRs to the Googler who authored them #448
Comments
It's not possible to have the PR marked as created by the original author, as that would require the bot to be able to have each user's credentials, but we can indeed make this better. In documentation for procedures for the separate XLA repo, we are calling out the requirement to list yourself in the authorship mapping. So that gets us the commit in the PR marked with the correct author, at least. I may be able to make this a hard failure if the mapping doesn't work, although there will still have to be some way to bypass that, as XLA will be required to accept changes for large migrations across the monorepo. I will investigate. I think Copybara does not currently expose the GitHub username, which would allow us to |
If I remember correctly by a comment in an old ticket not all the authors have a GitHub account (or it was just not mapped?). |
Yes not all Google authors will necessarily have a GitHub account. That should only be the case for people outside the team, e.g. the large-scale cleanups I mentioned above. We can't "fix" that in that I'm not in a position to mandate that all Google engineers create a GitHub account :-) We might be able to tag the reviewer in the case where the author doesn't have an account though, and perhaps even fall back finally to a default person to debug the issue. My understanding is that the current behavior where the commit author is set isn't sufficient because:
|
Unfortunately, this is a recurrent and general problem visible in many non Github first opensource projects. At least it seems different the case where authors doesn't have a Github and when they have not mapped one.
Yes these are my same points and quite the same issues with default copybara managed repos. Until we can all stay on the same page on a more Github friendly development, a Googler with a github account must act as a fallback between the external opensource world and the internal one before push on "anon" PRs. |
Yup makes sense to me. I agree that projects that don't have a GitHub SoT are less friendly to open development. I generally advocate for moving to a GitHub SoT for that reason (I work on the IREE project), but in cases where it's not feasible/resourced we can at least make the experience much better than it is today. The Copybara team has responded to my feature request to expose the GitHub username though I don't have an idea on timeline yet. We may actually be able to do better than just I know there have been a number of issues with Copybara/google3 source-of-truth ergonomics in the past and I'm sure you've filed tickets for them, but would you mind linking them here or in a new issue in this repo? My current list off the top of my head:
|
For Googlers, Copybara FR is http://b/267547747 |
The tickets are sparse and distributed over many years and repos. Just something that I have in mind: Another point is that the code naturally accumulate over time many inline TODOs in internal only |
@GMNGeoffrey Another general point is that the development pipeline and dev-env misalignment it is going to general impact also inclusivity metrics. Over time dev could care more about the internal system than the external one. One metric is: But we are generally going to impact many more interesting metrics (e.g. occasional contributors, elephant factor, etc..): |
Thanks for your feedback @bhack. So we don't lose the particular feedback/discussion on improving author attribution, let's keep this issue on that topic. |
Nothing new honestly, but I hope we can do more than just the author attribution. @GMNGeoffrey has done a great work on managing IREE on Github (now migrating under OpenXLA) to put the community on the same page as internal devs. |
I'm glad that our efforts on IREE have been appreciated. We have put quite some thought into it. We had the advantage of planning for OSS from even before we released on GitHub. XLA has much more history and stuff depending on it right now, so we're more constrained, but we are working to improve the situation. I agree with @theadactyl that it's better to keep this issue on topic, so that we can track the work required (sorry I asked tangential questions). If there are other specific problems you see, please do file issues. I'll create separate issues out of your comments here. |
FYI the Copybara team implemented some of the blocking functionality for this in google/copybara@95aff19. There's still some blockers, but we're thinking this should be functioning soon. @ddunl will be handling actually adding this to our config |
Thanks for the update. |
This should be working now. Example here: #2532 . I'm not sure there will be a commit to refer to (modification only touches internal files), but I'll update this comment if there is. It looks like we'll need to reach out to the copybara team again to be able to assign reviewers, so I'll leave this open for now |
But what happens where the author is the anon tensorflower-gardener? Like in #2527 |
There isn't much we can do in that case - I'm planning on making a wider push to ensure all regular contributors to XLA have their github accounts registered so that they won't show up as anonymous, but that PR in particular is from someone who comes from a completely different team and has never committed to XLA in the past. Cases like those we could only solve by mandating that every Googler has a Github account, which is out of my control |
Yes I suppose that this will really require to scale the problem up to a general policy when contributing CL to published OSS project. But it was never resolved in many years so I don't think we could solve this just for OpenXLA. Is the gardner still the top contributor in this repo? |
That's a good point - I'm actually a bit surprised that the gardener is still so high. Part of that is due to LLVM integrates, but there are still many more commits than I would've thought that are anonymous, including some authored by people I can reach out to directly. |
As this is a global issue cause historically we have this problem in many Google opensoruce projects and we don't have a global policy and the related technical tool to pre-check/enforce that a published CL authors have a mapped GitHub Account I think that best practical and independent solution is to adopt and maintain a good GitHub code ownership configured for the repo. In this way we are at least always sure that there is an non-anonymous reviewer/proxy for "anon PR" whit which normal GitHub users can interact. |
This is a good point - we will probably avoid using GitHub's codeowners, but something along these lines would make sense. Some document which records who is a good person to tag for different parts of the codebase. (We'll probably avoid codeowners because everyone would get assigned to the Copybara generated PRs that originate internally, so there'd be a ton of spam). |
Why everyone? I think only the specific component/subfolder owner/proxy. We can just exclude specific folders/file patterns related to recurrent commits (e.g. llvm updates) |
One of the issues with GitHub's codeowners system is that everyone who is a code owner for every path in the PR will get assigned. That's pretty noisy. We use it in IREE to make sure things don't fall through the cracks, but I definitely get a lot of unnecessary email. It's even worse if you want to use it to gate submission (code owners must approve) because who can approved is tied to who must email. This makes it difficult to deal with ownership at different scopes. Like if I have general responsibility over everything in directory I think David's concern is with emails when Copybara generates a PR from an internal CL. The reviewer will already know about the change and so the additional email will just be spam and would likely lead to people ignoring GitHub emails more broadly. I think we will actually have the same issue with setting reviewers to the internal reviewers. It should be relatively easy for people to only filter these emails though. They should look like:
and cc'ed to review_requested@noreply.github.com |
It seems you don't like the default service that GitHub is offering with the current features/granularity. Just to make an example but there are others to evaluate: Also I think that as a plain alternative being subscribed to all the PRs cause we can potentially have comment to "anon" CL in PR stage it is not going to create less traffic for the owners as everybody need to be subscribed to everything or at least we are going directly into the triage loop that it is not the best solution in fast merging CL originated PRs. I could be wrong but it seems that the point is that as the CL was already reviewed/approved internally we don't care too much about it in "Gtibub world". But I think it is always an opportunity to involve people on GitHub that were potentially excluded in the CL (private) lifecycle phase. |
Yes, this is why David said that we might not use the GitHub CODEOWNERS mechanism, specifically |
Do we have an edge case with this one #2656? |
Hmm thanks for pointing that out. I see an error in the logs, but I don't know what's different about this CL/PR compared to the others. @ddunl can you investigate? I'll link you the logs |
I cannot see the log but probably are the force pushed anon. The authored commit is in the middle. |
Hmm looks like it's not a one off. This one has a problem too: #2749 |
Apologies for not responding to this earlier at @bhack, when you originally commented about this I looked into it briefly but couldn't find anything. It seems like this is due to the Googlers authoring the PRs not being in the openxla org and Github (for some reason) won't let them be assigned for that reason. (Even though it seems like I can assign them manually). We've already sent invites to the whole team so I guess I'll need to make sure everyone knows they got an email that will let them join! |
Thanks @GMNGeoffrey for you great OSS aligment effort. Goodluck for your next project. |
Discussed in #63
Originally posted by bhack December 8, 2022
We are still having the same well known issues with copy-bara bot PRs.
See our full thread with @qlzh727 at Keras keras-team/keras#17183 (comment)
The text was updated successfully, but these errors were encountered: