feat: add mixed-precision and agc to gradaccum optimizer #131
base: main
Conversation
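For context before the thread: the PR title mentions adaptive gradient clipping (AGC). Below is a hedged, simplified per-tensor sketch of the idea (the technique from Brock et al., 2021 is usually applied unit-wise); it is an illustration only, not this PR's actual implementation:

    import tensorflow as tf

    def adaptive_clip(param, grad, clip_factor=0.01, eps=1e-3):
        # Cap the gradient norm at clip_factor times the parameter norm,
        # so an update can never dwarf the weight it applies to
        p_norm = tf.maximum(tf.norm(param), eps)   # guard against tiny weights
        g_norm = tf.maximum(tf.norm(grad), 1e-6)   # avoid division by zero
        scale = tf.minimum(clip_factor * p_norm / g_norm, 1.0)
        return grad * scale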
@dPys Great job! Looks very interesting!
I have enabled CI runs for this PR. Note that linting fails straight away. Please fix that, then run the tests to see whether the proposed modifications are compatible with the unit/integration tests.
@dPys Do you have time to address my concerns, so that the tests can be run? I cannot add new features/changes/PRs before tests are passing.
Hi @andreped, yes -- currently very sick / out-of-commission, but I should be able to get to this in the coming days. Stay tuned!
I see that tests seem to be failing:
https://github.com/andreped/GradientAccumulator/actions/runs/7779143998/job/21215129854?pr=131
You should run these locally and see if you can resolve them there first. The easiest way to run these tests is with pytest; see here:
https://github.com/andreped/GradientAccumulator/blob/main/.github/workflows/test.yml#L92
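For reference, a local run of the suite can be as simple as the sketch below (assuming the tests live under tests/, as the workflow suggests):

    import pytest

    # Run the test suite verbosely; the exit code is nonzero if any test fails
    raise SystemExit(pytest.main(["-v", "tests/"]))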
Tests should now be passing (or at least very close) :-)
I recommend running the tests listed here locally and verifying that they pass there first.
A good idea would be to build the wheel in the original virtual env you are working in locally, then install it in a new virtual env before running all the new tests. That's a nice way of verifying that the wheel works as it should. I have enabled CIs to run now without requiring a previously merged PR, so running new tests should be possible for future commits in this PR. EDIT: I also see that you have modified the
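To illustrate the wheel round-trip described above, here is a hedged sketch; it assumes a POSIX layout, the "build" package being installed, and illustrative env/directory names:

    import glob
    import subprocess
    import sys
    import venv

    # Build the wheel in the current working environment
    subprocess.run([sys.executable, "-m", "build", "--wheel"], check=True)

    # Create a fresh virtual environment and install the wheel into it
    venv.create(".fresh-env", with_pip=True)
    wheel = glob.glob("dist/*.whl")[0]
    subprocess.run([".fresh-env/bin/pip", "install", wheel, "pytest"], check=True)

    # Run the tests against the installed wheel rather than the source tree
    subprocess.run([".fresh-env/bin/python", "-m", "pytest", "tests/"], check=True)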
Codecov Report
Attention:
Additional details and impacted files

    @@            Coverage Diff             @@
    ##             main     #131      +/-   ##
    ==========================================
    + Coverage   97.58%   98.00%    +0.41%
    ==========================================
      Files           5        5
      Lines         248      350      +102
    ==========================================
    + Hits          242      343      +101
    - Misses          6        7        +1

☔ View full report in Codecov by Sentry.
Whoops, I had accidentally pushed the wrong commit to trigger the build. Looks like that has now been corrected. As for the
Happy to add further unit tests to cover the remaining uncovered lines of the new optimizer class following code review.
You are free to do that now, or in a new PR once this is merged. But let's see if all tests pass first. Looks very promising :] Great work!!!
All tests seem to succeed. I see that you have removed a lot of the docstrings on various methods, which we feed to autodoc to build the documentation. We need to follow this standard, at least to verify that documentation generation still works and shows what we want the user to see. You can see how I build the docs, and use that style when adding documentation to the different classes and methods; a sketch of the kind of docstring autodoc can pick up follows below.
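For illustration, a Sphinx-autodoc-friendly docstring might look like this; the class and parameter names here are hypothetical, not the PR's actual API:

    class AccumulateWrapper:
        """Wrap an optimizer and accumulate gradients over several steps.

        :param optimizer: The wrapped Keras optimizer instance.
        :param accum_steps: Number of batches to accumulate before an
            update is applied. Defaults to 4.
        """

        def __init__(self, optimizer, accum_steps=4):
            self.optimizer = optimizer
            self.accum_steps = accum_steps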
Lastly, yes, the code coverage seems to have taken a massive hit. We need to beef up the unit tests again to ensure that our code is properly tested. This does not need to be perfect, but we should verify that as many methods and components as possible at least don't crash on first usage. Feel free to ask questions when you make an attempt :]
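A minimal smoke test in that spirit could look like the sketch below; the GradientAccumulateOptimizer import and signature are assumptions based on this PR and may differ from the final API:

    import tensorflow as tf
    from gradient_accumulator import GradientAccumulateOptimizer

    def test_optimizer_single_step_does_not_crash():
        opt = GradientAccumulateOptimizer(
            optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
            accum_steps=2,
        )
        var = tf.Variable(1.0)
        with tf.GradientTape() as tape:
            loss = var * var
        grads = tape.gradient(loss, [var])
        opt.apply_gradients(zip(grads, [var]))  # should not raise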
Absolutely. I'm very used to a high standard for code coverage, both in and out of open-source projects, so I can very much appreciate this aspect. I've essentially refactored the majority of the optimizer, so the reduction in code coverage comes as no surprise. Provided that you are on board with all of the proposed changes in functionality and code organization, I'm certainly happy to put in the time to get the test coverage to where it needs to be :-)
It would be better for me to have everything ready and in a similar state as the original code before I do my full review, but the solution looks fine for now, as all tests seem to pass. So feel free to get code coverage to the same level as, or better than, it was :] Looking forward to testing this! I will have to run some benchmarks to see if there are any performance differences, as well as test whether it still works in a distributed setting (to some extent). But that I will do after I have done my full review.
Excited to be working on this with you!
As for the distributed setting, I've tested it for TensorFlow 2.0-2.10, but after 2.10, when the optimizer API changed, my guess is that
Nah, no need to address distributed support in this PR; that is outside its scope. Let's just get this PR through :] Regarding distributed support, see issue #132, where we aim to find a solution. We can make a new PR if we ever get one, but I have yet to see a solution that does what I expect it to do.
Code looks technically sound, but let's get the code coverage back to the similarly high level it was at before this PR, and then I can do a more comprehensive review and thorough testing :]
I see that you are making great progress on code coverage :] I guess you are aware of it, but if you want to see exactly what remains to be covered, you can check the current Codecov report. Feel free to tag me when you feel it is good enough :] I noticed now that I had some "# pragma: no cover" statements, which you removed; that naturally degrades the reported coverage (as those lines were previously ignored altogether). I had added them because some of the methods were not that critical, but then again, having coverage on these as well would be better.
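For reference, the coverage pragma in question looks like the sketch below (the function is hypothetical); coverage.py simply skips lines marked this way when computing the report:

    def _debug_dump(state):  # pragma: no cover
        # Excluded from coverage: not critical and rarely exercised
        print(state)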
Done
Great work, @dPys!! I will have a look tomorrow :] Looking forward to it!
Sorry for the really late reply, @dPys! I have been quite preoccupied lately with work. I can review this PR after work today :]
Hi @andreped! Just wanted to follow up and see whether the clouds have parted for you yet with work? Eager to hear any final thoughts you might have on this contribution, and really just hopeful to see it merged :-)