Add kernel-based matrix cumsum #416
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #416      +/-   ##
==========================================
+ Coverage   88.22%   89.19%   +0.96%
==========================================
  Files          49       49
  Lines        2811     2803       -8
==========================================
+ Hits         2480     2500      +20
+ Misses        331      303      -28

Flags with carried forward coverage won't be shown.
Niiice! The first kernel!! 😄 🎊 How is the performance of KA vs the regular cumsum? Feel free to merge if it passes all the tests. If you get a codecov error, there are some low-hanging fruit tests like …
I added some more tests to increase code coverage. Although it is still not reaching the target, a lot of the remaining uncovered lines are for custom GPU logic for the MotionModel structs, which is being eliminated in #408. Once that is done, I think it should reach the goal.
I added a few comments that could increase the codecov further if it's not enough.
@@ -0,0 +1,48 @@
using KernelAbstractions: @index, @kernel
Can you check if the newly added CPU backend adds codecov for the kernel?
Also, add the comments to exclude the coverage for the Metal extension.
I don't think it does, since on the CPU it forwards the call to cumtrapz in KomaMRIBase. I forgot that code coverage doesn't work for Metal, so I added the comments to exclude it.
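For reference, a minimal sketch of how such exclusion markers can look, assuming the project uses Coverage.jl's COV_EXCL_START/COV_EXCL_STOP comments (the helper below is hypothetical, not the actual extension code):

using Metal  # assumption: the extension depends on Metal.jl

# COV_EXCL_START
# Hypothetical helper that only runs on Apple GPUs, so it can never be
# exercised by the coverage CI job; Coverage.jl skips everything between
# the START and STOP markers.
to_metal(x::AbstractArray) = MtlArray(x)
# COV_EXCL_STOP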
KomaMRICore/test/runtests.jl (Outdated)
x = ones(Float32, 1000)
if USE_GPU
    x = x |> gpu
    @test KA.get_backend(x) isa KA.GPU
else
    @test true
end
I think it would be good to check that:

USE_GPU = true
- cpu(gpu(x)) is a CPU array

USE_GPU = false
- cpu and gpu are no-ops
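A minimal sketch of what those checks could look like, reusing cpu, gpu, USE_GPU, and the KA alias from the snippet above (the final tests in the PR may differ):

x = ones(Float32, 1000)
if USE_GPU
    # Round trip: moving to the GPU and back should yield a plain CPU array.
    y = x |> gpu |> cpu
    @test KA.get_backend(y) isa KA.CPU
    @test y == x
else
    # With USE_GPU = false, cpu and gpu should effectively be no-ops.
    @test gpu(x) == x
    @test cpu(x) == x
    @test KA.get_backend(gpu(x)) isa KA.CPU
end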
I added a kernel implementation of cumsum(A::Matrix; dims=2). Since Metal and oneAPI don't yet support cumsum and findall, we are working around the issue for now by copying to the CPU, doing the operations there, and then copying back. It looks like the 2D cumsum in the cumtrapz function of KomaMRIBase is the main place where this could be significantly slower than an actual GPU implementation (the other cumsum and findall operations are on 1D arrays), so I wrote a simple kernel equivalent to avoid having to copy to the CPU.
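For reference, here is a minimal sketch of what such a KernelAbstractions kernel can look like; the names cumsum_dim2_kernel! and koma_cumsum_dim2 are illustrative, not necessarily what the PR adds:

using KernelAbstractions: @index, @kernel, @Const, get_backend, synchronize

# One work-item per row: each work-item scans its own row sequentially,
# so the kernel parallelizes over dims=1 while accumulating along dims=2.
@kernel function cumsum_dim2_kernel!(out, @Const(A))
    i = @index(Global)
    acc = zero(eltype(out))
    for j in 1:size(A, 2)
        acc += A[i, j]
        out[i, j] = acc
    end
end

# Host-side wrapper: launch one work-item per row on whatever backend A lives on.
function koma_cumsum_dim2(A::AbstractMatrix)
    out = similar(A)
    backend = get_backend(A)
    kernel! = cumsum_dim2_kernel!(backend)
    kernel!(out, A; ndrange=size(A, 1))
    synchronize(backend)
    return out
end

Launching one work-item per row keeps the kernel trivially correct; extracting more parallelism along the scan direction would require a proper parallel prefix sum, which is likely unnecessary here since the row count already provides plenty of parallelism.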