Add kernel-based matrix cumsum #416
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #416      +/-   ##
==========================================
+ Coverage   88.22%   89.19%   +0.96%
==========================================
  Files          49       49
  Lines        2811     2803       -8
==========================================
+ Hits         2480     2500      +20
+ Misses        331      303      -28

Flags with carried forward coverage won't be shown.
Niiice! The first kernel!! 😄 🎊 How is the performance of KA vs the regular cumsum? Feel free to merge if it passes all the tests. If you get a codecov error, there are some low-hanging fruit tests like …
I added some more tests to increase code coverage. Although it is still not reaching the target, a lot of the remaining uncovered lines are for custom GPU logic for the MotionModel structs, which is being eliminated in #408. Once that is done, I think it should reach the goal.
I added a few comments that could increase the codecov further if it's not enough.
@@ -0,0 +1,48 @@
using KernelAbstractions: @index, @kernel
Can you check if the newly added CPU backend adds codecov for the kernel?
Also, add the comments to exclude the coverage for the Metal extension.
I don't think it does, since on the CPU it forwards the call to cumtrapz in KomaMRIBase. I forgot that code coverage doesn't work for Metal, so I added the comments to exclude it.
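For reference, a minimal sketch of how such exclusion markers can look, assuming the project uses Coverage.jl's COV_EXCL_START/COV_EXCL_STOP comments (the helper below is hypothetical, not the actual extension code):

using Metal  # assumption: the extension depends on Metal.jl

# COV_EXCL_START
# Hypothetical helper that only runs on Apple GPUs, so it can never be
# exercised by the coverage CI job; Coverage.jl skips everything between
# the START and STOP markers.
to_metal(x::AbstractArray) = MtlArray(x)
# COV_EXCL_STOP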
KomaMRICore/test/runtests.jl (Outdated)
x = ones(Float32, 1000)
if USE_GPU
    x = x |> gpu
    @test KA.get_backend(x) isa KA.GPU
else
    @test true
end
I think it would be good to check that:

USE_GPU = true
- cpu(gpu(x)) is a CPU array

USE_GPU = false
- cpu and gpu are no-ops
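A minimal sketch of what those checks could look like, reusing cpu, gpu, USE_GPU, and the KA alias from the snippet above (the final tests in the PR may differ):

x = ones(Float32, 1000)
if USE_GPU
    # Round trip: moving to the GPU and back should yield a plain CPU array.
    y = x |> gpu |> cpu
    @test KA.get_backend(y) isa KA.CPU
    @test y == x
else
    # With USE_GPU = false, cpu and gpu should effectively be no-ops.
    @test gpu(x) == x
    @test cpu(x) == x
    @test KA.get_backend(gpu(x)) isa KA.CPU
end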
I added a kernel implementation of cumsum(A::Matrix; dims=2). Since Metal and oneAPI don't yet support cumsum and findall, we are working around the issue for now by copying to the CPU, doing the operations there, and then copying back. It looks like the 2D cumsum in the cumtrapz function of KomaMRIBase is the main place where this could be significantly slower than an actual GPU implementation (the other cumsum and findall operations are on 1D arrays), so I wrote a simple kernel equivalent to avoid having to copy to the CPU.
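For reference, here is a minimal sketch of what such a KernelAbstractions kernel can look like; the names cumsum_dim2_kernel! and koma_cumsum_dim2 are illustrative, not necessarily what the PR adds:

using KernelAbstractions: @index, @kernel, @Const, get_backend, synchronize

# One work-item per row: each work-item scans its own row sequentially,
# so the kernel parallelizes over dims=1 while accumulating along dims=2.
@kernel function cumsum_dim2_kernel!(out, @Const(A))
    i = @index(Global)
    acc = zero(eltype(out))
    for j in 1:size(A, 2)
        acc += A[i, j]
        out[i, j] = acc
    end
end

# Host-side wrapper: launch one work-item per row on whatever backend A lives on.
function koma_cumsum_dim2(A::AbstractMatrix)
    out = similar(A)
    backend = get_backend(A)
    kernel! = cumsum_dim2_kernel!(backend)
    kernel!(out, A; ndrange=size(A, 1))
    synchronize(backend)
    return out
end

Launching one work-item per row keeps the kernel trivially correct; extracting more parallelism along the scan direction would require a proper parallel prefix sum, which is likely unnecessary here since the row count already provides plenty of parallelism.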