
Update Halo class to allow halo exchanges for device arrays #163

Conversation

brian-oneill

This PR updates the Halo class to allow halo exchanges of arrays allocated in device memory space as well as host memory space. With this update, Omega can take advantage of GPU-aware MPI implementations.
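For context, "GPU-aware" means the MPI library can read send buffers and write receive buffers that live in device memory, removing the host staging copies. A minimal sketch of the pattern, with hypothetical names rather than Omega's actual Halo API:

    #include <mpi.h>
    #include <Kokkos_Core.hpp>

    // Illustrative sketch only: with a GPU-aware MPI, the raw pointer of a
    // device-resident Kokkos View can be handed straight to
    // MPI_Isend/MPI_Irecv, with no deep copy to a host mirror.
    void exchangeWithNeighbor(Kokkos::View<double *> SendBuf, // device memory
                              Kokkos::View<double *> RecvBuf, // device memory
                              int NghbrRank, MPI_Comm Comm) {
       MPI_Request Reqs[2];
       MPI_Irecv(RecvBuf.data(), static_cast<int>(RecvBuf.size()), MPI_DOUBLE,
                 NghbrRank, 0, Comm, &Reqs[0]);
       Kokkos::fence(); // ensure packing kernels finished before the send
       MPI_Isend(SendBuf.data(), static_cast<int>(SendBuf.size()), MPI_DOUBLE,
                 NghbrRank, 0, Comm, &Reqs[1]);
       MPI_Waitall(2, Reqs, MPI_STATUSES_IGNORE);
    }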

Changes include:

  • Replace SendBuffer, RecvBuffer, and certain other member vectors of the Neighbor and ExchList classes with arrays on both host and device
  • Replace packBuffer and unpackBuffer overloaded functions with specialized function templates for supported array ranks (see the sketch after this list)
  • Remove deepCopy calls before and after halo exchanges in classes and unit tests where they are no longer necessary
  • Add device array tests to the Halo unit test
  • Add an OMEGA_MPI_ON_DEVICE build flag (on by default)
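As a rough illustration of the rank-templating item above (hypothetical code, not Omega's actual packBuffer): a single function template can branch on the View rank at compile time and run in the default execution space, so it packs device arrays directly.

    #include <Kokkos_Core.hpp>

    // Hedged sketch: one template covers every supported rank, replacing a
    // separate overload per array type. Indices in ExchList select which
    // entries of Array are copied into the flat buffer.
    template <class ArrayType>
    void packBuffer(const Kokkos::View<double *> &Buffer,
                    const Kokkos::View<const int *> &ExchList,
                    const ArrayType &Array) {
       const int NList = static_cast<int>(ExchList.extent(0));
       if constexpr (ArrayType::rank == 1) {
          Kokkos::parallel_for(
              NList, KOKKOS_LAMBDA(int IExch) {
                 Buffer(IExch) = Array(ExchList(IExch));
              });
       } else if constexpr (ArrayType::rank == 2) {
          const int NJ = static_cast<int>(Array.extent(1));
          Kokkos::parallel_for(
              Kokkos::MDRangePolicy<Kokkos::Rank<2>>({0, 0}, {NList, NJ}),
              KOKKOS_LAMBDA(int IExch, int J) {
                 Buffer(IExch * NJ + J) = Array(ExchList(IExch), J);
              });
       }
    }

The same shape works for unpackBuffer, with the assignment direction reversed.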

Successfully built and passed unit tests with OMEGA_MPI_ON_DEVICE both on and off on Chrysalis (intel), Perlmutter CPU (intel) and GPU (nvidiagpu), and Frontier CPU (crayclang) and GPU (crayclanggpu).

Checklist

  • Documentation:
    • User's Guide has been updated
    • Developer's Guide has been updated
    • Documentation has been built locally and changes look as expected
  • Testing
    • Unit tests have passed. Please provide a relevant CDash build entry for verification.

  • Replace packBuffer overloaded functions with specialized function templates for supported Kokkos array ranks; allow buffer packing of device arrays as well as host arrays.
  • Replace unpackBuffer overloaded functions with specialized function templates for supported Kokkos array ranks; allow buffer unpacking into device arrays as well as host arrays.
  • Allow MPI communication of buffer arrays on either host or device.
  • Remove device->host and host->device deep copies no longer needed for halo exchanges.
  • Remove unnecessary deep copies and add a Halo destructor.
  • Add OMEGA_MPI_ON_DEVICE build flag and machine-specific build configs for Frontier and Perlmutter.
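A hedged illustration of how a flag like OMEGA_MPI_ON_DEVICE might be consumed on the C++ side (the alias names here are invented; the real wiring lives in OmegaBuild.cmake and the Halo sources):

    #include <Kokkos_Core.hpp>

    #ifdef OMEGA_MPI_ON_DEVICE
    // GPU-aware MPI: buffers stay in device memory and MPI reads/writes
    // them directly.
    using HaloBuffMemSpace = Kokkos::DefaultExecutionSpace::memory_space;
    #else
    // Fallback: buffers live on the host, so packed data must be
    // deep-copied from device before each send (and back after each recv).
    using HaloBuffMemSpace = Kokkos::HostSpace;
    #endif

    using HaloBuff = Kokkos::View<double *, HaloBuffMemSpace>;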
@rljacob

rljacob commented Nov 18, 2024

Curious why you test Perlmutter GPU with nvidia instead of gnu (which is what E3SM and SCREAM test with).

@grnydawn

> Curious why you test Perlmutter GPU with nvidia instead of gnu (which is what E3SM and SCREAM test with).

@rljacob , I think we want Omega to support both the NVIDIA and GNU compilers on Perlmutter GPU nodes. Since Perlmutter uses NVIDIA GPUs, we typically test Omega with the NVIDIA compiler first and the GNU compiler next. However, if E3SM and SCREAM consider GNU the primary compiler on Perlmutter, we may adopt the same compiler preference.

@rljacob

rljacob commented Nov 18, 2024

We have not actually done a performance comparison between nvidia, gnu, and intel on Perlmutter GPUs. But we have seen nvidia have trouble with some of the Fortran code; gnu is preferred unless there is evidence another compiler is better.

@mark-petersen

Confirmed that this PR passes unit tests on Frontier with CPU and GPU. The unit tests show this is working correctly because they create arrays with unique values per cell, do a halo exchange, and then compute the error.
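A hedged sketch of that verification pattern, with the halo exchange passed in as a callable so no specific Omega API is assumed:

    #include <Kokkos_Core.hpp>

    // Fill every cell with a globally unique value, exchange halos on the
    // device array, then count halo cells whose received value does not
    // match what the owning rank assigned.
    template <class ExchangeFn>
    int countHaloErrors(ExchangeFn ExchangeHalo,
                        const Kokkos::View<const int *> &CellIDGlobal,
                        int NCellsAll) {
       Kokkos::View<double *> Test("Test", NCellsAll);
       Kokkos::parallel_for(
           NCellsAll, KOKKOS_LAMBDA(int Cell) {
              Test(Cell) = static_cast<double>(CellIDGlobal(Cell));
           });
       ExchangeHalo(Test); // device array, no host deep copies
       int NErrs = 0;
       Kokkos::parallel_reduce(
           NCellsAll,
           KOKKOS_LAMBDA(int Cell, int &Err) {
              if (Test(Cell) != static_cast<double>(CellIDGlobal(Cell)))
                 ++Err;
           },
           NErrs);
       return NErrs;
    }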

Awaiting timing tests from Kieran Ringel for a performance comparison between this new halo exchange on device and the previous halo exchange on host (GPU versus CPU).

@grnydawn left a comment

Please see my comments on each item.

@mwarusz left a comment

I found one correctness bug, and I think some generic Kokkos utilities could be moved to OmegaKokkos.h, but the changes in general look good to me. I performed some initial scaling experiments on Frontier and Perlmutter. In general, the performance improved, although the results on Frontier for large GPU counts are a bit surprising (the original code is slightly faster). I also wasn't able to get GPU-aware MPI to work on Perlmutter.
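A hedged sketch of the kind of generic utility that could live in OmegaKokkos.h, e.g. a rank trait shared by the pack/unpack templates in Halo.h and Field.h (illustrative, not the actual Omega code):

    #include <Kokkos_Core.hpp>
    #include <cstddef>
    #include <type_traits>

    // One shared definition instead of a private copy per header: expose
    // the rank of any supported array type for compile-time dispatch.
    template <class ArrayType>
    struct ArrayRank
        : std::integral_constant<std::size_t,
                                 static_cast<std::size_t>(ArrayType::rank)> {};

    static_assert(ArrayRank<Kokkos::View<double **>>::value == 2,
                  "rank trait sanity check");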


@mwarusz

mwarusz commented Nov 25, 2024

Timing results for this PR with OMEGA_MPI_ON_DEVICE turned on and off (indicated in the second half of each name in the legend). [timing plot not reproduced here]

@kieran-ringel Note that, depending on what exactly you measured, this bug (#163 (comment)) might have affected these results, since it causes the state and tracer halo exchanges to exchange host arrays only.

@kieran-ringel

kieran-ringel commented Nov 25, 2024

Updated timing after fixing the State and Tracer exchangeHalo functions to exchange device arrays. [timing plot not reproduced here; note that the x axis is actually number of nodes, not GPUs]

  • Unify similar functions and enums from Halo.h and Field.h and move them to OmegaKokkos.h, along with the ArrayRank struct.
  • Remove the system-specific check for disabling cudaMallocAsync in the Kokkos build. Instead, set Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF by default for OMEGA_ARCH=CUDA, and add the ability to enable it via the OMEGA_CUDA_MALLOC_ASYNC command-line CMake option.
  • Remove the system-specific check, and instead append `export MPICH_GPU_SUPPORT_ENABLED=1` to omega_env.sh whenever Omega is built with MPICH and both OMEGA_TARGET_DEVICE and OMEGA_MPI_ON_DEVICE are true.
@grnydawn left a comment

Approved. Tests passed on Frontier, Perlmutter, and Chrysalis using various compilers. The only exception is a segmentation fault failure in TIMESTEPPING_TEST on Perlmutter-GPU with the gnugpu compiler. This issue appears to be isolated to this specific case and could be addressed in a separate task.

@brian-oneill
Author

@grnydawn I had run the tests successfully with gnugpu after the discussion above and didn't see any issue, but I built in debug mode. I'm seeing this seg fault now when building in release. I'll take a look and see if I can track it down.

@kieran-ringel

[timing plot not reproduced here] I ran the same performance tests as above on Perlmutter. Approving the PR based on the performance results on these two machines.

@grnydawn

grnydawn commented Dec 4, 2024

@brian-oneill , it is strange, but I encountered the failure while running Omega built in "Debug" mode. However, when I retried the TIMESTEPPER_TEST this morning, I did not experience any failures in either "Debug" or "Release" mode.

FYI, yesterday I was able to locate the source lines where the failure occurs. It segfaulted in Halo.h near line 850:

         parallelFor(
             {NK, NTotList, NJ}, KOKKOS_LAMBDA(int K, int IExch, int J) {
                auto Val       = Array(K, LocIndex(IExch), J);
                const R8 RVal  = reinterpret_cast<R8 &>(Val);
                const I4 IBuff = (K * NTotList + IExch) * NJ + J;
                LocBuff(IBuff) = RVal;
             });
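For context on the quoted loop: it flattens one slice of a rank-3 array into the contiguous R8 buffer, bit-copying each element into an R8 slot. A hedged host-side illustration of that bit-copy step (this is not the segfault fix, and not Omega's code):

    #include <cstdint>
    #include <cstring>

    using I4 = std::int32_t;
    using R8 = double;

    // Each value, whatever its scalar type, is stored bit-for-bit inside
    // an 8-byte R8 slot; std::memcpy is the strictly well-defined way to
    // express the bit copy that the quoted kernel performs with
    // reinterpret_cast.
    inline R8 toBufferSlot(I4 Val) {
       R8 Slot = 0.0;                         // zero-fill unused high bytes
       std::memcpy(&Slot, &Val, sizeof(Val)); // copy the 4 value bytes
       return Slot;
    }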

@mark-petersen left a comment

Thank you @brian-oneill! Approving based on the testing above, particularly the plot of halo exchange speed-up by @kieran-ringel. The timing results on Perlmutter are confusing, but I think that is due to the architecture and not an issue with this PR.

@mark-petersen

Merging. We will keep an eye out for the failure @grnydawn mentions above, but since it was not repeatable, I don't want to hold up this merge.

@mark-petersen merged commit 217a69b into E3SM-Project:develop on Dec 5, 2024. 2 checks passed.