Enable building and testing Omega in single precision #147
This looks good to me and includes some nice clean-ups as well. I just had a couple of questions related to the MPI calls and the `_Real` suffix.
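For readers unfamiliar with the idiom, here is a minimal sketch of a precision-switched `Real` alias paired with a `_Real` user-defined literal. It assumes a build-time macro with the hypothetical name `OMEGA_SINGLE_PRECISION`; Omega's actual option name and definitions may differ. The point of the suffix is that a plain literal such as `0.5` is a `double`, so in a single-precision build it would silently promote the whole expression. The MPI calls need the same care, because the MPI datatype passed must match the element type (for example `MPI_FLOAT` versus `MPI_DOUBLE`).

```cpp
#include <type_traits>

// Assumption: a hypothetical build macro OMEGA_SINGLE_PRECISION selects the
// working precision; the real Omega build option may be named differently.
using R4 = float;  // explicit 4-byte real
using R8 = double; // explicit 8-byte real
#ifdef OMEGA_SINGLE_PRECISION
using Real = R4;
#else
using Real = R8;
#endif

static_assert(sizeof(R4) == 4 && sizeof(R8) == 8, "unexpected real sizes");

// User-defined literal: 0.5_Real has type Real in both builds, so a
// double-typed constant never promotes a single-precision expression.
constexpr Real operator""_Real(long double X) { return static_cast<Real>(X); }

// Kernel-style expression written with precision-generic literals.
constexpr Real average(Real A, Real B) { return 0.5_Real * (A + B); }
```

Only floating-point literals such as `2.0_Real` are accepted by this sketch; supporting `2_Real` would need a second overload taking `unsigned long long`.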
```cpp
// Read mesh cell coordinates
readCellArray(XCellH, "xCell");
readCellArray(YCellH, "yCell");
readCellArray(ZCellH, "zCell");
```
These function calls are a nice improvement, thanks.
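The signature of `readCellArray` is not shown in this conversation. The PR description does say that data goes through `R8` arrays before being stored into the class members, so one plausible shape for the conversion step is sketched below. It is a sketch under stated assumptions: `std::vector` stands in for whatever host arrays Omega uses, the `R8` input stands in for data already read from the mesh file, and `convertToReal`/`toRealArray` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: hypothetical macro name selecting the working precision.
using R8 = double;
#ifdef OMEGA_SINGLE_PRECISION
using Real = float;
#else
using Real = double;
#endif

// Converts an R8 buffer (e.g. coordinates stored as 8-byte reals in the mesh
// file) into a Real array. In a double build this is a plain copy; in a
// single build each value is rounded to float.
void convertToReal(const std::vector<R8> &Src, std::vector<Real> &Dst) {
  assert(Src.size() == Dst.size());
  for (std::size_t I = 0; I < Src.size(); ++I)
    Dst[I] = static_cast<Real>(Src[I]);
}

// Hypothetical helper in the spirit of readCellArray: takes the R8 data that
// was read and returns the Real-typed array to keep as a class member.
std::vector<Real> toRealArray(const std::vector<R8> &FromFile) {
  std::vector<Real> Member(FromFile.size());
  convertToReal(FromFile, Member);
  return Member;
}
```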
Branch updated from 42e8b11 to 3c6a1b2.
Both the double-precision and single-precision versions of the Omega unit tests were profiled using Nsight profilers, and the results indicate that this PR correctly generates the corresponding GPU kernels. However, I have not verified the validity of the algorithms.
The profiling results indicate that the unit tests in this PR are insufficient to determine whether single-precision improves performance, as the kernel sizes are too small and do not appear to be representative of typical climate algorithms.
I approve this PR, assuming it properly handles merging commits from external libraries as I noted in another comment.
I profiled the PLANE_TEST unit test with the Nsight profilers and summarize the results below. Further details are at https://acme-climate.atlassian.net/l/cp/YSFk1Zra.
- The PR correctly generates the single-precision version of the Omega TEND_PLANE unit test.
- The elapsed time for both the single-precision and double-precision versions of the unit test is nearly the same, at around 155 ms, excluding initialization and finalization routines.
- GPU resources are underutilized in both cases. For the double-precision version, Compute (SM) Throughput is 4.15%, and Memory Throughput is 1.39%. For single-precision, Compute (SM) Throughput is 7.27%, and Memory Throughput is 0.81%.
- It appears that the kernels are too small to effectively compare the performance characteristics of different floating-point precisions. The longest kernel runs for about 27 µs, but most kernels run in under 20 µs.
- The arithmetic intensity (AI) of the kernels also appears to be too high, meaning these kernels might not accurately represent the performance characteristics of the full Omega model. The typical AI of climate algorithms ranges between 0.1 and 1 FLOPs/byte, but the AI of these unit test kernels reaches up to 460 FLOPs/byte.
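Arithmetic intensity is FLOPs divided by bytes moved to or from memory. The sketch below computes it for an idealized AXPY-style update `y[i] = a * x[i] + y[i]`, chosen only as a simple illustration and not taken from Omega. It counts 2 FLOPs and three element transfers per entry and ignores caches. Halving the element size doubles the AI, which is why a low-AI (bandwidth-bound) kernel is the more likely place for single precision to show a difference.

```cpp
#include <cstddef>

// Arithmetic intensity (AI): floating-point operations per byte of memory traffic.
constexpr double arithmeticIntensity(double Flops, double Bytes) {
  return Flops / Bytes;
}

// Idealized counts for y[i] = a * x[i] + y[i] over N elements of type T:
// 2 FLOPs per element (one multiply, one add) and 3 * sizeof(T) bytes per
// element (read x[i], read y[i], write y[i]), ignoring cache reuse.
template <typename T>
constexpr double axpyArithmeticIntensity(std::size_t N) {
  return arithmeticIntensity(2.0 * static_cast<double>(N),
                             3.0 * static_cast<double>(sizeof(T)) *
                                 static_cast<double>(N));
}
```

With 8-byte `double` this gives 2/24, about 0.083 FLOPs/byte, and with 4-byte `float` 2/12, about 0.167 FLOPs/byte. Both values are far below the 460 FLOPs/byte reported for the unit test kernels.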
Branch updated from 3c6a1b2 to 9b403ad.
After the rebase, I was able to build and run the tests successfully on Chrysalis and on Perlmutter (CPU and GPU). Everything looks good; approving.
Approving based on my inspection and on testing by @brian-oneill and @grnydawn.
This PR enables configuring Omega to use single precision and adds a test (reusing the tendency terms test) checking that we can build and run in single precision. The main changes are:
- `R8` to `Real` where appropriate
- `R8` arrays before storing them into the class members
- `R8` buffers for halo exchanges

The halo buffers change is not optimal for performance in single precision. It would need to be optimized if we decide to pursue this option seriously in the future.
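The cost of the `R8` halo buffers can be illustrated with a minimal sketch. It assumes halo values are packed element by element from a `Real` field into an 8-byte send buffer and unpacked the same way. The function names and the index-list layout are hypothetical, and `std::vector` stands in for device arrays. In a single-precision build, every halo value is widened on pack and narrowed on unpack, and twice as many bytes are exchanged as a `Real`-typed buffer would need.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: hypothetical macro name selecting the working precision.
using R8 = double;
#ifdef OMEGA_SINGLE_PRECISION
using Real = float;
#else
using Real = double;
#endif

// Packs the halo entries of a Real field (listed in HaloIdx) into an R8 send buffer.
void packHalo(const std::vector<Real> &Field,
              const std::vector<std::size_t> &HaloIdx,
              std::vector<R8> &SendBuf) {
  assert(SendBuf.size() == HaloIdx.size());
  for (std::size_t I = 0; I < HaloIdx.size(); ++I)
    SendBuf[I] = static_cast<R8>(Field[HaloIdx[I]]); // widen Real -> R8
}

// Unpacks an R8 receive buffer back into the halo entries of a Real field.
void unpackHalo(const std::vector<R8> &RecvBuf,
                const std::vector<std::size_t> &HaloIdx,
                std::vector<Real> &Field) {
  assert(RecvBuf.size() == HaloIdx.size());
  for (std::size_t I = 0; I < HaloIdx.size(); ++I)
    Field[HaloIdx[I]] = static_cast<Real>(RecvBuf[I]); // narrow R8 -> Real
}

// Bytes exchanged for N halo values with buffer element type T.
template <typename T>
constexpr std::size_t haloBufferBytes(std::size_t N) { return N * sizeof(T); }
```

Switching the buffer element type to `Real` and using a matching MPI datatype would remove the conversions and halve the traffic in single precision. In double precision, `haloBufferBytes<Real>` and `haloBufferBytes<R8>` are equal, so nothing changes there.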
Checklist