Fix AtPoints transpose shift #1723

jeremylt · 2025-01-09T22:58:04Z

Minor performance fix to reduce collisions between threads in the atomic adds of the AtPoints transpose basis action on GPUs.
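
The following minimal host-side C++ sketch illustrates why shifting the loop start can reduce contention. It is not the libCEED kernel. It assumes every thread is on the same loop iteration i at the same moment, so it gives a worst-case count of threads targeting one data.slice entry. MaxSliceCollisions and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical host-side model, not libCEED code: num_threads threads each run
//   for (i = 0; i < q_1d; i++) atomicAdd(&slice[j], ...);
// with j = i (no shift) or j = (i + t) % q_1d (shift by thread index t).
// Assuming all threads are on the same iteration i at once, return the largest
// number of threads that target a single slice entry in any iteration.
int MaxSliceCollisions(int q_1d, int num_threads, bool shifted) {
  assert(q_1d > 0 && num_threads >= 0);
  int worst = 0;
  for (int i = 0; i < q_1d; i++) {
    std::vector<int> hits(q_1d, 0);  // threads targeting each entry in iteration i
    for (int t = 0; t < num_threads; t++) {
      const int j = shifted ? (i + t) % q_1d : i;
      worst = std::max(worst, ++hits[j]);
    }
  }
  return worst;
}
```

Under that assumption, with q_1d = 8 and 32 threads, the unshifted loop sends all 32 threads to the same entry in every iteration. Shifting by the thread index spreads them out so that at most 4 threads share an entry. Real warps do not always run in perfect lockstep, so this is only a rough picture of the contention.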

Note that previously p took the value i = threadIdx.x + threadIdx.y * blockDim.x, so this should largely be a clearer replacement for the same logic we had before.
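
Here is a small C++ sketch that roughly checks why the old p-based shift and the new t_id_x-based shift usually coincide. It makes two assumptions that the diff does not show. First, p comes from a grid-stride loop, p = threadIdx.x + threadIdx.y * blockDim.x + k * blockDim.x * blockDim.y. Second, data.t_id_x equals threadIdx.x. ShiftFromPoint and ShiftFromThreadX are hypothetical helpers.

```cpp
#include <cassert>

// Hypothetical helper: the loop shift taken from the point index p, assuming
// p = t_x + t_y * block_x + k * block_x * block_y (a grid-stride loop over
// points; this form is an assumption, not shown in the diff).
int ShiftFromPoint(int t_x, int t_y, int block_x, int block_y, int k, int q_1d) {
  assert(q_1d > 0 && t_x >= 0 && t_y >= 0 && k >= 0);
  const int p = t_x + t_y * block_x + k * block_x * block_y;
  return p % q_1d;
}

// Hypothetical helper: the shift taken from the x thread index alone, mirroring
// (i + data.t_id_x) % Q_1D under the assumption data.t_id_x == threadIdx.x.
int ShiftFromThreadX(int t_x, int q_1d) {
  assert(q_1d > 0 && t_x >= 0);
  return t_x % q_1d;
}
```

Modulo q_1d, the two shifts agree whenever block_x is a multiple of q_1d, because every other term of p is then a multiple of q_1d. When block_x is not a multiple of q_1d, the shifts can differ. For example, with block_x = 6, q_1d = 4, and t_y = 1, ShiftFromPoint(0, 1, 6, 1, 0, 4) returns 2 while ShiftFromThreadX(0, 4) returns 0.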

jeremylt · 2025-01-09T23:02:27Z

Note: The Rust failure is caused by an unrelated issue in the Rust nightly build, which the Rust developers are working on.

zatkins-dev · 2025-01-10T00:40:42Z

include/ceed/jit-source/cuda/cuda-shared-basis-tensor-at-points-templates.h

@@ -74,7 +74,7 @@ inline __device__ void InterpTransposeAtPoints1d(SharedData_Cuda &data, const Ce
    // Contract x direction
    if (p < NUM_POINTS) {
      for (CeedInt i = 0; i < Q_1D; i++) {
-        atomicAdd(&data.slice[comp * Q_1D + (i + p) % Q_1D], chebyshev_x[(i + p) % Q_1D] * r_U[comp]);
+        atomicAdd(&data.slice[comp * Q_1D + (i + p) % Q_1D], chebyshev_x[(i + data.t_id_x) % Q_1D] * r_U[comp]);


Should the first i + p in this line also be changed?

jeremylt: Oops, yeah. Goes to show that most of the time the two will be the same.
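
The slice index and the chebyshev_x index must use the same shift, because entry j of the slice should receive chebyshev_x[j] * r_U[comp]. The host-side C++ sketch below shows what happens when the two shifts differ. It models one point and one component, and a plain += stands in for atomicAdd. ContractX is a hypothetical name, not a libCEED function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model of the x contraction for one point and one
// component: each slice entry j should receive chebyshev_x[j] * u exactly once.
// slice_shift plays the role of the shift in the data.slice index and
// basis_shift the role of the shift in the chebyshev_x index.
std::vector<double> ContractX(const std::vector<double> &chebyshev_x, double u,
                              int slice_shift, int basis_shift) {
  const int q_1d = static_cast<int>(chebyshev_x.size());
  assert(q_1d > 0 && slice_shift >= 0 && basis_shift >= 0);
  std::vector<double> slice(q_1d, 0.0);
  for (int i = 0; i < q_1d; i++) {
    // Stands in for atomicAdd(&data.slice[...], chebyshev_x[...] * r_U[comp])
    slice[(i + slice_shift) % q_1d] += chebyshev_x[(i + basis_shift) % q_1d] * u;
  }
  return slice;
}
```

With chebyshev_x = {1, 2, 3, 4} and u = 1, ContractX(c, 1.0, 2, 2) returns {1, 2, 3, 4}, the same result as no shift at all. In contrast, ContractX(c, 1.0, 2, 0) returns {3, 4, 1, 2}. A mismatch only changes the result when the two shifts differ modulo Q_1D, which fits the observation that the two are usually the same.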

zatkins-dev · 2025-01-10T00:41:20Z

include/ceed/jit-source/cuda/cuda-shared-basis-tensor-at-points-templates.h

@@ -120,7 +120,7 @@ inline __device__ void GradTransposeAtPoints1d(SharedData_Cuda &data, const Ceed
    // Contract x direction
    if (p < NUM_POINTS) {
      for (CeedInt i = 0; i < Q_1D; i++) {
-        atomicAdd(&data.slice[comp * Q_1D + (i + p) % Q_1D], chebyshev_x[(i + p) % Q_1D] * r_U[comp]);


zatkins-dev · 2025-01-10T00:41:45Z

include/ceed/jit-source/hip/hip-shared-basis-tensor-at-points-templates.h

@@ -74,7 +74,7 @@ inline __device__ void InterpTransposeAtPoints1d(SharedData_Hip &data, const Cee
    // Contract x direction
    if (p < NUM_POINTS) {
      for (CeedInt i = 0; i < Q_1D; i++) {
-        atomicAdd(&data.slice[comp * Q_1D + (i + p) % Q_1D], chebyshev_x[(i + p) % Q_1D] * r_U[comp]);


zatkins-dev · 2025-01-10T00:41:54Z

include/ceed/jit-source/hip/hip-shared-basis-tensor-at-points-templates.h

@@ -120,7 +120,7 @@ inline __device__ void GradTransposeAtPoints1d(SharedData_Hip &data, const CeedI
    // Contract x direction
    if (p < NUM_POINTS) {
      for (CeedInt i = 0; i < Q_1D; i++) {
-        atomicAdd(&data.slice[comp * Q_1D + (i + p) % Q_1D], chebyshev_x[(i + p) % Q_1D] * r_U[comp]);


jeremylt added the minor, GPU, and 1-In Review labels Jan 9, 2025

jeremylt self-assigned this Jan 9, 2025

Commit e244d91: gpu - fix AtPoints transpose shift

jeremylt force-pushed the jeremy/at-points-shifts branch from 47bab60 to e244d91 on January 9, 2025 23:00

jeremylt requested a review from zatkins-dev January 9, 2025 23:01

zatkins-dev reviewed Jan 10, 2025
