Add non-tensor shared #1721

Merged: jeremylt merged 5 commits into main from jeremy/shared-nontensor on Jan 9, 2025

Conversation

jeremylt (Member) commented Jan 3, 2025

This is a prerequisite for the long-awaited */gen non-tensor support.

@jeremylt jeremylt self-assigned this Jan 3, 2025
@jeremylt jeremylt force-pushed the jeremy/shared-nontensor branch 3 times, most recently from 5d36201 to 3e42d60 on January 3, 2025 23:48
@jeremylt jeremylt force-pushed the jeremy/shared-nontensor branch 4 times, most recently from 324c97e to c21b7ff on January 6, 2025 22:06
jeremylt (Member, Author) commented Jan 6, 2025

Edit: This is solved; it was a bad grid size for the weights kernel.

Investigating error on PETSc BP3:

Test: petsc-bps BP3, tet elements
  $ build/petsc-bps -ceed /gpu/cuda/shared -test -problem bp3 -degree 3 -ksp_max_it_clip 50,50 -simplex
ERROR: returncode = 15
Output: 
NO MESSAGE
FAIL: stderr
Output: 
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 8 FPE: Floating Point Exception,probably divide by zero
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on NVIDIA CUDA systems to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run 
[0]PETSC ERROR: to get more information on the crash.
Abort(59) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
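
For context, here is a minimal C sketch of how a weights-kernel launch can hit a host-side integer divide by zero like the one above. The variable names and sizing heuristic are hypothetical and are not taken from the libCEED source; this only illustrates the class of bug described in the edit note.

  #include <stdio.h>

  /* Hypothetical sketch: sizing the launch grid for a quadrature-weights kernel.
   * If elems_per_block is derived by integer division and the number of
   * quadrature points exceeds the thread-block size, it silently becomes 0 and
   * the ceiling division below raises SIGFPE (integer divide by zero). */
  static int ceil_div(int num, int den) { return (num + den - 1) / den; }

  int main(void) {
    const int num_elem = 1000, num_qpts = 512, block_size = 256;

    int elems_per_block = block_size / num_qpts;  /* 0 when num_qpts > block_size */
    if (elems_per_block < 1) elems_per_block = 1; /* guard that avoids the FPE */

    int grid_size = ceil_div(num_elem, elems_per_block);
    printf("grid size = %d, elements per block = %d\n", grid_size, elems_per_block);
    return 0;
  }

Without the guard, ceil_div is called with a zero denominator and the process dies with signal 8 (FPE), matching the PETSc trace above.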

jeremylt (Member, Author) commented Jan 7, 2025

Edit: This happens on my machine on main too, so it's not related to this PR.

Also, t354 is failing on my machine but not in CI, which is odd. I thought my CUDA update was causing trouble, but the failure persists after restarting.

@jeremylt jeremylt force-pushed the jeremy/shared-nontensor branch from c21b7ff to 7c2945f on January 7, 2025 20:14
@jeremylt jeremylt force-pushed the jeremy/shared-nontensor branch from 7c2945f to a3fb7fa on January 7, 2025 21:16
jeremylt (Member, Author) commented Jan 7, 2025

OK, I still need to actually test the HIP code, but everything is working on the CUDA side of the house.

Edit: Confirmed the new code compiles on Noether, but I'm still bringing my local dev machine back up to date to test.

The following passes locally for Ratel with this branch:

  $ make prove -j CEED_BACKENDS=/gpu/cuda/shared

@jeremylt jeremylt force-pushed the jeremy/shared-nontensor branch from a3fb7fa to 1f6c24f on January 8, 2025 18:37
@jeremylt jeremylt added 1-In Review and removed 0-WIP labels Jan 8, 2025
@jeremylt jeremylt merged commit 1a63be7 into main Jan 9, 2025
28 checks passed
@jeremylt jeremylt deleted the jeremy/shared-nontensor branch January 9, 2025 22:51