Gpu code #58

Open
wants to merge 137 commits into
base: master
b16a6ff
Added required changes to configure.ac to allow comp w cuda & hip. Ad…
abouzied-nasar Oct 21, 2024
d9ca273
Added cuda and hip linking directives to Makefile.am
abouzied-nasar Oct 21, 2024
ecef19e
Added AC_PROG_CXX to configure.ac
abouzied-nasar Oct 21, 2024
5cb3705
Added first GPU files: src/runner_gpu_pack_functions.c and src/cuda/p…
abouzied-nasar Oct 21, 2024
4edf58e
ACTUALLY added first GPU files: src/runner_gpu_pack_functions.c and s…
abouzied-nasar Oct 21, 2024
0c90ab5
Added more files for GPU code. Seems to work fine aside from config.h…
abouzied-nasar Oct 22, 2024
f19c35a
Added ifdefs to a few files to a) Stop can't find config.h errors and…
abouzied-nasar Oct 22, 2024
6334924
Added dummy.c in src/cuda
abouzied-nasar Oct 22, 2024
8b64cbb
Added cudalt.py dummy.C src/cuda/tester.cu
abouzied-nasar Oct 23, 2024
a822b0b
Added code to cell.c and h cell_hydro.c cell_unskip.c engine_config.c…
abouzied-nasar Oct 24, 2024
b0193cb
Added code to engine.c
abouzied-nasar Oct 24, 2024
391d2bf
Removed bug from engine.c and added some code to scheduler.c
abouzied-nasar Oct 24, 2024
9bd433d
added code to scheduler.* engine_unskip.c
abouzied-nasar Oct 24, 2024
8d0d437
I had made a mistake by putting runner_doiact_functions_hydro_gpu.h i…
abouzied-nasar Oct 24, 2024
caa0852
Sorted most of the code out. Compiles and runs fine with gpu offload …
abouzied-nasar Oct 24, 2024
294f7ae
Added code to engine_marktasks.c
abouzied-nasar Oct 25, 2024
61c83a1
All coded up but there seems to be a problem with duplicate unlocks. …
abouzied-nasar Oct 25, 2024
21ed5cd
Made some changes here and there to try and get deps right for unpack…
abouzied-nasar Oct 28, 2024
aa3eeab
Commented out GPU code from engine_marktasks.c to see if that could h…
abouzied-nasar Oct 29, 2024
9c1b494
Removed duplicate engine_addlink for g and f pack tasks. And re-wired…
abouzied-nasar Oct 29, 2024
5e08222
Minor changes here and there
abouzied-nasar Oct 29, 2024
af0d256
Found a bug in task.c -> Wasn't unlocking gradient pack task
abouzied-nasar Oct 30, 2024
6551027
Found a bug in task.c -> Wasn't unlocking gradient pack task
abouzied-nasar Oct 30, 2024
1dc8451
Code still hanging. Will try starting from scratch with runner_main_c…
abouzied-nasar Oct 30, 2024
8103149
Copied over both runner_main_clean and runner_doiact_functions_hydro_…
abouzied-nasar Oct 30, 2024
badc9d4
Copied over both runner_main_clean and runner_doiact_functions_hydro_…
abouzied-nasar Oct 30, 2024
45ea651
Issue was not with #ifdefs in runner_main_clean.cu or runner_doiact..…
abouzied-nasar Oct 30, 2024
a9f81dd
Issue is probably with how I am locking and unlocking tasks or someth…
abouzied-nasar Oct 30, 2024
2cc07c5
signalling sleeping runners just after packing seems to prevent hangi…
abouzied-nasar Oct 31, 2024
4d6fe3d
Testing to see if code still hangs when making deps on pack tasks ins…
abouzied-nasar Nov 1, 2024
57d77a2
Added scheduler_done to runner_doiact_functions_hydro_gpu.h. Also com…
abouzied-nasar Nov 1, 2024
d952965
Checked engine_maketasks.c and things seem reasonable with nothing mi…
abouzied-nasar Nov 1, 2024
13912ce
Fixed a bug in scheduler_enqueue() where I had a break before where i…
abouzied-nasar Nov 1, 2024
3632ba2
Removed all GPU tasks aside from density self pack tasks. Code still …
abouzied-nasar Nov 4, 2024
d77dd2f
Found bug in how we set n_tasks_left* in scheduler_rewait it should b…
abouzied-nasar Nov 4, 2024
db43727
Fix is in for force and gradient pack tasks but needs de-bugging as c…
abouzied-nasar Nov 4, 2024
abecded
Missing brackets. FIX!
abouzied-nasar Nov 5, 2024
e8aa0ae
Fix bracketing in MPI hydro recv construction
MatthieuSchaller Nov 5, 2024
6aa324d
Fixed missing closing curly
MatthieuSchaller Nov 5, 2024
5e4eafe
Applied code formatting script blindly
MatthieuSchaller Nov 5, 2024
780553d
Added new ifdef-controls to offload only the density/gradient/force h…
MatthieuSchaller Nov 5, 2024
3b58f55
Added new ifdef-controls to offload only the density/gradient/force h…
MatthieuSchaller Nov 5, 2024
81c2283
Fix another bracketing issue
MatthieuSchaller Nov 5, 2024
b21d912
Fix another bracketing issue
MatthieuSchaller Nov 5, 2024
b4f4203
Fix logic mistake
MatthieuSchaller Nov 5, 2024
4a90516
First attempt at MPI dependencies
MatthieuSchaller Nov 5, 2024
d212364
Changed fprints to message in engine_maketasks.c
abouzied-nasar Nov 5, 2024
7c3728d
Fix comm name typo
MatthieuSchaller Nov 5, 2024
af8bc7b
Only scream if a local cell has no ghost_in
MatthieuSchaller Nov 5, 2024
732370d
Add unpack ---> recv dependencies
MatthieuSchaller Nov 5, 2024
c43b7b5
Fix error message in the task construction
MatthieuSchaller Nov 5, 2024
e64e17a
Only create the timing files if we are dumping timings. Assign the de…
MatthieuSchaller Nov 5, 2024
9981039
Put the gpu pack/unpack tasks in the correct category
MatthieuSchaller Nov 5, 2024
dd041fe
Do not update the total task time in signal_sleeping_runners
MatthieuSchaller Nov 5, 2024
e506c79
Time the pack and unpack of density pair separately
MatthieuSchaller Nov 6, 2024
3ccbaf0
Fix typos
MatthieuSchaller Nov 6, 2024
707ad45
Added timers for packing all GPU task types
abouzied-nasar Nov 6, 2024
425b074
Added timers for unpacking all GPU task types
abouzied-nasar Nov 6, 2024
4634905
modified scheduler_report_task_times_mapper() to account for differen…
abouzied-nasar Nov 6, 2024
7b8baa6
Reverted back to one if statement only for pack tasks as unpack tasks…
abouzied-nasar Nov 6, 2024
23cf181
made pack task types set to task_category_gpu
abouzied-nasar Nov 6, 2024
d7f980f
First attempt at re-instating task stealing
MatthieuSchaller Nov 6, 2024
5b67cd4
Put variable back in the code
MatthieuSchaller Nov 6, 2024
51c788a
put in fix for swift_task_debug in runner_main_clean.cu
abouzied-nasar Nov 7, 2024
ddc843b
Fix code to allow running with SWIFT_DEBUG_TASKS
MatthieuSchaller Nov 8, 2024
0447148
Fix typo
MatthieuSchaller Nov 8, 2024
037dbd5
Applied formatting script
MatthieuSchaller Nov 8, 2024
7eab151
Commented out task splitting for hydro and added fix for cell-less ta…
abouzied-nasar Nov 8, 2024
55c50b2
Call signal_sleeping_runners() in runner_main and not inside the pack…
MatthieuSchaller Nov 8, 2024
3800f53
Changed the fix for duplicate unlocks so that we skip checking for du…
abouzied-nasar Nov 9, 2024
40b67ee
Modified signal_sleeping_runners to only signal once per pack. Also m…
abouzied-nasar Nov 9, 2024
339539d
Set up and tested for optimal pack_size. GPU code is about 20% faster …
Nov 11, 2024
642b4de
Changed part_gpu.h to compile on Bede
Nov 11, 2024
9664fa8
Made duplicate CPU tasks implicit and switched back to enqueueing dep…
abouzied-nasar Nov 11, 2024
03d15b1
Commented out debug code in cell_unskip.c, come back and put code in …
abouzied-nasar Nov 12, 2024
66883bb
Modified cell_unskip.c so GPU debug bits only active when SWIFT_DEBUG…
abouzied-nasar Nov 12, 2024
fb531da
Added some files for HIP compilations
abouzied-nasar Nov 12, 2024
69d7d47
Quick stab at splitting GPU tasks. May have to revert as done in haste
abouzied-nasar Nov 19, 2024
72fa9d3
Fixed a few bugs with GPU task splitting. Code hangs after a few step…
abouzied-nasar Nov 19, 2024
83972f3
Fixed a bug in if statement in runner_main2(). Added some code to mak…
abouzied-nasar Nov 19, 2024
d250edf
Found another bug. I was not accounting for split tasks when allowing…
abouzied-nasar Nov 19, 2024
cb7abcf
Converted sub_selfs and sub_pairs to selfs and pairs in maketasks. Co…
abouzied-nasar Nov 20, 2024
2ee5742
Removed sub tasks from runner_main and moved making density sub tasks…
abouzied-nasar Nov 20, 2024
4d726d1
commented out sub tasks in runner_main. Edited how tasks are created
abouzied-nasar Nov 20, 2024
dcbad17
Made sub tasks implicit and fixed bug with atomic_dec in runner_main
abouzied-nasar Nov 20, 2024
2beb67e
reverted to converting subs
abouzied-nasar Nov 20, 2024
2a0d137
Too tired to carry on. There might be something to playing with space…
abouzied-nasar Nov 20, 2024
4bfa575
A few edits to test on Bede GHs
abouzied-nasar Nov 21, 2024
50c6a74
Changed a few parameters
abouzied-nasar Nov 21, 2024
d808695
Removed redundant commented out code from engine_maketasks()
abouzied-nasar Nov 21, 2024
18513c9
before adding time
abouzied-nasar Nov 21, 2024
306d32e
Changed task_unlocks to cell_unlocktree. Modified timer function for …
abouzied-nasar Nov 21, 2024
3e0c7b8
Removed unnecessary counter from engine_maketasks
abouzied-nasar Nov 21, 2024
fe5d945
Changed how s->nr_self_pack_tasks is incremented in scheduler.c
abouzied-nasar Nov 21, 2024
744c469
Commented out timing code in runner_main. The code ran for a full sim…
abouzied-nasar Nov 21, 2024
5b54b0f
Made changes to task dump for debugging to account for (skip) cell-le…
abouzied-nasar Dec 14, 2024
7a168d5
Added comments to remind myself to threadpool some bits in GPU task mg…
abouzied-nasar Dec 14, 2024
f7bf486
Added code to pass eta_neighbours (h/dx) to GPU code for calculating …
abouzied-nasar Dec 15, 2024
ff7087c
Removed conversion of sub-pairs to pairs, etc, in engine_maketasks fo…
abouzied-nasar Dec 16, 2024
ca9deaf
Added more debug checks to figure out why we got cells with 8x expect…
abouzied-nasar Dec 17, 2024
408b0e4
Hard-coded space_splitsize_default to 100 with no avail. Something fi…
abouzied-nasar Dec 17, 2024
fa1e7c4
Hard-coded space_grid_split_threshold_default to 100 and space_subsiz…
abouzied-nasar Dec 17, 2024
26585ec
I was barking up the wrong tree. The issue was that some particles wi…
abouzied-nasar Dec 18, 2024
4063e37
Added some comments
abouzied-nasar Dec 18, 2024
fd849fc
Okay. Now back to stealing. Code hangs when stealing enabled
abouzied-nasar Dec 18, 2024
fd9cfa0
Testing to see what happens when only self-dens tasks are done on GPU…
abouzied-nasar Dec 18, 2024
e9cb4dd
Reverted back to offloading only density tasks (self and pair)
abouzied-nasar Dec 19, 2024
e94fe6e
Weirdly code does not hang when using bigger test case and only densi…
abouzied-nasar Dec 19, 2024
0e3196d
Reverted to doing all tasks on GPU. Need to test on GHopper
abouzied-nasar Dec 19, 2024
3dfb81c
Deleted obsolete debug code
abouzied-nasar Dec 19, 2024
57cb5d9
Deleted obsolete debug code
abouzied-nasar Dec 19, 2024
02e18e4
Put in a weird fix in scheduler_gettask but I think it is wrong and s…
abouzied-nasar Dec 19, 2024
153558e
Deleted commented out obsolete code
abouzied-nasar Dec 19, 2024
f71201e
Changed condition to launch_leftovers to if leftover tasks <=1 instea…
abouzied-nasar Dec 19, 2024
707928b
Changed condition to launch_leftovers to if leftover tasks <=1 instea…
abouzied-nasar Dec 19, 2024
c5716b6
Extended <2 condition to other GPU task subtypes but the code now han…
abouzied-nasar Dec 19, 2024
b78ba84
Minor changes for testing. Commented out debug code for checking if t…
abouzied-nasar Dec 23, 2024
b533d16
Reverted condition for launch leftovers to n_packs_left < 1 instead o…
abouzied-nasar Dec 23, 2024
ceb4d6f
Allowed stealing engine_config. Changed debug if statement to error r…
abouzied-nasar Dec 30, 2024
4929cf8
Added simple debug code to scream when we have way more particles tha…
abouzied-nasar Dec 30, 2024
392e2ce
Added code to monitor how many GPU tasks a thread steals. Code still …
abouzied-nasar Dec 30, 2024
29e074b
Very weird behaviour found. Added some debug code set to crash run if…
abouzied-nasar Jan 6, 2025
949fd47
Moving tasks_left counter incrementation from scheduler.c to queue_in…
abouzied-nasar Jan 7, 2025
5abe90b
Changed if statements to atomic_CAS. Code hangs but the CAS does not …
abouzied-nasar Jan 7, 2025
5082ca9
Changed comp val to int in CAS instead of int *. Removed debug code. …
abouzied-nasar Jan 7, 2025
38953c4
Removed debug code which throws error if launching leftovers but stil…
abouzied-nasar Jan 7, 2025
c7e6a6c
Combined atomic dec and CAS calls into one operation. Code seems to r…
abouzied-nasar Jan 7, 2025
f3e3138
Removed unnecessary mods from swift.c
abouzied-nasar Jan 8, 2025
a58d968
Added comment to explain changes in task.c
abouzied-nasar Jan 8, 2025
4513dca
Removed un-necessary code from space.h
abouzied-nasar Jan 8, 2025
cf16ae4
Reverted swift.c and space.h as eta_neighbours was indeed required fo…
abouzied-nasar Jan 8, 2025
7871c95
Reverted swift.c and space.h as eta_neighbours was indeed required fo…
abouzied-nasar Jan 8, 2025
3bf586e
Cleaned runner_main a bit whilst checking for source of new code hang…
abouzied-nasar Jan 8, 2025
091819d
Cleaned up un-necessary code from queue.c
abouzied-nasar Jan 8, 2025
ea17e54
Reverted back to signalling runners individually for each task which …
abouzied-nasar Jan 9, 2025
4ab67c2
Added additional switch in runner_main to double check whether to lau…
abouzied-nasar Jan 9, 2025
f8463e8
Code hanging. Unsure why. Made significant changes. Changed counters …
abouzied-nasar Jan 10, 2025
39 changes: 39 additions & 0 deletions Makefile.am
@@ -74,6 +74,23 @@ bin_PROGRAMS += fof_mpi
endif
endif

# BUILD CUDA versions as well?
if HAVECUDA
bin_PROGRAMS += swift_cuda
if HAVEMPI
bin_PROGRAMS += swift_mpicuda
endif
endif


# BUILD HIP versions as well?
if HAVEHIP
bin_PROGRAMS += swift_hip
if HAVEMPI
bin_PROGRAMS += swift_mpihip
endif
endif

# engine_policy_setaffinity is available?
if HAVESETAFFINITY
ENGINE_POLICY_SETAFFINITY=| engine_policy_setaffinity
@@ -91,6 +108,28 @@ swift_mpi_SOURCES = swift.c
swift_mpi_CFLAGS = $(MYFLAGS) $(AM_CFLAGS) $(MPI_FLAGS) -DENGINE_POLICY="engine_policy_keep $(ENGINE_POLICY_SETAFFINITY)"
swift_mpi_LDADD = src/libswiftsim_mpi.la argparse/libargparse.la $(MPI_LIBS) $(VELOCIRAPTOR_MPI_LIBS) $(EXTRA_LIBS) $(LD_CSDS)

# Sources for swift_cuda
swift_cuda_SOURCES = swift.c dummy.C
swift_cuda_CXXFLAGS = $(MYFLAGS) $(AM_CFLAGS) $(CUDA_CFLAGS) -DENGINE_POLICY="engine_policy_keep $(ENGINE_POLICY_SETAFFINITY)" -DWITH_CUDA
swift_cuda_LDADD = src/.libs/libswiftsim_cuda.a src/cuda/.libs/libswiftCUDA.a $(EXTRA_LIBS) $(CUDA_LIBS) -lcudart argparse/.libs/libargparse.a src/.libs/libgrav.la

# Sources for swift_hip
swift_hip_SOURCES = swift.c dummy.C
swift_hip_CXXFLAGS = $(MYFLAGS) $(AM_CFLAGS) $(HIP_CFLAGS) -DENGINE_POLICY="engine_policy_keep $(ENGINE_POLICY_SETAFFINITY)" -DWITH_HIP
swift_hip_LDADD = src/.libs/libswiftsim_hip.a src/hip/.libs/libswiftHIP.a $(EXTRA_LIBS) $(HIP_LIBS) -lamdhip64 -L/opt/rocm-5.1.0/lib -lhsa-runtime64 -L/opt/rocm-5.1.0/lib64 -lamd_comgr argparse/.libs/libargparse.a src/.libs/libgrav.la

# Sources for swift_mpicuda, do we need an affinity policy for MPI?
swift_mpicuda_SOURCES = swift.c dummy.C
swift_mpicuda_CXXFLAGS = $(MYFLAGS) $(AM_CFLAGS) $(MPI_FLAGS) $(CUDA_CFLAGS) -DENGINE_POLICY="engine_policy_keep $(ENGINE_POLICY_SETAFFINITY)" -DWITH_CUDA
swift_mpicuda_CFLAGS = $(MYFLAGS) $(AM_CFLAGS) $(MPI_FLAGS) $(CUDA_CFLAGS) -DENGINE_POLICY="engine_policy_keep $(ENGINE_POLICY_SETAFFINITY)" -DWITH_CUDA
swift_mpicuda_LDADD = src/.libs/libswiftsim_mpicuda.a argparse/.libs/libargparse.a src/.libs/libgrav.la src/cuda/.libs/libswiftCUDA.a $(MPI_LIBS) $(EXTRA_LIBS) $(CUDA_LIBS) -lcudart

# Sources for swift_mpihip, do we need an affinity policy for MPI?
swift_mpihip_SOURCES = swift.c dummy.C
swift_mpihip_CXXFLAGS = $(MYFLAGS) $(AM_CFLAGS) $(MPI_FLAGS) $(HIP_CFLAGS) -DENGINE_POLICY="engine_policy_keep $(ENGINE_POLICY_SETAFFINITY)" -DWITH_HIP
swift_mpihip_CFLAGS = $(MYFLAGS) $(AM_CFLAGS) $(MPI_FLAGS) $(HIP_CFLAGS) -DENGINE_POLICY="engine_policy_keep $(ENGINE_POLICY_SETAFFINITY)" -DWITH_HIP
swift_mpihip_LDADD = src/.libs/libswiftsim_mpihip.a argparse/.libs/libargparse.a src/.libs/libgrav.la src/hip/.libs/libswiftHIP.a $(MPI_LIBS) $(EXTRA_LIBS) $(HIP_LIBS) -lamdhip64

# Sources for fof
fof_SOURCES = swift_fof.c
fof_CFLAGS = $(MYFLAGS) $(AM_CFLAGS) -DENGINE_POLICY="engine_policy_keep $(ENGINE_POLICY_SETAFFINITY)"
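Review note: the nested `if HAVECUDA` / `if HAVEHIP` blocks above decide which extra binaries are appended to `bin_PROGRAMS`. The selection can be sketched in Python as below; `extra_programs` and its boolean arguments are hypothetical names standing in for the Automake conditionals, not part of the build system.

```python
def extra_programs(have_cuda, have_hip, have_mpi):
    """Mirror the nested conditionals this diff adds to the top-level Makefile.am."""
    programs = []
    if have_cuda:
        programs.append("swift_cuda")
        if have_mpi:
            # The MPI variant is only added inside the HAVECUDA block.
            programs.append("swift_mpicuda")
    if have_hip:
        programs.append("swift_hip")
        if have_mpi:
            # Likewise, the MPI+HIP binary requires both conditionals.
            programs.append("swift_mpihip")
    return programs


if __name__ == "__main__":
    print(extra_programs(have_cuda=True, have_hip=False, have_mpi=True))
```

Because the two blocks are independent, a configuration where both `nvcc` and `hipcc` are found would add all four GPU binaries when MPI is also available.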
82 changes: 82 additions & 0 deletions configure.ac
@@ -41,6 +41,10 @@ AC_USE_SYSTEM_EXTENSIONS
AC_PROG_CC
AM_PROG_CC_C_O

# Find and test the C++ compiler.
AC_PROG_CXX
AC_PROG_CXX_C_O

# We need this for compilation hints and possibly FFTW.
AX_OPENMP

@@ -995,6 +999,78 @@ AH_VERBATIM([__STDC_FORMAT_MACROS],
#define __STDC_FORMAT_MACROS 1
#endif])



# Check for CUDA
have_cuda="no"
AC_ARG_WITH([cuda],
[AS_HELP_STRING([--with-cuda=PATH],
[root directory where CUDA is installed @<:@yes/no@:>@]
)],
[],
[with_cuda="no"]
)
if test "x$with_cuda" != "xno"; then
if test "x$with_cuda" != "xyes"; then
CUDA_CFLAGS="-I$with_cuda/include"
CUDA_LIBS="-L$with_cuda/lib -L$with_cuda/lib64 -lcudart"
NVCC="$with_cuda/bin/nvcc"
have_cuda="yes"
else
AC_PATH_PROG([NVCC],[nvcc])
echo "Found nvcc = $NVCC"
if test -n "$NVCC"; then
CUDA_ROOT="`dirname $NVCC`/.."
CUDA_CFLAGS="-I${CUDA_ROOT}/include"
CUDA_LIBS="-L${CUDA_ROOT}/lib -L${CUDA_ROOT}/lib64 -lcudart"
have_cuda="yes"
fi
fi
if test "x$have_cuda" != "xno"; then
AC_DEFINE([HAVE_CUDA], 1, [The CUDA compiler is installed.])
fi
CFLAGS="${CFLAGS} "
fi
AC_SUBST(CUDA_CFLAGS)
AC_SUBST(CUDA_LIBS)
AC_SUBST(NVCC)
AM_CONDITIONAL([HAVECUDA],[test -n "$NVCC"])

# Check for HIP
have_hip="no"
AC_ARG_WITH([hip],
[AS_HELP_STRING([--with-hip=PATH],
[root directory where HIP is installed @<:@yes/no@:>@]
)],
[],
[with_hip="no"]
)
if test "x$with_hip" != "xno"; then
if test "x$with_hip" != "xyes"; then
HIP_CFLAGS="-I$with_hip/include"
HIP_LIBS="-L$with_hip/lib -L$with_hip/lib64"
HIPCC="$with_hip/bin/hipcc"
have_hip="yes"
else
AC_PATH_PROG([HIPCC],[hipcc])
echo "Found hipcc = $HIPCC"
if test -n "$HIPCC"; then
HIP_ROOT="`dirname $HIPCC`/.."
HIP_CFLAGS="-I${HIP_ROOT}/include"
HIP_LIBS="-L${HIP_ROOT}/lib -L${HIP_ROOT}/lib64"
have_hip="yes"
fi
fi
if test "x$have_hip" != "xno"; then
AC_DEFINE([HAVE_HIP], 1, [The HIP compiler is installed.])
fi
CFLAGS="${CFLAGS} "
fi
AC_SUBST(HIP_CFLAGS)
AC_SUBST(HIP_LIBS)
AC_SUBST(HIPCC)
AM_CONDITIONAL([HAVEHIP],[test -n "$HIPCC"])

# Check for FFTW. We test for this in the standard directories by default,
# and only disable if using --with-fftw=no or --without-fftw. When a value
# is given FFTW must be found.
@@ -3246,6 +3322,10 @@ AC_CONFIG_FILES([tests/testSelectOutput.sh], [chmod +x tests/testSelectOutput.sh
AC_CONFIG_FILES([tests/testFormat.sh], [chmod +x tests/testFormat.sh])
AC_CONFIG_FILES([tests/testNeutrinoCosmology.sh], [chmod +x tests/testNeutrinoCosmology.sh])
AC_CONFIG_FILES([tests/output_list_params.yml])
# cuda .in file
AC_CONFIG_FILES([src/cuda/Makefile])
# hip .in file
AC_CONFIG_FILES([src/hip/Makefile])

# Save the compilation options
AC_DEFINE_UNQUOTED([SWIFT_CONFIG_FLAGS],["$swift_config_flags"],[Flags passed to configure])
@@ -3276,6 +3356,8 @@ AC_MSG_RESULT([
HDF5 enabled : $with_hdf5
- parallel : $have_parallel_hdf5
METIS/ParMETIS : $have_metis / $have_parmetis
CUDA enabled : $have_cuda
HIP enabled : $have_hip
FFTW3 enabled : $have_fftw
- threaded/openmp : $have_threaded_fftw / $have_openmp_fftw
- MPI : $have_mpi_fftw
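Review note: the `--with-cuda` check above has three branches (`no`, a bare `yes`, or an explicit PATH), and the HIP check follows the same shape. A minimal Python sketch of that decision logic is given here, assuming no `NVCC` variable is pre-set in the environment. `resolve_cuda` and `havecuda_conditional` are hypothetical names used only for illustration, and `shutil.which` stands in for `AC_PATH_PROG`.

```python
import posixpath
import shutil


def resolve_cuda(with_cuda, which=shutil.which):
    """Follow the three branches of the CUDA check: "no", "yes", or an explicit PATH."""
    result = {"have_cuda": "no", "NVCC": "", "CUDA_CFLAGS": "", "CUDA_LIBS": ""}
    if with_cuda == "no":
        return result
    if with_cuda != "yes":
        # Explicit PATH: accepted as given, nothing is probed on disk.
        root = with_cuda
        result["NVCC"] = root + "/bin/nvcc"
    else:
        # Bare "yes": search PATH, in the spirit of AC_PATH_PROG([NVCC],[nvcc]).
        result["NVCC"] = which("nvcc") or ""
        if not result["NVCC"]:
            return result
        root = posixpath.dirname(result["NVCC"]) + "/.."
    result["CUDA_CFLAGS"] = "-I" + root + "/include"
    result["CUDA_LIBS"] = "-L" + root + "/lib -L" + root + "/lib64 -lcudart"
    result["have_cuda"] = "yes"
    return result


def havecuda_conditional(result):
    """The HAVECUDA conditional is keyed on NVCC being non-empty, not on have_cuda."""
    return bool(result["NVCC"])


if __name__ == "__main__":
    print(resolve_cuda("/opt/cuda"))
```

One point the sketch makes visible: with an explicit PATH, `have_cuda` becomes `yes` and `NVCC` is set without checking that the compiler exists there, so a mistyped path would only surface later at build time.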
80 changes: 80 additions & 0 deletions cudalt.py
@@ -0,0 +1,80 @@
#!/usr/bin/python3
# libtoolish hack: compile a .cu file like libtool does
import sys
import os

lo_filepath = sys.argv[1]
o_filepath = lo_filepath.replace(".lo", ".o")

try:
    i = o_filepath.rindex("/")
    lo_dir = o_filepath[0:i+1]
    o_filename = o_filepath[i+1:]

except ValueError:
    lo_dir = ""
    o_filename = o_filepath

local_pic_dir = ".libs/"
local_npic_dir = ""
pic_dir = lo_dir + local_pic_dir
npic_dir = lo_dir + local_npic_dir

pic_filepath = pic_dir + o_filename
npic_filepath = npic_dir + o_filename
local_pic_filepath = local_pic_dir + o_filename
local_npic_filepath = local_npic_dir + o_filename

# Make lib dir
try:
    os.mkdir(pic_dir)
except OSError:
    pass

# generate the command to compile the .cu for shared library
args = sys.argv[2:]
args.extend(["-Xcompiler","-fPIC"])
# position indep code
args.append("-o")
args.append(pic_filepath)
command = " ".join(args)
print (command)

# compile the .cu
rv = os.system(command)
if rv != 0:
    sys.exit(1)

# generate the command to compile the .cu for static library
args = sys.argv[2:]
args.append("-o")
args.append(npic_filepath)
command = " ".join(args)
print (command)

# compile the .cu
rv = os.system(command)
if rv != 0:
    sys.exit(1)

# get libtool version
fd = os.popen("libtool --version")
libtool_version = fd.readline()
fd.close()

# generate the .lo file
f = open(lo_filepath, "w")
f.write("# " + lo_filepath + " - a libtool object file\n")
f.write("# Generated by " + libtool_version + "\n")
f.write("#\n")
f.write("# Please DO NOT delete this file!\n")
f.write("# It is necessary for linking the library.\n\n")

f.write("# Name of the PIC object.\n")
f.write("pic_object='" + local_pic_filepath + "'\n\n")

f.write("# Name of the non-PIC object.\n")
f.write("non_pic_object='" + local_npic_filepath + "'\n")
f.close()

sys.exit(0)
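Review note: the path handling at the top of cudalt.py derives four object paths from the `.lo` argument. The sketch below shows the same derivation as a small function. It is a variant, not the script's code: `libtool_paths` is a hypothetical helper, and it swaps only the final `.lo` suffix via `posixpath.splitext`, whereas the script's `replace(".lo", ".o")` would also rewrite a `.lo` occurring earlier in the path.

```python
import posixpath


def libtool_paths(lo_filepath):
    """Derive the PIC and non-PIC object paths for a libtool .lo file."""
    # Swap only the trailing suffix, so "my.local/foo.lo" becomes "my.local/foo.o".
    root, _ext = posixpath.splitext(lo_filepath)
    o_filepath = root + ".o"
    lo_dir, o_filename = posixpath.split(o_filepath)
    return {
        # Paths the compiler writes to, relative to the invoking directory.
        "pic_filepath": posixpath.join(lo_dir, ".libs", o_filename),
        "npic_filepath": posixpath.join(lo_dir, o_filename),
        # Paths recorded inside the .lo file, relative to the .lo itself.
        "local_pic_filepath": posixpath.join(".libs", o_filename),
        "local_npic_filepath": o_filename,
    }


if __name__ == "__main__":
    for key, value in libtool_paths("src/cuda/kernel.lo").items():
        print(key, "=", value)
```

The split between the two pairs matters because the `pic_object` and `non_pic_object` entries written into the `.lo` file use the local paths, while the compile commands use the full ones.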
3 changes: 3 additions & 0 deletions dummy.C
@@ -0,0 +1,3 @@
void dummy(){

}
2 changes: 1 addition & 1 deletion examples/HydroTests/GreshoVortex_3D/getGlass.sh
@@ -1,2 +1,2 @@
#!/bin/bash
wget http://virgodb.cosma.dur.ac.uk/swift-webstorage/ICs/glassCube_64.hdf5
wget http://virgodb.cosma.dur.ac.uk/swift-webstorage/ICs/glassCube_128.hdf5
17 changes: 11 additions & 6 deletions examples/HydroTests/GreshoVortex_3D/gresho.yml
@@ -7,20 +7,24 @@ InternalUnitSystem:
UnitTemp_in_cgs: 1 # Kelvin

Scheduler:
max_top_level_cells: 15

max_top_level_cells: 8
tasks_per_cell: 200
deadlock_waiting_time_s: 10
cell_split_size: 80
cell_sub_size_pair_hydro: 50 # (Optional) Maximal number of hydro-hydro interactions per sub-pair hydro/star task (this is the default value).
cell_sub_size_self_hydro: 50 # (Optional) Maximal number of hydro-hydro interactions per sub-self hydro/star task. Set to how many cells are targeted for GPU tasks
# Parameters governing the time integration
TimeIntegration:
time_begin: 0. # The starting time of the simulation (in internal units).
time_end: 1. # The end time of the simulation (in internal units).
dt_min: 1e-6 # The minimal time-step size of the simulation (in internal units).
dt_max: 1e-2 # The maximal time-step size of the simulation (in internal units).
dt_max: 1e-4 # The maximal time-step size of the simulation (in internal units).

# Parameters governing the snapshots
Snapshots:
basename: gresho # Common part of the name of output files
time_first: 0. # Time of the first output (in internal units)
delta_time: 1e-1 # Time difference between consecutive outputs (in internal units)
delta_time: 1e-3 # Time difference between consecutive outputs (in internal units)
compression: 1

# Parameters governing the conserved quantities statistics
@@ -29,10 +33,11 @@ Statistics:

# Parameters for the hydrodynamics scheme
SPH:
resolution_eta: 1.2348 # Target smoothing length in units of the mean inter-particle separation (1.2348 == 48Ngbs with the cubic spline kernel).
resolution_eta: 1.9 # Target smoothing length in units of the mean inter-particle separation (1.2348 == 48Ngbs with the cubic spline kernel).
CFL_condition: 0.1 # Courant-Friedrichs-Lewy condition for time integration.

# Parameters related to the initial conditions
InitialConditions:
file_name: ./greshoVortex.hdf5 # The file to read
periodic: 1
periodic: 1
# replicate: 2
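Review note: the diff markers were lost in this view, so the following assumes the second line of each duplicated pair is the new value (`dt_max: 1e-4`, `delta_time: 1e-3`). Under that assumption, the short Python sketch below estimates what the change means for run length and output volume. The function names are hypothetical, and the snapshot count assumes one output at `time_first` and one every `delta_time` up to and including `time_end`.

```python
def minimum_steps(time_begin, time_end, dt_max):
    """Lower bound on the number of steps: every step is at most dt_max long."""
    return round((time_end - time_begin) / dt_max)


def snapshot_count(time_first, time_end, delta_time):
    """Outputs at time_first, time_first + delta_time, ... up to time_end."""
    return round((time_end - time_first) / delta_time) + 1


if __name__ == "__main__":
    for label, dt_max, delta_time in (("old", 1e-2, 1e-1), ("new", 1e-4, 1e-3)):
        steps = minimum_steps(0.0, 1.0, dt_max)
        snaps = snapshot_count(0.0, 1.0, delta_time)
        print(label, "min steps:", steps, "snapshots:", snaps)
```

If these values are test-only settings for GPU debugging, it may be worth restoring the original `dt_max` and `delta_time` before merging, since both changes increase the cost of the example by roughly two orders of magnitude.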
52 changes: 49 additions & 3 deletions src/Makefile.am
@@ -16,7 +16,10 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.

# Add the non-standard paths to the included library headers
AM_CFLAGS = $(HDF5_CPPFLAGS) $(GSL_INCS) $(FFTW_INCS) $(NUMA_INCS) $(GRACKLE_INCS) $(SUNDIALS_INCS) $(CHEALPIX_CFLAGS)
AM_CFLAGS = $(HDF5_CPPFLAGS) $(GSL_INCS) $(FFTW_INCS) $(NUMA_INCS) $(GRACKLE_INCS) $(SUNDIALS_INCS) $(CHEALPIX_CFLAGS) -O0

# Add HIP Path
AM_CFLAGS += -D__HIP_PLATFORM_AMD__

# Assign a "safe" version number
AM_LDFLAGS = $(HDF5_LDFLAGS) $(FFTW_LIBS)
@@ -40,6 +43,22 @@ lib_LTLIBRARIES += libswiftsim_mpi.la
noinst_LTLIBRARIES += libgrav_mpi.la
endif

# Build a cuda version too?
if HAVECUDA
lib_LTLIBRARIES += libswiftsim_cuda.la
if HAVEMPI
lib_LTLIBRARIES += libswiftsim_mpicuda.la
endif
endif

# Build a hip version too?
if HAVEHIP
lib_LTLIBRARIES += libswiftsim_hip.la
if HAVEMPI
lib_LTLIBRARIES += libswiftsim_mpihip.la
endif
endif

# List required headers
include_HEADERS = space.h runner.h queue.h task.h lock.h cell.h part.h const.h
include_HEADERS += cell_hydro.h cell_stars.h cell_grav.h cell_sinks.h cell_black_holes.h cell_rt.h cell_grid.h
@@ -161,7 +180,7 @@ endif
AM_SOURCES = space.c space_rebuild.c space_regrid.c space_unique_id.c
AM_SOURCES += space_sort.c space_split.c space_extras.c space_first_init.c space_init.c
AM_SOURCES += space_cell_index.c space_recycle.c
AM_SOURCES += runner_main.c runner_doiact_hydro.c runner_doiact_limiter.c
AM_SOURCES += runner_main.c runner_doiact_hydro.c runner_doiact_limiter.c runner_gpu_pack_functions.c
AM_SOURCES += runner_doiact_stars.c runner_doiact_black_holes.c runner_ghost.c
AM_SOURCES += runner_recv.c runner_pack.c
AM_SOURCES += runner_sort.c runner_drift.c runner_black_holes.c runner_time_integration.c
@@ -208,7 +227,7 @@ AM_SOURCES += $(SPHM1RT_RT_SOURCES)
AM_SOURCES += $(GEAR_RT_SOURCES)

# Include files for distribution, not installation.
nobase_noinst_HEADERS = align.h approx_math.h atomic.h barrier.h cycle.h error.h inline.h kernel_hydro.h kernel_gravity.h
nobase_noinst_HEADERS = align.h approx_math.h atomic.h barrier.h cycle.h error.h inline.h kernel_hydro.h kernel_gravity.h runner_gpu_pack_functions.h
nobase_noinst_HEADERS += gravity_iact.h kernel_long_gravity.h vector.h accumulate.h cache.h exp.h log.h
nobase_noinst_HEADERS += runner_doiact_nosort.h runner_doiact_hydro.h runner_doiact_stars.h runner_doiact_black_holes.h runner_doiact_grav.h
nobase_noinst_HEADERS += runner_doiact_functions_hydro.h runner_doiact_functions_stars.h runner_doiact_functions_black_holes.h
@@ -526,6 +545,33 @@ libswiftsim_mpi_la_LDFLAGS = $(AM_LDFLAGS) $(MPI_LIBS) $(EXTRA_LIBS) -version-in
libswiftsim_mpi_la_SHORTNAME = mpi
libswiftsim_mpi_la_LIBADD = $(GRACKLE_LIBS) $(VELOCIRAPTOR_LIBS) $(MPI_LIBS) libgrav_mpi.la

# Sources and flags for regular CUDA library
libswiftsim_cuda_la_SOURCES = $(AM_SOURCES)
libswiftsim_cuda_la_CFLAGS = $(AM_CFLAGS) $(CUDA_CFLAGS) -DWITH_CUDA
libswiftsim_cuda_la_CXXFLAGS = $(AM_CFLAGS) $(CUDA_CFLAGS) -DWITH_CUDA
libswiftsim_cuda_la_LDFLAGS = $(AM_LDFLAGS) $(EXTRA_LIBS) $(CUDA_LIBS)
libswiftsim_cuda_la_SHORTNAME = cuda
libswiftsim_cuda_la_LIBADD = $(GRACKLE_LIBS) $(VELOCIRAPTOR_LIBS) $(MPI_LIBS) libgrav.la

# Sources and flags for regular HIP library
libswiftsim_hip_la_SOURCES = $(AM_SOURCES)
libswiftsim_hip_la_CFLAGS = $(AM_CFLAGS) $(HIP_CFLAGS) -DWITH_HIP
libswiftsim_hip_la_LDFLAGS = $(AM_LDFLAGS) $(EXTRA_LIBS) $(HIP_LIBS) -lamdhip64
libswiftsim_hip_la_SHORTNAME = hip
libswiftsim_hip_la_LIBADD = $(GRACKLE_LIBS) $(VELOCIRAPTOR_LIBS) $(MPI_LIBS) libgrav.la

# Sources and flags for MPI CUDA library
libswiftsim_mpicuda_la_SOURCES = $(AM_SOURCES)
libswiftsim_mpicuda_la_CFLAGS = $(AM_CFLAGS) $(MPI_FLAGS) $(CUDA_CFLAGS) -DWITH_CUDA
libswiftsim_mpicuda_la_CXXFLAGS = $(AM_CFLAGS) $(MPI_FLAGS) $(CUDA_CFLAGS) -DWITH_CUDA
libswiftsim_mpicuda_la_LDFLAGS = $(AM_LDFLAGS) $(MPI_LIBS) $(EXTRA_LIBS) $(CUDA_LIBS)
libswiftsim_mpicuda_la_SHORTNAME = mpicuda
libswiftsim_mpicuda_la_LIBADD = $(GRACKLE_LIBS) $(VELOCIRAPTOR_LIBS) $(MPI_LIBS) libgrav_mpi.la

#subdir
SUBDIRS = . cuda
SUBDIRS += . hip

# Versioning. If any sources change then update the version_string.h file with
# the current git revision and package version.
# May have a checkout without a version_string.h file and no git command (tar/zip