Gpu code #58

Open
wants to merge 137 commits into
base: master
137 commits
b16a6ff
Added required changes to configure.ac to allow comp w cuda & hip. Ad…
abouzied-nasar Oct 21, 2024
d9ca273
Added cuda and hip linking directives to Makefile.am
abouzied-nasar Oct 21, 2024
ecef19e
Added AC_PROG_CXX to configure.ac
abouzied-nasar Oct 21, 2024
5cb3705
Added first GPU files: src/runner_gpu_pack_functionc.c and src/cuda/p…
abouzied-nasar Oct 21, 2024
4edf58e
ACTUALLY added first GPU files: src/runner_gpu_pack_functionc.c and s…
abouzied-nasar Oct 21, 2024
0c90ab5
Added more files for GPU code. Seems to work fine aside from config.h…
abouzied-nasar Oct 22, 2024
f19c35a
Added ifdefs to a few files to a) Stop can't find config.h errors and…
abouzied-nasar Oct 22, 2024
6334924
Added dummy.c in src/cuda
abouzied-nasar Oct 22, 2024
8b64cbb
Added cudalt.py dummy.C src/cuda/tester.cu
abouzied-nasar Oct 23, 2024
a822b0b
Added code to cell.c and h cell_hydro.c cell_unskip.c engine_config.c…
abouzied-nasar Oct 24, 2024
b0193cb
Added code to engine.c
abouzied-nasar Oct 24, 2024
391d2bf
Removed bug from engine.c and added some code to scheduler.c
abouzied-nasar Oct 24, 2024
9bd433d
added code to scheduler.* engine_unskip.c
abouzied-nasar Oct 24, 2024
8d0d437
I had made a mistake by putting runner_doiact_functions_hydro_gpu.h i…
abouzied-nasar Oct 24, 2024
caa0852
Sorted most of the code out. Compiles and runs fine with gpu offload …
abouzied-nasar Oct 24, 2024
294f7ae
Added code to engine_marktasks.c
abouzied-nasar Oct 25, 2024
61c83a1
All coded up but there seems to be a problem with duplicate unlocks. …
abouzied-nasar Oct 25, 2024
21ed5cd
Made some changes here and there to try and get deps right for unpack…
abouzied-nasar Oct 28, 2024
aa3eeab
Commented out GPU code from engine_marktasks.c to see if that could h…
abouzied-nasar Oct 29, 2024
9c1b494
Removed duplicate engine_addlink for g and f pack tasks. And re-wired…
abouzied-nasar Oct 29, 2024
5e08222
Minor changes here and there
abouzied-nasar Oct 29, 2024
af0d256
Found a bug in task.c -> Wasn't unlocking gradient pack task
abouzied-nasar Oct 30, 2024
6551027
Found a bug in task.c -> Wasn't unlocking gradient pack task
abouzied-nasar Oct 30, 2024
1dc8451
Code still hanging. Will try starting from scratch with runner_main_c…
abouzied-nasar Oct 30, 2024
8103149
Copied over both runner_main_clean and runner_doiact_functions_hydro_…
abouzied-nasar Oct 30, 2024
badc9d4
Copied over both runner_main_clean and runner_doiact_functions_hydro_…
abouzied-nasar Oct 30, 2024
45ea651
Issue was not with #ifdefs in runner_main_clean.cu or runner_doiact..…
abouzied-nasar Oct 30, 2024
a9f81dd
Issue is probably with how I am locking and unlocking tasks or someth…
abouzied-nasar Oct 30, 2024
2cc07c5
signalling sleeping runners just after packing seems to prevent hangi…
abouzied-nasar Oct 31, 2024
4d6fe3d
Testing to see if code still hangs when making deps on pack tasks ins…
abouzied-nasar Nov 1, 2024
57d77a2
Added scheduler_done to runner_doiact_functions_hydro_gpu.h. Also com…
abouzied-nasar Nov 1, 2024
d952965
Checked engine_maketasks.c and things seem reasonable with nothing mi…
abouzied-nasar Nov 1, 2024
13912ce
Fixed a bug in scheduler_enqueue() where I had a break before where i…
abouzied-nasar Nov 1, 2024
3632ba2
Removed all GPU tasks aside from density self pack tasks. Code still …
abouzied-nasar Nov 4, 2024
d77dd2f
Found bug in how we set n_tasks_left* in scheduler_rewait it should b…
abouzied-nasar Nov 4, 2024
db43727
Fix is in for force and gradient pack tasks but needs de-bugging as c…
abouzied-nasar Nov 4, 2024
abecded
Missing brackets. FIX!
abouzied-nasar Nov 5, 2024
e8aa0ae
Fix bracketing in MPI hydro recv construction
MatthieuSchaller Nov 5, 2024
6aa324d
Fixed missing closing curly
MatthieuSchaller Nov 5, 2024
5e4eafe
Applied code formatting script blindly
MatthieuSchaller Nov 5, 2024
780553d
Added new ifdef-controls to offload only the density/gradient/force h…
MatthieuSchaller Nov 5, 2024
3b58f55
Added new ifdef-controls to offload only the density/gradient/force h…
MatthieuSchaller Nov 5, 2024
81c2283
Fix another bracketing issue
MatthieuSchaller Nov 5, 2024
b21d912
Fix another bracketing issue
MatthieuSchaller Nov 5, 2024
b4f4203
Fix logic mistake
MatthieuSchaller Nov 5, 2024
4a90516
First attempt at MPI dependencies
MatthieuSchaller Nov 5, 2024
d212364
Changed fprints to message in engine_maketasks.c
abouzied-nasar Nov 5, 2024
7c3728d
Fix comm name typo
MatthieuSchaller Nov 5, 2024
af8bc7b
Only scream if a local cell has no ghost_in
MatthieuSchaller Nov 5, 2024
732370d
Add unpack ---> recv dependencies
MatthieuSchaller Nov 5, 2024
c43b7b5
Fix error message in the task construction
MatthieuSchaller Nov 5, 2024
e64e17a
Only create the timing files if we are dumping timings. Assign the de…
MatthieuSchaller Nov 5, 2024
9981039
Put the gpu pack/unpack tasks in the correct category
MatthieuSchaller Nov 5, 2024
dd041fe
Do not update the total task time in signal_sleeping_runners
MatthieuSchaller Nov 5, 2024
e506c79
Time the pack and unpack of density pair separately
MatthieuSchaller Nov 6, 2024
3ccbaf0
Fix typos
MatthieuSchaller Nov 6, 2024
707ad45
Added timers for packing all GPU task types
abouzied-nasar Nov 6, 2024
425b074
Added timers for unpacking all GPU task types
abouzied-nasar Nov 6, 2024
4634905
modified scheduler_report_task_times_mapper() to account for differen…
abouzied-nasar Nov 6, 2024
7b8baa6
Reverted back to one if statement only for pack tasks as unpack tasks…
abouzied-nasar Nov 6, 2024
23cf181
made pack task types set to task_category_gpu
abouzied-nasar Nov 6, 2024
d7f980f
First attempt at re-instating task stealing
MatthieuSchaller Nov 6, 2024
5b67cd4
Put variable back in the code
MatthieuSchaller Nov 6, 2024
51c788a
put in fix for swift_task_debug in runner_main_clean.cu
abouzied-nasar Nov 7, 2024
ddc843b
Fix code to allow running with SWIFT_DEBUG_TASKS
MatthieuSchaller Nov 8, 2024
0447148
Fix typo
MatthieuSchaller Nov 8, 2024
037dbd5
Applied formatting script
MatthieuSchaller Nov 8, 2024
7eab151
Commented out task splitting for hydro and added fix for cell-less ta…
abouzied-nasar Nov 8, 2024
55c50b2
Call signal_sleeping_runners() in runner_main and not inside the pack…
MatthieuSchaller Nov 8, 2024
3800f53
Changed the fix for duplicate unlocks so that we skip checking for du…
abouzied-nasar Nov 9, 2024
40b67ee
Modified signal_sleeping_runners to only signal once per pack. Also m…
abouzied-nasar Nov 9, 2024
339539d
Setup and tested for optimal pack_size. GPU code is about 20% faster …
Nov 11, 2024
642b4de
Changed part_gpu.h to compile on Bede
Nov 11, 2024
9664fa8
Made duplicate CPU tasks implicit and switched back to enqueueing dep…
abouzied-nasar Nov 11, 2024
03d15b1
Commented out debug code in cell_unskip.c, come back and put code in …
abouzied-nasar Nov 12, 2024
66883bb
Modified cell_unskip.c so GPU debug bits only active when SWIFT_DEBUG…
abouzied-nasar Nov 12, 2024
fb531da
Added some files for HIP compilations
abouzied-nasar Nov 12, 2024
69d7d47
Quick stab at splitting GPU tasks. May have to revert as done in haste
abouzied-nasar Nov 19, 2024
72fa9d3
Fixed a few bugs with GPU task splitting. Code hangs after a few step…
abouzied-nasar Nov 19, 2024
83972f3
Fixed a bug in if statement in runner_main2(). Added some code to mak…
abouzied-nasar Nov 19, 2024
d250edf
Found another bug. I was not accounting for split tasks when allowing…
abouzied-nasar Nov 19, 2024
cb7abcf
Converted sub_selfs and sub_pairs to selfs and pairs in maketasks. Co…
abouzied-nasar Nov 20, 2024
2ee5742
Removed sub tasks from runner_main and moved making density sub tasks…
abouzied-nasar Nov 20, 2024
4d726d1
commented out sub tasks in runner_main. Edited how tasks are created
abouzied-nasar Nov 20, 2024
dcbad17
Made sub tasks implicit and fixed bug with atomic_dec in runner_main
abouzied-nasar Nov 20, 2024
2beb67e
reverted to converting subs
abouzied-nasar Nov 20, 2024
2a0d137
Too tired to carry on. There might be something to playing with space…
abouzied-nasar Nov 20, 2024
4bfa575
A few edits to test on Bede GHs
abouzied-nasar Nov 21, 2024
50c6a74
Changed a few parameters
abouzied-nasar Nov 21, 2024
d808695
Removed redundant commented out code from engine_maketasks()
abouzied-nasar Nov 21, 2024
18513c9
before adding time
abouzied-nasar Nov 21, 2024
306d32e
Changed task_unlocks to cell_unlocktree. Modified timer function for …
abouzied-nasar Nov 21, 2024
3e0c7b8
Removed unnecessary counter from engine_maketasks
abouzied-nasar Nov 21, 2024
fe5d945
Changed how s->nr_self_pack_tasks is incremented in scheduler.c
abouzied-nasar Nov 21, 2024
744c469
Commented out timing code in runner_main. The code ran for a full sim…
abouzied-nasar Nov 21, 2024
5b54b0f
Made changes to task dump for debugging to account for (skip) cell-le…
abouzied-nasar Dec 14, 2024
7a168d5
Added comments to remind myself to threadpool some bits in GPU task mg…
abouzied-nasar Dec 14, 2024
f7bf486
Added code to pass eta_neighbours (h/dx) to GPU code for calculating …
abouzied-nasar Dec 15, 2024
ff7087c
Removed conversion of sub-pairs to pairs, etc, in engine_maketasks fo…
abouzied-nasar Dec 16, 2024
ca9deaf
Added more debug checks to figure out why we got cells with 8x expect…
abouzied-nasar Dec 17, 2024
408b0e4
Hard-coded space_splitsize_default to 100 with no avail. Something fi…
abouzied-nasar Dec 17, 2024
fa1e7c4
Hard-coded space_grid_split_threshold_default to 100 and space_subsiz…
abouzied-nasar Dec 17, 2024
26585ec
I was barking up the wrong tree. The issue was that some particles wi…
abouzied-nasar Dec 18, 2024
4063e37
Added some comments
abouzied-nasar Dec 18, 2024
fd849fc
Okay. Now back to stealing. Code hangs when stealing enabled
abouzied-nasar Dec 18, 2024
fd9cfa0
Testing to see what happens when only self-dens tasks are done on GPU…
abouzied-nasar Dec 18, 2024
e9cb4dd
Reverted back to offloading only density tasks (self and pair)
abouzied-nasar Dec 19, 2024
e94fe6e
Weirdly code does not hang when using bigger test case and only densi…
abouzied-nasar Dec 19, 2024
0e3196d
Reverted to doing all tasks on GPU. Need to test on GHopper
abouzied-nasar Dec 19, 2024
3dfb81c
Deleted obsolete debug code
abouzied-nasar Dec 19, 2024
57cb5d9
Deleted obsolete debug code
abouzied-nasar Dec 19, 2024
02e18e4
Put in a weird fix in scheduler_gettask but I think it is wrong and s…
abouzied-nasar Dec 19, 2024
153558e
Deleted commented out obsolete code
abouzied-nasar Dec 19, 2024
f71201e
Changed condition to launch_leftovers to if leftover tasks <=1 instea…
abouzied-nasar Dec 19, 2024
707928b
Changed condition to launch_leftovers to if leftover tasks <=1 instea…
abouzied-nasar Dec 19, 2024
c5716b6
Extended <2 condition to other GPU task subtypes but the code now han…
abouzied-nasar Dec 19, 2024
b78ba84
Minor changes for testing. Commented out debug code for checking if t…
abouzied-nasar Dec 23, 2024
b533d16
Reverted condition for launch leftovers to n_packs_left < 1 instead o…
abouzied-nasar Dec 23, 2024
ceb4d6f
Allowed stealing engine_config. Changed debug if statement to error r…
abouzied-nasar Dec 30, 2024
4929cf8
Added simple debug code to scream when we have way more particles tha…
abouzied-nasar Dec 30, 2024
392e2ce
Added code to monitor how many GPU tasks a thread steals. Code still …
abouzied-nasar Dec 30, 2024
29e074b
Very weird behaviour found. Added some debug code set to crash run if…
abouzied-nasar Jan 6, 2025
949fd47
Moving tasks_left counter incrementation from scheduler.c to queue_in…
abouzied-nasar Jan 7, 2025
5abe90b
Changed if statements to atomic_CAS. Code hangs but the CAS does not …
abouzied-nasar Jan 7, 2025
5082ca9
Changed comp val to int in CAS instead of int *. Removed debug code. …
abouzied-nasar Jan 7, 2025
38953c4
Removed debug code which throws error if launching leftovers but stil…
abouzied-nasar Jan 7, 2025
c7e6a6c
Combined atomic dec and CAS calls into one operation. Code seems to r…
abouzied-nasar Jan 7, 2025
f3e3138
Removed unnecessary mods from swift.c
abouzied-nasar Jan 8, 2025
a58d968
Added comment to explain changes in task.c
abouzied-nasar Jan 8, 2025
4513dca
Removed un-necessary code from space.h
abouzied-nasar Jan 8, 2025
cf16ae4
Reverted swift.c and space.h as eta_neighbours was indeed required fo…
abouzied-nasar Jan 8, 2025
7871c95
Reverted swift.c and space.h as eta_neighbours was indeed required fo…
abouzied-nasar Jan 8, 2025
3bf586e
Cleaned runner_main a bit whilst checking for source of new code hang…
abouzied-nasar Jan 8, 2025
091819d
Cleaned up un-necessary code from queue.c
abouzied-nasar Jan 8, 2025
ea17e54
Reverted back to signalling runners individually for each task which …
abouzied-nasar Jan 9, 2025
4ab67c2
Added additional switch in runner_main to double check whether to lau…
abouzied-nasar Jan 9, 2025
f8463e8
Code hanging. Unsure why. Made significant changes. Changed counters …
abouzied-nasar Jan 10, 2025
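
Note on commits 5abe90b and c7e6a6c: their messages mention switching if statements to a CAS and then combining the atomic decrement and the CAS into one operation. The messages are truncated, so what follows is only a sketch of the general idea, not the code in this PR. The names `pack_counter`, `tasks_left` and `pack_counter_take` are hypothetical, and the sketch uses C11 `<stdatomic.h>` rather than SWIFT's own atomic macros.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter of pack tasks that still have to be packed. */
typedef struct {
  atomic_int tasks_left;
} pack_counter;

static void pack_counter_init(pack_counter *c, int n) {
  atomic_init(&c->tasks_left, n);
}

/* Decrement and test in one atomic read-modify-write.
 *
 * If the decrement and the "is it zero?" check were two separate steps,
 * two threads could both decrement first and then both read zero, so both
 * would believe they had taken the last task. atomic_fetch_sub returns
 * the value held before the subtraction, so exactly one caller sees the
 * 1 -> 0 transition. */
static bool pack_counter_take(pack_counter *c) {
  int before = atomic_fetch_sub(&c->tasks_left, 1);
  return before == 1; /* true only for the thread that took the last task */
}
```

With a counter initialised to 3, the first two calls to `pack_counter_take` return false and the third returns true, which is the point at which a "launch leftovers" decision could safely be made by a single thread.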
Removed sub tasks from runner_main and moved making density sub tasks into self and pairs to after creation of sub tasks for gradient and force
abouzied-nasar committed Nov 20, 2024
commit 2ee5742c7a47b013d5ca080e364d006ccb4c8a6e
2 changes: 1 addition & 1 deletion examples/HydroTests/GreshoVortex_3D/gresho.yml
@@ -7,7 +7,7 @@ InternalUnitSystem:
UnitTemp_in_cgs: 1 # Kelvin

Scheduler:
-max_top_level_cells: 8
+max_top_level_cells: 16
cell_split_size: 64
deadlock_waiting_time_s: 10.

25 changes: 16 additions & 9 deletions src/engine_maketasks.c
@@ -4857,15 +4857,6 @@ void engine_maketasks(struct engine *e) {
/* Split the tasks. */
scheduler_splittasks(sched, /*fof_tasks=*/0, e->verbose);

-for (int i = 0; i < sched->nr_tasks; i++) {
-struct task * t = &sched->tasks[i];
-if(t->type == task_type_sub_self && t->subtype == task_subtype_gpu_pack){
-t->type = task_type_self;
-}
-if(t->type == task_type_sub_pair && t->subtype == task_subtype_gpu_pack){
-t->type = task_type_pair;
-}
-}
if (e->verbose)
message("Splitting tasks took %.3f %s.",
clocks_from_ticks(getticks() - tic2), clocks_getunit());
@@ -5036,6 +5027,15 @@ void engine_maketasks(struct engine *e) {
* sched->tasks, sched->nr_tasks, sizeof(struct task),
* threadpool_auto_chunk_size, e); */
}
+for (int i = 0; i < sched->nr_tasks; i++) {
+struct task * t = &sched->tasks[i];
+if(t->type == task_type_sub_self && t->subtype == task_subtype_gpu_pack){
+t->type = task_type_self;
+}
+if(t->type == task_type_sub_pair && t->subtype == task_subtype_gpu_pack){
+t->type = task_type_pair;
+}
+}
for (int i = 0; i < sched->nr_tasks; i++) {
struct task * t = &sched->tasks[i];
if(t->type == task_type_sub_self && t->subtype == task_subtype_gpu_pack_g){
@@ -5389,6 +5389,13 @@ void engine_maketasks(struct engine *e) {
// t->subtype == task_subtype_gpu_unpack_g ||
// t->subtype == task_subtype_gpu_unpack_f){
// t->implicit = 1;
// }
+// if ((t->subtype == task_subtype_gpu_pack ||
+// t->subtype == task_subtype_gpu_pack_g ||
+// t->subtype == task_subtype_gpu_pack_f) &&
+// (t->type == task_type_sub_pair ||
+// t->type == task_type_sub_self)){
+// error("STill have subs");
+// }
}

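The loop moved by this commit relabels split ("sub") tasks of the `task_subtype_gpu_pack` subtype as plain self and pair tasks. The following is a minimal, self-contained sketch of that relabelling. The enums and the `struct task` here are cut-down stand-ins for the real SWIFT definitions, and the helper name `convert_sub_tasks` and its return value are assumptions made for illustration.

```c
#include <stddef.h>

/* Stand-in definitions: the real SWIFT enums and struct task have many
 * more members. Only the names used in the diff are reproduced here. */
enum task_type {
  task_type_self,
  task_type_pair,
  task_type_sub_self,
  task_type_sub_pair
};

enum task_subtype {
  task_subtype_density,
  task_subtype_gpu_pack,
  task_subtype_gpu_pack_g,
  task_subtype_gpu_pack_f
};

struct task {
  enum task_type type;
  enum task_subtype subtype;
};

/* Hypothetical helper: relabel sub_self/sub_pair tasks of one GPU pack
 * subtype as self/pair tasks. Returns how many tasks were relabelled. */
static int convert_sub_tasks(struct task *tasks, size_t nr_tasks,
                             enum task_subtype subtype) {
  int converted = 0;
  for (size_t i = 0; i < nr_tasks; i++) {
    struct task *t = &tasks[i];
    if (t->subtype != subtype) continue; /* leave other subtypes alone */
    if (t->type == task_type_sub_self) {
      t->type = task_type_self;
      converted++;
    } else if (t->type == task_type_sub_pair) {
      t->type = task_type_pair;
      converted++;
    }
  }
  return converted;
}
```

Taking the subtype as a parameter would let one helper serve the density, gradient and force pack subtypes, instead of repeating the loop once per subtype as the diff does.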
226 changes: 5 additions & 221 deletions src/runner_main_clean.cu
@@ -658,7 +658,8 @@ void *runner_main2(void *data) {
error("MPI_Comm_size failed with error %i.", res);
#endif
int count_max_parts_tmp =
-2 * target_n_tasks * space->nr_parts * nr_nodes / space->nr_cells;
+100 * target_n_tasks * space->nr_parts * nr_nodes / (32*32*32);//space->nr_cells;
+message("max_parts %i\n", count_max_parts_tmp);
pack_vars_self_dens->count_max_parts = count_max_parts_tmp;
pack_vars_pair_dens->count_max_parts = count_max_parts_tmp;
pack_vars_self_forc->count_max_parts = count_max_parts_tmp;
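
Review note on the `count_max_parts_tmp` estimate in the hunk above: it multiplies several counts together before dividing and stores the result in an `int`. Whether that can overflow or truncate depends on the operand types, which this diff does not show. The sketch below is one way to make the arithmetic explicit, assuming a hypothetical helper `estimate_max_parts` with illustrative parameter names. It divides first, so it rounds differently from the expression in the diff.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: estimate a particle buffer capacity as
 * safety_factor * target_n_tasks * (particles per cell) * nr_nodes,
 * using 64-bit intermediates and clamping to INT_MAX. */
static int estimate_max_parts(int64_t safety_factor, int64_t target_n_tasks,
                              int64_t nr_parts, int64_t nr_nodes,
                              int64_t nr_cells) {
  if (nr_cells <= 0) return 0; /* avoid dividing by zero */

  /* Divide early (rounding up) so the later product stays small. */
  int64_t parts_per_cell = (nr_parts + nr_cells - 1) / nr_cells;

  int64_t total = safety_factor * target_n_tasks * parts_per_cell * nr_nodes;

  /* The result is stored in an int field, so clamp rather than wrap. */
  if (total > INT_MAX) return INT_MAX;
  return (int)total;
}
```

For example, `estimate_max_parts(2, 128, 32768, 1, 512)` gives 64 particles per cell and a capacity of 16384.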
@@ -1571,131 +1572,16 @@ void *runner_main2(void *data) {
(t1.tv_sec - t0.tv_sec) +
(t1.tv_nsec - t0.tv_nsec) / 1000000000.0;
/* GPU WORK */
-} else if (t->subtype == task_subtype_gpu_pack) {
-packed_self++;
-#ifdef GPUOFFLOAD_DENSITY
-// struct timespec t0, t1; //
-// clock_gettime(CLOCK_REALTIME, &t0);
-ticks tic_cpu_pack = getticks();
-message("Did a sub_self density");
-
-packing_time +=
-runner_doself1_pack_f4(r, sched, pack_vars_self_dens, ci, t,
-parts_aos_f4_send, task_first_part_f4);
-
-t->total_cpu_pack_ticks += getticks() - tic_cpu_pack;
-
-// clock_gettime(CLOCK_REALTIME, &t1);
-// packing_time += (t1.tv_sec - t0.tv_sec) +
-// (t1.tv_nsec - t0.tv_nsec) /
-// 1000000000.0;
-// runner_doself1_pack(r, sched, pack_vars_self_dens, ci,
-// t, parts_aos_dens, &packing_time);
-/* No pack tasks left in queue, flag that we want to run */
-int launch_leftovers = pack_vars_self_dens->launch_leftovers;
-/*Packed enough tasks let's go*/
-int launch = pack_vars_self_dens->launch;
-/* Do we have enough stuff to run the GPU ? */
-if (launch) n_full_d_bundles++;
-if (launch_leftovers) n_partial_d_bundles++;
-if (launch || launch_leftovers) {
-/*Launch GPU tasks*/
-signal_sleeping_runners(sched, t, pack_vars_self_dens->tasks_packed);
-runner_doself1_launch_f4(
-r, sched, pack_vars_self_dens, ci, t, parts_aos_f4_send,
-parts_aos_f4_recv, d_parts_aos_f4_send, d_parts_aos_f4_recv,
-stream, d_a, d_H, e, &packing_time, &time_for_density_gpu,
-&unpack_time_self, task_first_part_self_dens_f4, devId,
-task_first_part_f4, d_task_first_part_f4, self_end);
-// runner_doself1_launch(r, sched,
-// pack_vars_self_dens, ci, t, parts_aos_dens,
-// d_parts_aos_dens, stream, d_a, d_H, e, &packing_time,
-// &time_for_density_gpu,
-// &tot_time_for_hard_memcpys);
-} /*End of GPU work Self*/
-#endif //GPUOFFLOAD_DENSITY
-} /* self / pack */
+}
#ifdef EXTRA_HYDRO_LOOP
else if (t->subtype == task_subtype_gradient) {
runner_dosub_self1_gradient(r, ci, 1);
// fprintf(stderr, "split a g task\n");
}
-else if (t->subtype == task_subtype_gpu_pack_g) {
-#ifdef GPUOFFLOAD_GRADIENT
-// runner_doself1_pack_g(r, sched, pack_vars_self_grad, ci,
-// t, parts_aos_grad, &packing_time_g);
-ticks tic_cpu_pack = getticks();
-
-packing_time_g += runner_doself1_pack_f4_g(
-r, sched, pack_vars_self_grad, ci, t, parts_aos_grad_f4_send,
-task_first_part_f4_g);
-
-t->total_cpu_pack_ticks += getticks() - tic_cpu_pack;
-
-/* No pack tasks left in queue, flag that we want to run */
-int launch_leftovers = pack_vars_self_grad->launch_leftovers;
-/*Packed enough tasks let's go*/
-int launch = pack_vars_self_grad->launch;
-/* Do we have enough stuff to run the GPU ? */
-if (launch || launch_leftovers) {
-/*Launch GPU tasks*/
-// runner_doself1_launch_g(r, sched,
-// pack_vars_self_grad, ci, t, parts_aos_grad,
-// d_parts_aos_grad, stream, d_a,
-// d_H, e, &packing_time_g, &time_for_gpu_g);
-signal_sleeping_runners(sched, t, pack_vars_self_grad->tasks_packed);
-runner_doself1_launch_f4_g(
-r, sched, pack_vars_self_grad, ci, t, parts_aos_grad_f4_send,
-parts_aos_grad_f4_recv, d_parts_aos_grad_f4_send,
-d_parts_aos_grad_f4_recv, stream, d_a, d_H, e,
-&packing_time_g, &time_for_gpu_g, task_first_part_f4_g,
-d_task_first_part_f4_g, self_end_g, &unpack_time_self_g);
-} /*End of GPU work Self*/
-#endif //GPUOFFLOAD_GRADIENT
-}
#endif
else if (t->subtype == task_subtype_force) {
runner_dosub_self2_force(r, ci, 1);
// fprintf(stderr, "split a f task\n");
-} else if (t->subtype == task_subtype_gpu_pack_f) {
-#ifdef GPUOFFLOAD_FORCE
-// runner_doself1_pack_f(r, sched, pack_vars_self_forc, ci,
-// t, parts_aos_forc, &packing_time_f);
-ticks tic_cpu_pack = getticks();
-
-packing_time_f += runner_doself1_pack_f4_f(
-r, sched, pack_vars_self_forc, ci, t, parts_aos_forc_f4_send,
-task_first_part_f4_f);
-
-t->total_cpu_pack_ticks += getticks() - tic_cpu_pack;
-
-// int count = ci->hydro.count;
-// for(int i = 0; i < count; i++){
-// int pid = pack_vars_self_forc->count_parts - count +
-// i; if(parts_aos_forc_f4_send[pid].ux_m.w <
-// 1e-9)fprintf(stderr, "zero mass after packing %i %f\n",
-// pid, parts_aos_forc_f4_send[pid].ux_m.w);
-// }
-/* No pack tasks left in queue, flag that we want to run */
-int launch_leftovers = pack_vars_self_forc->launch_leftovers;
-/*Packed enough tasks let's go*/
-int launch = pack_vars_self_forc->launch;
-/* Do we have enough stuff to run the GPU ? */
-if (launch || launch_leftovers) {
-/*Launch GPU tasks*/
-// runner_doself1_launch_f(r, sched,
-// pack_vars_self_forc, ci, t, parts_aos_forc,
-// d_parts_aos_forc, stream, d_a, d_H, e, &packing_time_f,
-// &time_for_gpu_f);
-signal_sleeping_runners(sched, t, pack_vars_self_forc->tasks_packed);
-runner_doself1_launch_f4_f(
-r, sched, pack_vars_self_forc, ci, t, parts_aos_forc_f4_send,
-parts_aos_forc_f4_recv, d_parts_aos_forc_f4_send,
-d_parts_aos_forc_f4_recv, stream, d_a, d_H, e,
-&packing_time_f, &time_for_gpu_f, task_first_part_f4_f,
-d_task_first_part_f4_f, self_end_f, &unpack_time_self_f);
-} /*End of GPU work Self*/
-#endif //GPUOFFLOAD_FORCE
}
else if (t->subtype == task_subtype_limiter)
runner_dosub_self1_limiter(r, ci, 1);
@@ -1740,119 +1626,17 @@ void *runner_main2(void *data) {
// message("Doing a pair sub task");
runner_dosub_pair1_density(r, ci, cj, 1);
}
-else if (t->subtype == task_subtype_gpu_pack) {
-#ifdef GPUOFFLOAD_DENSITY
-ticks tic_cpu_pack = getticks();
-
-message("Did a sub_pair density");
-packing_time_pair += runner_dopair1_pack_f4(
-r, sched, pack_vars_pair_dens, ci, cj, t,
-parts_aos_pair_f4_send, e, fparti_fpartj_lparti_lpartj_dens);
-
-t->total_cpu_pack_ticks += getticks() - tic_cpu_pack;
-/* Packed enough tasks or no pack tasks left in queue, flag that
-* we want to run */
-int launch = pack_vars_pair_dens->launch;
-int launch_leftovers = pack_vars_pair_dens->launch_leftovers;
-
-/* Do we have enough stuff to run the GPU ? */
-if (launch) n_full_p_d_bundles++;
-if (launch_leftovers) n_partial_p_d_bundles++;
-if (launch || launch_leftovers) {
-
-/*Launch GPU tasks*/
-// runner_dopair1_launch(r, sched,
-// pack_vars_pair_dens, ci, t, parts_aos_pair_dens,
-// d_parts_aos_pair_dens,
-// stream, d_a, d_H, e, &packing_time_pair,
-//&time_for_density_gpu_pair);
-signal_sleeping_runners(sched, t, pack_vars_pair_dens->tasks_packed);
-runner_dopair1_launch_f4_one_memcpy(
-r, sched, pack_vars_pair_dens, t, parts_aos_pair_f4_send,
-parts_aos_pair_f4_recv, d_parts_aos_pair_f4_send,
-d_parts_aos_pair_f4_recv, stream_pairs, d_a, d_H, e,
-&packing_time_pair, &time_for_density_gpu_pair,
-&unpacking_time_pair, fparti_fpartj_lparti_lpartj_dens,
-pair_end);
-}
-#endif
-}
#ifdef EXTRA_HYDRO_LOOP
else if (t->subtype == task_subtype_gradient) {
runner_dosub_pair1_gradient(r, ci, cj, 1);
// fprintf(stderr, "split a g task\n");
-} else if (t->subtype == task_subtype_gpu_pack_g) {
-#ifdef GPUOFFLOAD_GRADIENT
-ticks tic_cpu_pack = getticks();
-
-packing_time_pair_g +=
-runner_dopair1_pack_f4_g(r, sched, pack_vars_pair_grad, ci,
-cj, t, parts_aos_pair_f4_g_send, e,
-fparti_fpartj_lparti_lpartj_grad);
-
-t->total_cpu_pack_ticks += getticks() - tic_cpu_pack;
-
-/* No pack tasks left in queue, flag that we want to run */
-int launch_leftovers = pack_vars_pair_grad->launch_leftovers;
-/*Packed enough tasks let's go*/
-int launch = pack_vars_pair_grad->launch;
-/* Do we have enough stuff to run the GPU ? */
-if (launch || launch_leftovers) {
-/*Launch GPU tasks*/
-// runner_dopair1_launch_g(r, sched,
-// pack_vars_pair_grad, ci, t, parts_aos_pair_grad,
-// d_parts_aos_pair_grad,
-// stream, d_a, d_H, e, &packing_time_pair_g,
-//&time_for_gpu_pair_g);
-signal_sleeping_runners(sched, t, pack_vars_pair_grad->tasks_packed);
-runner_dopair1_launch_f4_g_one_memcpy(
-r, sched, pack_vars_pair_grad, t, parts_aos_pair_f4_g_send,
-parts_aos_pair_f4_g_recv, d_parts_aos_pair_f4_g_send,
-d_parts_aos_pair_f4_g_recv, stream_pairs, d_a, d_H, e,
-&packing_time_pair_g, &time_for_gpu_pair_g,
-&unpacking_time_pair_g, fparti_fpartj_lparti_lpartj_grad,
-pair_end_g);
-}
-#endif
}
#endif
else if (t->subtype == task_subtype_force) {
runner_dosub_pair2_force(r, ci, cj, 1);
// fprintf(stderr, "split a f task\n");
-} else if (t->subtype == task_subtype_gpu_pack_f) {
-#ifdef GPUOFFLOAD_FORCE
-ticks tic_cpu_pack = getticks();
-
-packing_time_pair_f +=
-runner_dopair1_pack_f4_f(r, sched, pack_vars_pair_forc, ci,
-cj, t, parts_aos_pair_f4_f_send, e,
-fparti_fpartj_lparti_lpartj_forc);
-
-t->total_cpu_pack_ticks += getticks() - tic_cpu_pack;
-
-/* No pack tasks left in queue, flag that we want to run */
-int launch_leftovers = pack_vars_pair_forc->launch_leftovers;
-/*Packed enough tasks let's go*/
-int launch = pack_vars_pair_forc->launch;
-/* Do we have enough stuff to run the GPU ? */
-if (launch || launch_leftovers) {
-/*Launch GPU tasks*/
-// runner_dopair1_launch_f(r, sched,
-// pack_vars_pair_forc, ci, t, parts_aos_pair_forc,
-// d_parts_aos_pair_forc,
-// stream, d_a, d_H, e, &packing_time_pair_f,
-// &time_for_gpu_pair_f);
-signal_sleeping_runners(sched, t, pack_vars_pair_forc->tasks_packed);
-runner_dopair1_launch_f4_f_one_memcpy(
-r, sched, pack_vars_pair_forc, t, parts_aos_pair_f4_f_send,
-parts_aos_pair_f4_f_recv, d_parts_aos_pair_f4_f_send,
-d_parts_aos_pair_f4_f_recv, stream_pairs, d_a, d_H, e,
-&packing_time_pair_f, &time_for_gpu_pair_f,
-&unpacking_time_pair_f, fparti_fpartj_lparti_lpartj_forc,
-pair_end_f);
-} /* End of GPU work Pairs */
-#endif
-} else if (t->subtype == task_subtype_limiter)
+}
+else if (t->subtype == task_subtype_limiter)
runner_dosub_pair1_limiter(r, ci, cj, 1);
else if (t->subtype == task_subtype_stars_density)
runner_dosub_pair_stars_density(r, ci, cj, 1);
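
Each GPU branch removed from the sub-task path in this file follows the same pattern: pack one task, read the `launch` and `launch_leftovers` flags, and launch the GPU work if either is set. The sketch below illustrates that pattern in isolation. The struct is a simplified stand-in for the `pack_vars_*` structs, and `bundle_size`, `pack_one` and `maybe_launch` are hypothetical names. It only shows the control flow suggested by the comments in the diff, not the actual implementation.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-in for the pack_vars_* structs. */
struct pack_vars {
  int bundle_size;       /* tasks in a full bundle (assumed field) */
  int tasks_packed;      /* tasks packed since the last launch */
  bool launch;           /* "Packed enough tasks let's go" */
  bool launch_leftovers; /* "No pack tasks left in queue" */
};

/* Record one packed task and update the two launch flags.
 * tasks_left_in_queue is the number of pack tasks still waiting after
 * this one has been packed. */
static void pack_one(struct pack_vars *pv, int tasks_left_in_queue) {
  pv->tasks_packed++;
  pv->launch = (pv->tasks_packed == pv->bundle_size);
  pv->launch_leftovers = (!pv->launch && tasks_left_in_queue == 0);
}

/* Launch if either flag is set. Returns the number of tasks launched,
 * or 0 if the bundle is still filling up. */
static int maybe_launch(struct pack_vars *pv) {
  if (!(pv->launch || pv->launch_leftovers)) return 0;
  int launched = pv->tasks_packed;
  /* A real implementation would copy the packed particles to the device
   * and start the kernel here. */
  pv->tasks_packed = 0;
  pv->launch = false;
  pv->launch_leftovers = false;
  return launched;
}
```

With a bundle size of 2 and three tasks in total, the second `pack_one` call triggers a full launch of two tasks, and the third triggers a leftovers launch of one task.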