
Laser envelope solver (finally!) #743

Merged 81 commits into Hi-PACE:development on Oct 28, 2022
Conversation

@MaxThevenet (Member) commented on May 23, 2022

This PR proposes an implementation of a laser envelope solver. The plasma response to an analytic laser pulse was already implemented. Now, the propagation of the laser pulse in a plasma is also included. The model is based on Benedetti's 2017/2018 article.

Features

  • The complex envelope at the current and previous time steps is stored as a 3D array (currently with the same bounds and resolution).
  • The 3D array can be stored in host or device memory, as chosen by a runtime parameter; the copies from/to the 3D array are implemented (see the sketch after this list).
  • Runs on Nvidia GPUs (and CPUs).
  • Runs in parallel.
  • The envelope is dumped in openPMD files.
  • A CI test checks the evolution of a laser pulse in vacuum (and compares with theory).
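
For illustration, here is a minimal sketch of the host-vs-device choice and of the slice copies. All names are hypothetical and plain C++ containers stand in for the AMReX containers and memory arenas used in the actual code:

```cpp
#include <algorithm>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a runtime flag (an input parameter selecting "3D array on host")
// decides where the full envelope is allocated.
struct LaserEnvelope3D {
    std::size_t nx, ny, nz;
    bool on_host; // runtime choice; the device path is only hinted at in this sketch
    std::vector<std::complex<double>> data; // host storage of size nx*ny*nz

    LaserEnvelope3D (std::size_t nx_, std::size_t ny_, std::size_t nz_, bool on_host_)
        : nx(nx_), ny(ny_), nz(nz_), on_host(on_host_), data(nx_*ny_*nz_)
    {
        // If !on_host, the real code allocates this array in device memory instead,
        // avoiding host<->device copies at the cost of a larger GPU memory footprint.
    }

    // Copy one longitudinal slice iz of the 3D array into a 2D working slice,
    // mirroring the copies to/from the 3D array mentioned above.
    void copy_slice_out (std::size_t iz, std::vector<std::complex<double>>& slice) const {
        slice.assign(data.begin() + iz*nx*ny, data.begin() + (iz+1)*nx*ny);
    }
    void copy_slice_in (std::size_t iz, const std::vector<std::complex<double>>& slice) {
        std::copy(slice.begin(), slice.end(), data.begin() + iz*nx*ny);
    }
};
```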

Structure

The 3D array is Laser::m_F; 2D slices (FArrayBoxes) are stored in Laser::m_slices, labelled with a time-step tag and a longitudinal-slice tag: n00 is time step n, nm1 is n-1, np1 is n+1, and a similar notation is used for the slice index j. In general, fields are stored with 2 Real components, one for the real part and one for the imaginary part, except inside the FFT solver (where fields are stored directly as Complex arrays).
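
As a minimal sketch of the two-Real-component layout (hypothetical names; the actual slices are FArrayBoxes with one component per real/imaginary part), packing and unpacking a complex value on a slice could look like:

```cpp
#include <complex>
#include <vector>

// Sketch of a 2D slice holding a complex envelope as two Real components:
// component 0 is the real part, component 1 the imaginary part.
struct Slice2D {
    int nx, ny;
    std::vector<double> re, im; // in the real code, two components of one FArrayBox

    Slice2D (int nx_, int ny_) : nx(nx_), ny(ny_), re(nx_*ny_, 0.0), im(nx_*ny_, 0.0) {}

    std::complex<double> get (int i, int j) const { return {re[j*nx + i], im[j*nx + i]}; }
    void set (int i, int j, std::complex<double> a) {
        re[j*nx + i] = a.real();
        im[j*nx + i] = a.imag();
    }
};

// The FFT solver works directly on Complex arrays, so values are converted
// to/from this two-component layout only at its boundary.
```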

A new Poisson solver is implemented to solve the C2C Poisson equation with periodic boundary conditions. Although this duplicates part of the existing solver, it keeps the code simpler: abstracting the type of the source array of the forward FFT so that it can be either Real (as needed everywhere else) or Complex (as needed for the laser pulse) would require significant templating and obfuscate the code. This could be reconsidered. The solver is implemented directly in Laser.cpp in Laser::AdvanceSliceFFT, with the required abstraction for portability in Laser.H.
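
For reference, a C2C spectral Poisson solve with periodic boundary conditions follows the usual pattern: forward FFT of the complex source, multiplication by $-1/(k_x^2+k_y^2)$, inverse FFT, and normalization. Below is a plain serial FFTW sketch of that pattern, not the actual Laser::AdvanceSliceFFT implementation (which goes through the portability layer in Laser.H):

```cpp
#include <fftw3.h>

// Solve lap(u) = rhs for a complex field with periodic BCs on an nx x ny grid
// with spacings dx, dy. Serial illustration only; the zero mode is set to zero.
void solve_poisson_c2c (int nx, int ny, double dx, double dy,
                        fftw_complex* rhs, fftw_complex* u)
{
    const double pi = 3.14159265358979323846;
    fftw_plan fwd = fftw_plan_dft_2d(ny, nx, rhs, u, FFTW_FORWARD,  FFTW_ESTIMATE);
    fftw_plan bwd = fftw_plan_dft_2d(ny, nx, u,   u, FFTW_BACKWARD, FFTW_ESTIMATE);

    fftw_execute(fwd); // u now holds the Fourier transform of rhs

    for (int j = 0; j < ny; ++j) {
        for (int i = 0; i < nx; ++i) {
            const int iw = (i <= nx/2) ? i : i - nx; // wrap to negative frequencies
            const int jw = (j <= ny/2) ? j : j - ny;
            const double kx = 2.0 * pi * iw / (nx * dx);
            const double ky = 2.0 * pi * jw / (ny * dy);
            const double k2 = kx*kx + ky*ky;
            // -1/k^2, plus 1/(nx*ny) because FFTW transforms are unnormalized
            const double fac = (k2 == 0.0) ? 0.0 : -1.0 / (k2 * nx * ny);
            u[j*nx + i][0] *= fac; // real part
            u[j*nx + i][1] *= fac; // imaginary part
        }
    }

    fftw_execute(bwd); // inverse FFT: u now holds the solution in real space
    fftw_destroy_plan(fwd);
    fftw_destroy_plan(bwd);
}
```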

Two solvers are implemented, an MG solver and an FFT solver, to advance a laser slice by 1 time step. This operation computes slice j at step n+1 using the slices ahead of it at step n+1 and the neighbouring slices at the two previous time steps: $s_{j}^{n+1} = f(s_{j+1,j+2}^{n+1}, s_{j,j+1,j+2}^{n}, s_{j,j+1,j+2}^{n-1})$. This is integrated within the loop over slices. The management of these slices is largely done in Laser::Copy.
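
Schematically, the loop over slices looks as follows. This is a pseudocode-level sketch with each slice reduced to a single value; the function and variable names are hypothetical, the update formula is a placeholder, and the slice bookkeeping is what Laser::Copy handles in the actual code:

```cpp
#include <vector>

// Placeholder for the MG/FFT slice advance; not the actual discretization.
double advance_slice (double np1_jp1, double np1_jp2,
                      double n00_j,   double n00_jp1, double n00_jp2,
                      double nm1_j,   double nm1_jp1, double nm1_jp2)
{
    // Dummy combination standing in for the discretized envelope equation:
    // in the real solver all eight inputs enter the update of slice j at step n+1.
    (void)nm1_jp1; (void)nm1_jp2;
    return 2.0*n00_j - nm1_j + 0.1*(np1_jp1 - n00_jp1) + 0.1*(np1_jp2 - n00_jp2);
}

// Advance the whole envelope by one time step: fill np1 from n00 and nm1.
void advance_time_step (std::vector<double>& np1, const std::vector<double>& n00,
                        const std::vector<double>& nm1)
{
    const int nz = static_cast<int>(n00.size());
    // March from the head of the box (largest j) to the tail, so that slices j+1 and
    // j+2 at step n+1 are already available when slice j is computed.
    for (int j = nz - 3; j >= 0; --j) {
        np1[j] = advance_slice(np1[j+1], np1[j+2],
                               n00[j], n00[j+1], n00[j+2],
                               nm1[j], nm1[j+1], nm1[j+2]);
    }
    // The two head slices (j = nz-1, nz-2) take boundary values, omitted here.
    // After the step, slices are rotated (n+1 -> n, n -> n-1) for the next iteration.
}
```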

For parallel runs, the whole 3D array on the current box has to be communicated. This is done similarly to the beam communication.

A new quantity chi has to be deposited (in PlasmaCurrentDepositionInner.H) for the plasma response. This is essentially the plasma density divided by the Lorentz factor. It is used in the laser solver.
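
As a rough sketch of the chi deposition (hypothetical names; normalization omitted, and nearest-grid-point deposition used instead of the shape factors of the actual current deposition), each plasma particle contributes its weight divided by its Lorentz factor:

```cpp
#include <cmath>
#include <vector>

// Simplified plasma particle; the quasi-static code computes gamma from its own
// variables, a generic Lorentz factor is used here instead.
struct PlasmaParticle { double x, y, ux, uy, uz, w; };

// Deposit chi ~ density / gamma on a 2D transverse grid of nx x ny cells.
void deposit_chi (const std::vector<PlasmaParticle>& particles,
                  std::vector<double>& chi, int nx, int ny,
                  double dx, double dy, double xmin, double ymin)
{
    for (const auto& p : particles) {
        const double gamma = std::sqrt(1.0 + p.ux*p.ux + p.uy*p.uy + p.uz*p.uz);
        const int i = static_cast<int>((p.x - xmin) / dx); // nearest-grid-point cell
        const int j = static_cast<int>((p.y - ymin) / dy);
        if (i < 0 || i >= nx || j < 0 || j >= ny) continue;
        chi[j*nx + i] += p.w / gamma; // weight over gamma: density over Lorentz factor
    }
}
```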

Performance

The CI test with resolution 1024 x 1024 x 500 for 7 time steps gives the following run times (in vacuum, so the differences would not be as dramatic with a plasma):

$ grep "total time" output*txt
output.old_ranks.1_host.0.txt:TinyProfiler total time across processes [min...avg...max]: 17.49 ... 17.49 ... 17.49
output.old_ranks.1_host.1.txt:TinyProfiler total time across processes [min...avg...max]: 35.19 ... 35.19 ... 35.19
output.old_ranks.4_host.0.txt:TinyProfiler total time across processes [min...avg...max]: 20.34 ... 21.88 ... 23.19
output.old_ranks.4_host.1.txt:TinyProfiler total time across processes [min...avg...max]: 26.69 ... 28.66 ... 30.36

and memory usage

$ grep "Free  GPU global memory" output*txt
output.old_ranks.1_host.0.txt:Free  GPU global memory (MB) spread across MPI: [23391 ... 23391]
output.old_ranks.1_host.1.txt:Free  GPU global memory (MB) spread across MPI: [39361 ... 39361]
output.old_ranks.4_host.0.txt:Free  GPU global memory (MB) spread across MPI: [35265 ... 35265]
output.old_ranks.4_host.1.txt:Free  GPU global memory (MB) spread across MPI: [39225 ... 39353]

when changing the number of ranks and whether the laser envelope is stored on the host or on the device. As expected, storing the 3D laser envelope on the host makes the code slower, but uses less GPU memory.

Remains to be done

See #804.

@MaxThevenet added the labels GPU (Related to GPU acceleration), Parallelization (Longitudinal and transverse MPI decomposition), pipeline (Specific to the implementation of the new pipeline) and component: laser envelope (About the laser envelope solver) on May 23, 2022
@MaxThevenet changed the title from [WIP] Laser envelope solver to Laser envelope solver (finally!) on Oct 24, 2022
@SeverinDiederichs (Member) left a comment:

Great! See a couple of comments below.

Further problems discussed offline.

A few more items to be added to the to-do list:

  1. Initialization of a laser profile via the parser (non-Gaussian),
  2. the possibility to propagate the laser backwards,
  3. the possibility to load and restart the laser.

Review comments (since resolved) on: docs/source/run/parameters.rst, examples/laser/inputs_SI, src/laser/Laser.cpp, src/particles/pusher/FieldGather.H, tests/laser_blowout_wake_explicit.1Rank.sh, tests/laser_blowout_wake_explicit.SI.1Rank.sh, src/utils/AdaptiveTimeStep.H, src/laser/Laser.H
@MaxThevenet (Member, Author):

Alright, thanks for all the comments! I believe all of them are either resolved or listed in the to-do list above.

@SeverinDiederichs (Member):

Could you please comment on the status of point 1 on the to-do list? Was this resolved, at least for serial runs? Or is it still present?

Otherwise, I think we can merge soon and move the to-do list to an issue.

@MaxThevenet (Member, Author) commented on Oct 28, 2022

Point 1 above is still relevant. This is not a surprise: the Notify/Wait code is fully written for 3d_on_host = 0, and makes no sense for 3d_on_host = 1. I'll try to dedicate some time today to fix it. If not, we can merge and take care of it in a subsequent PR.
For serial runs, it is indeed fixed with an exit condition in the Wait/Notify functions.

@MaxThevenet (Member, Author):

The 3d_on_host option could be made faster, but it behaves as expected (it significantly reduces the memory footprint). I added some information on that in the PR description. Therefore, I think this PR is good to go. I split point 1 in the to-do list into 2 more detailed points. I also updated the docs to mention that the MG solver is currently less stable (offline chat with @SeverinDiederichs).

@SeverinDiederichs (Member) left a comment:

🎉 Awesome!
Let's merge this now, move the to-do list to an issue, and work on it in separate PRs 🚀

@MaxThevenet MaxThevenet merged commit 0741836 into Hi-PACE:development Oct 28, 2022
@MaxThevenet MaxThevenet deleted the laser branch October 28, 2022 09:02