diff --git a/sphinx/source/ConcurrentExample.png b/sphinx/source/ConcurrentExample.png new file mode 100644 index 0000000..faf4eba Binary files /dev/null and b/sphinx/source/ConcurrentExample.png differ diff --git a/sphinx/source/CpuConfig.png b/sphinx/source/CpuConfig.png new file mode 100644 index 0000000..dfbcfa7 Binary files /dev/null and b/sphinx/source/CpuConfig.png differ diff --git a/sphinx/source/CpuGpuDataParallel.png b/sphinx/source/CpuGpuDataParallel.png new file mode 100644 index 0000000..fb42890 Binary files /dev/null and b/sphinx/source/CpuGpuDataParallel.png differ diff --git a/sphinx/source/GpuConfig.png b/sphinx/source/GpuConfig.png new file mode 100644 index 0000000..a3077eb Binary files /dev/null and b/sphinx/source/GpuConfig.png differ diff --git a/sphinx/source/HybridDist.png b/sphinx/source/HybridDist.png new file mode 100644 index 0000000..8d0c0f1 Binary files /dev/null and b/sphinx/source/HybridDist.png differ diff --git a/sphinx/source/key-macro.png b/sphinx/source/key-macro.png deleted file mode 100644 index a00eb7c..0000000 Binary files a/sphinx/source/key-macro.png and /dev/null differ diff --git a/sphinx/source/macroprocessor.rst b/sphinx/source/macroprocessor.rst index 2e9d3d3..36b0c29 100644 --- a/sphinx/source/macroprocessor.rst +++ b/sphinx/source/macroprocessor.rst @@ -105,5 +105,194 @@ present it any of its own ".ini" files. As with files, the definitions placed in the Simulation directory of the application override all definitions encountered earlier. +We have defined several macros for convenience that are available in +the file "macro_processors.ini" in the *bin* directory. The more +commonly used ones are described below. +.. 
container:: codeseg + + [loop_3d] + + args=limits + + definition = + do k=limits(LOW,KAXIS),limits(HIGH,KAXIS) + do j=limits(LOW,JAXIS),limits(HIGH,JAXIS) + do i=limits(LOW,IAXIS),limits(HIGH,IAXIS) + +Using it as @M loop_3d(blkLimits) results in the loop bounds for a triply +nested loop in place of the macro name, where every occurrence of +"limits" is replaced with "blkLimits". + + +.. container:: codeseg + + [bounds_3d] + + args=limits + + definition = + limits(LOW,IAXIS):limits(HIGH,IAXIS),& + + limits(LOW,JAXIS):limits(HIGH,JAXIS),& + + limits(LOW,KAXIS):limits(HIGH,KAXIS) + + usage in declaration as + + real, dimension(@M bounds_3d(blkLimits)) :: arr + +will result in a 3D array being declared with bounds defined using the +supplied two dimensional array blkLimits. + + +.. container:: codeseg + + [bounds_2d] + + args=x1,x2,limits + + definition = + + limits(LOW,x1AXIS):limits(HIGH,x1AXIS),& + + limits(LOW,x2AXIS):limits(HIGH,x2AXIS) + +This macro is used for declaring 2D arrays when the bounds are included +in the supplied array that replaces *limits*. This one has an additional +feature: x1 and x2 can be "I", "J", or "K" to define which two +directions are included in the array. + + +.. container:: codeseg + + [tileDesc_get] + + args =lim1,lim2,lim3,del + + definition = + + lim1(:,:)=tileDesc%%limits + + lim2(:,:)=tileDesc%%blkLimitsGC + + lim3(:,:)=tileDesc%%grownLimits + + call tileDesc%%deltas(del) + + level=tileDesc%%level + +@M tileDesc_get(blkLimits,blkLimitsGC,grownLimits,deltas) fills the +supplied arrays with the corresponding tile data. It assumes that all +these variables have been declared in the code. The argument lim1 has +bounds for the interior cells of the block where the solution is to be +advanced and lim2 has bounds for all cells of the block including the +guardcells. 
The third argument lim3 is needed for tiling, that is, if a +block is subdivided into tiles, lim3 has bounds for the section of +the block that is part of the tile and also includes those +interior cells that effectively become the guardcells for the cells +that are to be advanced in this tile. The final argument is a real 1D +array of size 3 in which deltax, deltay and deltaz are returned. Note +that the function returns valid values for 1:NDIM dimensions +only. Also note that this macro fetches the value of the "level" +assuming that it has been declared as is in the declaration section of +the code. For convenience one +can use the next macro, "tileDesc_declare", in the declaration section +of the code to ensure that the variables are appropriately declared. Note +that the first three arguments given to the two macros must be +identical and identically ordered for correct behavior. + + +.. container:: codeseg + + [tileDesc_declare] + + args = lim1,lim2,lim3 + + definition = + + integer :: level + + integer, dimension(LOW:HIGH,MDIM) :: lim1,lim2,lim3 + + real,dimension(MDIM) :: deltas + +There are additional tile related macros that can be put in the "use" +and declaration sections of the code, as well as used as arguments. It +is not necessary to use any of these macros; however, users are +strongly encouraged to use them wherever needed. If we need to change +the tile class for any reason, it would be straightforward to make the +code compatible everywhere by just making the change to the macro +definition instead of writing scripts to do search and replace. + +The next few macros pertain to the use of iterators in the code. As +with tiles there is one for the "use" section and one for the "declare" +section. For starting the iterator two different macros are provided; +one compatible with |amrex|'s preferred mode of operating on a level by +level basis, and the other one compatible with |paramesh|'s preference +of operating on all levels in the same loop. 
The arguments for the two + + +.. container:: codeseg + + [iter_all_begin] + + args=x1,t1,lim1,lim2,del + + definition = + + call Grid_getTileIterator(itor, x1, tiling=t1) + + do while(itor%%isValid()) + + call itor%%currentTile(tileDesc) + + @M tileDesc_get(lim1,lim2,grownLimits,del) + + call tileDesc%%getDataPtr(Uin, CENTER) + +This is the macro for |paramesh| preferred iterators where x1 is the +argument for the blocktype (usually LEAF) and t1 is .true. if +tiling is desired, and .false. if it is not. The arguments lim1 and +lim2 are the usual blkLimits and blkLimitsGC. Note that the macro +assumes that grownLimits and Uin are declared as expected; they are +not among the arguments. The next macro has an additional argument l1, +where the value of level resides. + + +.. container:: codeseg + + [iter_level_begin] + + args=x1,t1,l1,lim1,lim2,del + + definition= + + call Grid_getTileIterator(itor,x1,level=l1,tiling=t1) + + do while(itor%%isValid()) + + call itor%%currentTile(tileDesc) + + @M tileDesc_get(lim1,lim2,grownLimits,del) + + call tileDesc%%getDataPtr(Uin, CENTER) + + +The next macro is to be used at the end of the iterator loop. It +releases the pointer Uin, and also the tile iterator. + + +.. container:: codeseg + + [iter_end] + + definition = + call tileDesc%%releaseDataPtr(Uin,CENTER) + + call itor%%next() + + end do !!block loop + + call Grid_releaseTileIterator(itor) diff --git a/sphinx/source/milhoja.rst b/sphinx/source/milhoja.rst index 7ae236f..4d619f7 100644 --- a/sphinx/source/milhoja.rst +++ b/sphinx/source/milhoja.rst @@ -116,3 +116,61 @@ on any device without being aware of the specifics of the device so long as the required data is resident in the appropriate memory system. +.. _`Sec:examples`: + +Runtime Examples +------------------- + +Below are examples of possible thread team configurations in +increasing order of complexity. + +.. container:: center + + .. 
figure:: CpuConfig.png + :alt: cpuconfig + :name: Fig:cpuconfig + :width: 3.0in + + +The figure above shows a configuration where computation is being done +only on the CPU, while the next figure shows computation only on the +GPU. Note that there are additional steps of data packing and +unpacking, and the data moves back and forth between the host and +the GPU. + + +.. container:: center + + .. figure:: GpuConfig.png + :alt: gpuconfig + :name: Fig:gpuconfig + :width: 3.5in + +The next figure shows a configuration with the next level of +complexity where both CPU and GPU are applied to the same task. Two +teams are in operation: the CPU team is given 3 threads and the GPU +team is given 4 threads. These threads are used only for moving data, +not for computation. + +.. container:: center + + .. figure:: CpuGpuDataParallel.png + :alt: cpugpuparallel + :name: Fig:cpugpuparallel + :width: 4.0in + + +The final figure shows an example of how we envision |milhoja| being +used. Here concurrent computations are proceeding on the two devices +but they are allotted different tasks at first. The data from the GPU is +sent back to the CPU once its computation is done, and yet another +task is performed on the CPU. + + +.. container:: center + + .. figure:: ConcurrentExample.png + :alt: concurrentl + :name: Fig:concurrent + :width: 5.0in +>>>>>>> main diff --git a/sphinx/source/saved_text.txt b/sphinx/source/saved_text.txt new file mode 100644 index 0000000..98809e0 --- /dev/null +++ b/sphinx/source/saved_text.txt @@ -0,0 +1,26 @@ +.. container:: center + + :: + + SPECIES HE4 + SPECIES O16 + +The properties of the gases are initialized in the file +``Simulation/Simulation_initSpecies``\ ``.F90``, for example + +.. container:: center + + :: + + subroutine Simulation_initSpecies() + use Multispecies_interface, ONLY : Multispecies_setProperty + implicit none + #include "Simulation.h" + #include "Multispecies.h" + call Multispecies_setProperty(HE4_SPEC, A, 4.) 
+ call Multispecies_setProperty(HE4_SPEC, Z, 2.) + call Multispecies_setProperty(HE4_SPEC, GAMMA, 1.66666666667e0) + call Multispecies_setProperty(O16_SPEC, A, 16.0) + call Multispecies_setProperty(O16_SPEC, Z, 8.0) + call Multispecies_setProperty(O16_SPEC, GAMMA, 1.4) + end subroutine Simulation_initSpecies diff --git a/sphinx/source/softwaresystem.rst b/sphinx/source/softwaresystem.rst deleted file mode 100644 index 4b6ab51..0000000 --- a/sphinx/source/softwaresystem.rst +++ /dev/null @@ -1,15 +0,0 @@ -.. include:: defs.h - -The Configuration Toolchain -=========================== - -This section covers the structure of |flashx| configuration toolchain. - - -.. toctree:: - :maxdepth: 1 - :caption: Contents - - setup - macroprocessor -