mt5555/dns
direct numerical simulation
Copyright 2007. Los Alamos National Security, LLC. This material was produced under U.S. Government contract DE-AC52-06NA25396 for Los Alamos National Laboratory (LANL), which is operated by Los Alamos National Security, LLC for the U.S. Department of Energy. The U.S. Government has rights to use, reproduce, and distribute this software. NEITHER THE GOVERNMENT NOR LOS ALAMOS NATIONAL SECURITY, LLC MAKES ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LIABILITY FOR THE USE OF THIS SOFTWARE. If software is modified to produce derivative works, such modified software should be clearly marked, so as not to confuse it with the version available from LANL. Additionally, this program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. Accordingly, this program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

===================================================================
Sandia/LANL DNS code
Mark Taylor
mataylo@sandia.gov
===================================================================

README: updated 3/19/2013:
Added 'pseudospectral.pdf' to the svn repo.

README: updated 2/20/2009:
Added support for the SDSC P3DFFT parallel FFT library. P3DFFT is up to 2x faster than our internal transpose + FFTW code. To use P3DFFT, build the "dnsp3" model. This obsoletes our previous "dnsp" optimized model.

README: updated 2/5/2009:
Added new optimizations to the dnsp code to skip on-processor transpose_to/from_y operations. To enable them, padding must be "0 2 0" and the ref decomp must be y-pencil. The vorticity3 trick uses some FFTs with this optimization and some without, so one should benchmark with and without "-nov3". To verify that this optimization is enabled, check that the code prints this message to stdout:
   "Ref decomp is y-pencil decomp: skipping transpose_to_y calls"

README: updated 9/26/2008:
Updated to note that for the pure x-y slab decomposition of a problem of size N^3, NCPUS can be as large as N. (We used to have to switch to a pencil decomposition for NCPUS > N/2.)

README: updated 7/22/2008:
Minor edits.

README: updated 1/20/2008:
We now support exact dealiasing based on phase shifting + spherical truncation. To measure/test the dealiasing error, see testing/dealias.sh.

README: updated 8/16/2007:
For pencil decompositions, we now (8/15/2006) have a more efficient model, "dnsp", that should be used instead of "dns" (but dnsp does not yet allow for passive tracers). To compile, type "make dnsp" instead of "make dns"; the executable will be named "dnsp".

===================================================================

This directory and its subdirectories contain the Sandia/LANL DNS code. This code solves viscous fluid dynamics equations in a periodic rectangular domain (2D or 3D) with a pseudo-spectral method or 4th order finite differences, using the standard RK4 time stepping scheme. The code is documented in:

   [1] Taylor, Kurien and Eyink, Phys. Rev. E 68, 2003.
   [2] Kurien and Taylor, Los Alamos Science 29, 2005.
   [3] 'pseudospectral.pdf', included with the source code.

The code has options to solve the following equations:

   1. Navier-Stokes (primitive variables)                 2D and 3D
   2. Navier-Stokes (stream function/vorticity)           2D only
   3. Lagrangian Averaged Navier-Stokes (the Alpha Model) 2D and 3D
   4. Shallow water and shallow water alpha               2D only
   5. Boussinesq (with stratification)                    3D only

It can optionally allow for rotation, arbitrary aspect ratio and an arbitrary number of passive scalars.
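The overview names the standard RK4 time stepping scheme. The Python sketch below only illustrates classical four-stage RK4; it is not a transcription of the DNS Fortran code. As a test equation it advances individual Fourier modes under the viscous term alone, du_k/dt = -mu*|k|^2 u_k with k = 2*pi*m (box of side length 1), and compares the result with the exact exponential decay. How the DNS code actually handles the viscous term is not described here, so the choice of test problem is an assumption.

```python
import math
from typing import Callable, List, Tuple

State = List[float]
RHS = Callable[[float, State], State]


def rk4_step(f: RHS, t: float, u: State, dt: float) -> State:
    """Advance u from time t to t + dt with the classical 4-stage Runge-Kutta scheme."""

    def shifted(k: State, h: float) -> State:
        # u + h*k, componentwise
        return [ui + h * ki for ui, ki in zip(u, k)]

    k1 = f(t, u)
    k2 = f(t + dt / 2, shifted(k1, dt / 2))
    k3 = f(t + dt / 2, shifted(k2, dt / 2))
    k4 = f(t + dt, shifted(k3, dt))
    return [ui + dt / 6 * (a + 2 * b + 2 * c + d)
            for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]


def viscous_decay(mu: float, modes: List[int], t_end: float,
                  n_steps: int) -> List[Tuple[float, float]]:
    """Illustration only: integrate du_k/dt = -mu*|k|^2 u_k, k = 2*pi*m, from u_k = 1.

    Returns (RK4 result, exact exp(-mu*|k|^2*t_end)) pairs for each mode m.
    """
    rates = [mu * (2 * math.pi * m) ** 2 for m in modes]

    def rhs(t: float, u: State) -> State:
        return [-r * ui for r, ui in zip(rates, u)]

    u = [1.0] * len(modes)
    dt = t_end / n_steps
    for step in range(n_steps):
        u = rk4_step(rhs, step * dt, u, dt)
    return [(ui, math.exp(-r * t_end)) for ui, r in zip(u, rates)]


if __name__ == "__main__":
    # mu = .0014 is the value used in Example 3 of Step 0 (64^3, spherical dealiasing)
    for m, (num, exact) in zip([1, 2, 3], viscous_decay(0.0014, [1, 2, 3], 1.0, 100)):
        print(f"m={m}: RK4={num:.12f} exact={exact:.12f}")
```

With 100 steps to t=1 the RK4 error on these modes is far below the size of the solution, since each stage is fourth-order accurate and mu*|k|^2*dt is small.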
It is an MPI code and can use a 3D domain decomposition (allowing for slab, pencil or cube decompositions). For a grid of size N^3, it can run on up to N^3/8 processors. It has been run on as many as 18432 processors (with grids up to 4096^3).

This README file contains instructions to compile and run the purest form of the code: Navier-Stokes with deterministic low wave number forcing, using RK4 and a pseudo-spectral method. The equations are solved in a triply periodic cube of side length 1 with no passive scalars or rotation. The deterministic low wave number forcing is a simplified version of Overholt and Pope, Comput. Fluids 27, 1998, and is documented in detail in [1].

Some additional, but incomplete, documentation for this code is in the files pseudospectral.tex and dns.doc. Documentation for running the Boussinesq version is in rotation_doc/rotation.tex.

The steps required to compile and run the code are:

   0. Choose the resolution
   1. Set up the grid dimensions
   2. Compile the code
   3. Edit the input file, forcing12.inp or forcing12-bench.inp
   4. Run
   5. Validate the results
   6. Analyze the output

Step 0. Choose the resolution and resolution requirements.

The parameters set by the user are the grid resolution and the viscosity coefficient. The forcing we use is such that epsilon=3.58 and KE=1.89, so the eddy turnover time is 1.05. (These values are from N=1024. At other values of N, they may change slightly.) If we assume these values of KE and epsilon, we can determine the correct viscosity to use for a given N:

   grid of size:           N^3
   resolution constraint:  G = eta*kmax

Suggested value of G for this problem (forced low wave number turbulence): G=1.0. (For improved resolution in the viscous regime, some people use the more restrictive G=1.5.)

The code supports several types of dealiasing:

   1. partial spherical (the most efficient, not fully dealiased)
   2. 2/3 rule (the most efficient in terms of retained modes)
   3. phase shifting (the most efficient in terms of maximum spherical wave number)

   with phase shifting or partial spherical:  kmax = 2*pi*N*sqrt(2)/3
   with 2/3 dealiasing:                       kmax = 2*pi*N/3

(The 2*pi shows up in the wave number because our box is of side length 1.)

Given the choice of G and N, we can then determine the viscosity. Using eta = (mu^3/epsilon)^(1/4), we have:

   mu = epsilon^(1/3) * (G/kmax)^(4/3)

The resulting Taylor Reynolds number is:

   Rl = KE * sqrt(20/(3*mu*epsilon))

Example 1: 1024^3, with 2/3 dealiasing and G=1.
   The viscosity coefficient should be mu = 5.5e-5. The expected Rl = 347.
Example 2: 1024^3, with spherical dealiasing and G=1.
   The viscosity coefficient should be mu = 3.5e-5. The expected Rl = 437.
Example 3: 64^3, with spherical dealiasing and G=1.
   The viscosity coefficient should be mu = .0014. The expected Rl = 69.

Step 1. Set up the grid dimensions.

I use a python script, 'gridsetup.py', which will create the needed 'params.h' file. To run an N^3 problem with a domain decomposition of nx*ny*nz (so NCPUS=nx*ny*nz):

   ./gridsetup.py nx ny nz N N N 0 0 2

The 0 0 2 specifies how the arrays are padded in the x, y and z directions. 0 0 2 is required when using P3DFFT. When using the internal parallel FFT, 2 0 0 is optimal. P3DFFT also requires that nx=1.
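Steps 0 and 1 come down to arithmetic that can be checked before compiling anything. The Python sketch below is a hypothetical helper, not part of the DNS distribution. It evaluates the Step 0 formulas for kmax, mu and Rl with the quoted epsilon=3.58 and KE=1.89, and encodes only the decomposition rules this README states for gridsetup.py: NCPUS=nx*ny*nz, N/nx, N/ny and N/nz even (with the nx=1, ny=1, nz=N exception), and nx=1 with padding 0 0 2 for P3DFFT. Two things are assumptions: the restriction on N is read as "N has no prime factors other than 2, 3 and 5", and the option names "2/3", "sphere" and "phase" are invented for this sketch (they are not the input-file keywords).

```python
import math
from typing import List, Tuple

# Step 0 values, measured at N=1024 (they may change slightly at other N).
EPSILON = 3.58
KE = 1.89


def kmax(n: int, dealias: str) -> float:
    """Maximum retained wave number for an n^3 grid in a box of side length 1."""
    if dealias == "2/3":
        return 2 * math.pi * n / 3
    if dealias in ("sphere", "phase"):  # partial spherical or phase shifting
        return 2 * math.pi * n * math.sqrt(2) / 3
    raise ValueError(f"unknown dealiasing option: {dealias!r}")


def viscosity(n: int, dealias: str, g: float = 1.0, epsilon: float = EPSILON) -> float:
    """mu = epsilon^(1/3) * (G/kmax)^(4/3), from G = eta*kmax and eta = (mu^3/epsilon)^(1/4)."""
    return epsilon ** (1 / 3) * (g / kmax(n, dealias)) ** (4 / 3)


def taylor_reynolds(mu: float, ke: float = KE, epsilon: float = EPSILON) -> float:
    """Rl = KE * sqrt(20 / (3*mu*epsilon))."""
    return ke * math.sqrt(20 / (3 * mu * epsilon))


def has_only_factors_235(n: int) -> bool:
    """ASSUMPTION: this is how the README's 2,3,5 restriction on N is interpreted."""
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def check_decomposition(n: int, nx: int, ny: int, nz: int, p3dfft: bool,
                        padding: Tuple[int, int, int]) -> List[str]:
    """Return the stated Step 1 rules that this setup violates (empty list if none).

    gridsetup.py applies further transpose-related checks that are not encoded here.
    """
    problems = []
    if not has_only_factors_235(n):
        problems.append(f"N={n} has a prime factor other than 2, 3 and 5")
    if (nx, ny, nz) != (1, 1, n):  # the one allowed exception
        for name, p in (("nx", nx), ("ny", ny), ("nz", nz)):
            if n % p != 0 or (n // p) % 2 != 0:
                problems.append(f"N/{name} = {n}/{p} is not an even integer")
    if p3dfft and nx != 1:
        problems.append("P3DFFT requires nx=1")
    if p3dfft and tuple(padding) != (0, 0, 2):
        problems.append("P3DFFT requires padding 0 0 2")
    return problems


def gridsetup_command(n: int, nx: int, ny: int, nz: int,
                      padding: Tuple[int, int, int]) -> str:
    """Build the gridsetup.py command line for a cubic n^3 grid."""
    return "./gridsetup.py " + " ".join(str(v) for v in (nx, ny, nz, n, n, n, *padding))


if __name__ == "__main__":
    for n, dealias in ((1024, "2/3"), (1024, "sphere"), (64, "sphere")):
        mu = viscosity(n, dealias)
        print(f"{n}^3 {dealias:>6}: mu = {mu:.2e}, Rl = {taylor_reynolds(mu):.0f}")
    n, nx, ny, nz = 64, 1, 2, 32
    print(gridsetup_command(n, nx, ny, nz, (0, 0, 2)), "NCPUS =", nx * ny * nz,
          check_decomposition(n, nx, ny, nz, p3dfft=True, padding=(0, 0, 2)) or "OK")
```

Evaluated for the three Step 0 examples, the formulas give mu of about 5.53e-5, 3.48e-5 and 1.40e-3 and Rl of about 347, 437 and 69, matching the quoted values to the precision given.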
Here are some examples. For P3DFFT, which only supports slab and pencil decompositions:

A 32x32x32 grid, to run using just 1 cpu:
   % cd dns/src
   % ./gridsetup.py 1 1 1 32 32 32 0 0 2

A 64x64x64 grid, on 4 cpus (parallel decomposition 1x1x4, so 4 hyperslabs in the z-direction):
   % cd dns/src
   % ./gridsetup.py 1 1 4 64 64 64 0 0 2

A 64x64x64 grid, on 64 cpus (parallel decomposition 1x2x32, so a pencil decomposition):
   % cd dns/src
   % ./gridsetup.py 1 2 32 64 64 64 0 0 2

The DNS code's internal parallel FFT supports cube decompositions and also supports slab decompositions with up to N processors:
   % ./gridsetup.py 1 1 4 64 64 64 2 0 0
   % ./gridsetup.py 2 2 4 64 64 64 2 0 0
   % ./gridsetup.py 1 1 64 64 64 64 2 0 0

For a grid N^3, N must be a product of powers of 2, 3 and 5, and N/nx, N/ny and N/nz must be even integers (with one exception: the case nx=1, ny=1, nz=N is allowed). There are some other restrictions because of how we wrote our transpose routines. gridsetup.py will issue warnings if they are violated, and if the code is run, it will print error messages and stop.

If you do not have python installed on your system, you can instead copy the file 'params.h.test' to 'params.h' and then edit params.h by hand. By default, it is set up to run a 32^3 simulation on 1 cpu.

NOTE: The most efficient configuration for large processor counts will be with nx=1 and ny<nz. For example, to run a 4096^3 simulation on 8192 processors, I would use:
   ./gridsetup.py 1 4 2048 4096 4096 4096 0 0 2
Other possibilities which might be more efficient (we need a performance model :-)
   ./gridsetup.py 1 8 1024 4096 4096 4096 0 0 2
   ./gridsetup.py 1 16 512 4096 4096 4096 0 0 2
   ./gridsetup.py 1 32 256 4096 4096 4096 0 0 2
   ./gridsetup.py 1 64 128 4096 4096 4096 0 0 2

Step 2. Compilation

A makefile is included which should run on Linux, SGI, OSF1 (Compaq), AIX and SunOS, but some editing will probably be required.
On Linux, the makefile by default uses the Intel F90 compiler, but you can edit the file and switch this to Lahey or PGI. For the other systems, it uses the vendor-supplied F90 compiler.

P3DFFT: For the best performance with high resolution runs on large processor counts, use the version of the code that uses the SDSC p3dfft() parallel FFT. Both P3DFFT and FFTW must be built and installed in advance. Currently we require that P3DFFT be built in double precision and with the -DSTRIDE1 option. You must also edit the makefile to build with -DUSE_P3DFFT -DUSE_FFTW and configure the appropriate include and lib paths. Then:
   % cd dns/src
   % make dnsp3

To use the DNS code's internal parallel FFT:
   % cd dns/src
   % make dns

There is also an optimized version of dns, "dnsp", which uses the DNS code's internal parallel FFT but is limited to pencil decompositions, like P3DFFT. "dnsp" is no longer supported, since "dnsp3" is faster. For low resolutions on moderate numbers of processors, there is almost no difference between dns, dnsp and dnsp3.

Step 3. Edit the input file

There are two choices of input file for this forced case in the src directory:
   forcing12.inp        runs 1 eddy turnover time, with output
   forcing12-bench.inp  runs 5 timesteps with only diagnostic output, and reports
                        cpu time per timestep, averaged over the last 4 timesteps

Parameters of interest:
   viscosity coefficient mu (line 9)
      Change to the value computed in Step 0.
   derivative method (line 14)
      Set to "fft-dealias" for the 2/3 rule (exact dealiasing),
      "fft-sphere" for spherical dealiasing (partial dealiasing, suitable for k^-5/3 or steeper spectra),
      or "fft-phase" for phase-shifted + spherical dealiasing (exact dealiasing, 2x more expensive).
   time to run (line 26)
      Time is measured in dimensional units. For this problem, 1 eddy turnover time (after the code equilibrates) is close to 1 dimensional time unit, so set this to 1.0.

Step 4. Run the code:

To run the code on a single processor:
   ./dnsp3 -i forcing12-bench.inp output_name
This uses the input file 'forcing12-bench.inp', which sets up a triply periodic forced turbulence problem. The forcing is in wave numbers 1 and 2. The output files are:
   output_name0000.0000.u             u component of velocity
   output_name0000.0000.v             v component of velocity
   output_name0000.0000.w             w component of velocity
   output_name0000.0000.spec          power spectrum (1D and spherical)
   output_name0000.0000.spect         transfer spectrum
   output_name0000.0000.scalars       KE, dissipation rates, etc.
   output_name0000.0000.scalars-turb  skewness, other scalars...
where 0000.0000 is the time of the snapshot. So for t=0.25, the filename would be output_name0000.2500.u.

For a parallel run:
   mpirun -np X ./dnsp3 -mio -i forcing12.inp output_name
The "-mio" option turns on MPI-IO. Without MPI-IO, all output is funneled through processor 0. With MPI-IO, the code still produces identical files, but has up to M processors write data with asynchronous, non-overlapping writes. The default value of M is 32, but it can be tweaked by editing subroutine "mpi_io_init". M can be set equal to the number of processes if your MPI-IO library and/or parallel file system has good I/O aggregation.

These output files are also used as restart files. The restart is exact. To do a restart run, copy (or link) the snapshot files to be used to restart.u, restart.v and restart.w, and then add the "-r" option when running the code.

Step 5. Validate the results.

There are several test cases and scripts which run short problems and make sure the output is identical to the reference solutions. (These are not yet documented.) A quick test: run the 64^3 problem with the forcing12.inp input file, with mu=.0014, "fft-sphere", and time=1.0.
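Preparing this quick test means editing the three input-file lines listed in Step 3 (mu on line 9, the derivative method on line 14, the run time on line 26). The Python sketch below is a hypothetical helper for doing that from a script; it is not shipped with the code. It assumes, without having checked the actual file format, that the value is the first whitespace-separated token on each of those lines and that anything after it can be left unchanged.

```python
import re
from pathlib import Path

# 1-based line numbers given in Step 3 for forcing12.inp / forcing12-bench.inp.
PARAM_LINES = {"mu": 9, "derivative": 14, "time": 26}


def set_first_token(line: str, value: str) -> str:
    """Replace the first whitespace-separated token of `line`, keeping indentation and the rest.

    ASSUMPTION: the parameter value is the first token on its line.
    """
    return re.sub(r"^(\s*)\S*", lambda m: m.group(1) + value, line, count=1)


def edit_input(text: str, **values: str) -> str:
    """Return `text` with the parameters named in `values` (keys of PARAM_LINES) replaced."""
    lines = text.splitlines()
    for name, value in values.items():
        lineno = PARAM_LINES[name]
        if lineno > len(lines):
            raise ValueError(f"input has only {len(lines)} lines; expected {name} on line {lineno}")
        lines[lineno - 1] = set_first_token(lines[lineno - 1], value)
    return "\n".join(lines) + "\n"


def edit_input_file(path: str, **values: str) -> None:
    """Rewrite the input file at `path` in place."""
    p = Path(path)
    p.write_text(edit_input(p.read_text(), **values))
```

For the quick test, the call would be edit_input_file("forcing12.inp", mu=".0014", derivative="fft-sphere", time="1.0"), preferably on a copy of the file so the original stays available for comparison.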
To run this test on 2 processors:
   cd src
   ./gridsetup.py 1 1 2 64 64 64 0 0 2
   make dnsp3
   mpirun -np 2 ./dnsp3 -i forcing12.inp
To run on more processors, see scripts/readme.job.

At simulation time 1.0, the data (given in stdout) should be approximately:

*** dns code w/ FFT99 ********************************************************
   AMD Athlon PGI compiler g95 AMD X2 FC4 FC6 pentium M powerPC F7 gfortran intel gfortran Linux RHEL4 MAC OS10.4 2.1GHZ DDR2-800 2cpu 1 core 2 core
   kmax*eta = .9990 1.0214 .9839 .9626 1.0214 same
   KE       = 1.6800 1.6265 1.6828 1.7197 1.6265
   R_lambda = 61.47 62.22 59.7 58.4 62.22
   run time   20min 22.8 66min 12.65 8.41 4.96
   2.4ghz quad core, DDR2-666, gfortran, 4 cores: 1.30min

*** dnsp code w/ FFT99 ******************************************************
   intel compiler PGI compiler g95 intel on Thunderbird AMD Athlon pentium M 2ghz powerPC intel x86_64 Linux FC4 Linux RHEL4 MAC OS10.4 Linux
   CPUS     = 1cpu 2cpu 1x1x32 2x1x32 4x1x32
   kmax*eta = .9626 same .9990 SAME SAME
   KE       = 1.7197 1.6801
   R_lambda = 58.4 61.47
   d/dt vis = -4.1263 -3.5569
   d/dt f   = 4.3034 4.5351
   run time = 14.5m 11.4m .42m
   Other collected run times:
   2.4ghz Xeon Linux, ifort, 1 cpu: 13.81
   2.4ghz quad core, DDR2-666, gfortran, 1 core: 2.97
   2.4ghz quad core, DDR2-666, gfortran, 4 cores: 1.26min

*** dnsp3 code (P3DFFT) ************************************************
   2.4ghz quad core, DDR2-666, gfortran, 4 cores: 1.23min
   kmax*eta = 0.9641
   KE       = 1.6525
   R_lambda = 58.9
   d/dt vis = -3.747
   d/dt f   = 4.122
**************************************************************************

The initial condition has random phases, so these results are compiler and OS dependent. I haven't done a detailed sensitivity study, but you can get a sense of the fluctuations by looking at the above numbers.

Step 6. Looking at the data.

Not yet documented; contact Mark Taylor (mataylo@sandia.gov) for help. There are matlab scripts in the dns/matlab directory for reading and processing all the output produced by the code.
Some more complex analysis, such as computing structure functions and PDFs, is done with fortran programs in the dns/src directory.
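Post-processing and restarting both start from the snapshot files described in Step 4. The Python sketch below is a hypothetical helper, not one of the matlab or fortran tools shipped with the code. It builds and parses snapshot names following the output_name0000.2500.u pattern, lists the available snapshot times, and links a chosen snapshot to restart.u, restart.v and restart.w for a run with "-r". It assumes the time stamp is always 4 integer digits, a dot and 4 decimal digits; that matches the README's example but has not been checked against the Fortran output routines (for instance for t >= 10000).

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple, Union

PathLike = Union[str, Path]

# ASSUMPTION: time stamps look like 0000.2500 (4 digits, '.', 4 digits).
SNAPSHOT_RE = re.compile(r"^(?P<base>.*?)(?P<time>\d{4}\.\d{4})\.(?P<ext>[A-Za-z-]+)$")


def snapshot_name(base: str, t: float, ext: str) -> str:
    """E.g. snapshot_name("output_name", 0.25, "u") -> "output_name0000.2500.u"."""
    return f"{base}{t:09.4f}.{ext}"


def parse_snapshot_name(name: str) -> Optional[Tuple[str, float, str]]:
    """Split a snapshot file name into (base, time, extension), or return None."""
    m = SNAPSHOT_RE.match(name)
    if m is None:
        return None
    return m["base"], float(m["time"]), m["ext"]


def snapshot_times(directory: PathLike, base: str, ext: str = "u") -> List[float]:
    """Sorted snapshot times for which base<time>.<ext> exists in directory."""
    times = set()
    for p in Path(directory).iterdir():
        parsed = parse_snapshot_name(p.name)
        if parsed and parsed[0] == base and parsed[2] == ext:
            times.add(parsed[1])
    return sorted(times)


def link_restart(directory: PathLike, base: str, t: float) -> None:
    """Point restart.u/.v/.w in directory at the u, v, w snapshot for time t."""
    d = Path(directory)
    for field in ("u", "v", "w"):
        src = d / snapshot_name(base, t, field)
        if not src.exists():
            raise FileNotFoundError(src)
        dst = d / f"restart.{field}"
        if dst.is_symlink() or dst.exists():
            dst.unlink()
        dst.symlink_to(src.name)  # relative link inside the same directory
```

Symbolic links rather than copies keep a 4096^3 restart from duplicating the velocity fields on disk; the README allows either.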