-
Notifications
You must be signed in to change notification settings - Fork 23
/
Copy pathnotes.doc
134 lines (102 loc) · 4.59 KB
/
notes.doc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
TODO:
pseudtospectral.tex:
put in Pope formula
put in Flop counts
code
run README problem with -.5 CFL
DNS code: 1024^3 with pfmon on 16 cpus?
measure flops on laptop + FC6?
check performance with: A2AOVERLAP, STK
time with/without scalars - disable in ns_xpencils
time FFT99 vs STK, add ACML
time XT3?
benchmark:
http://people.inf.ethz.ch/arbenz/book/Chapter5/wpp3DFFT.c
dnsp test output - make sure it is in correct format
(via restart?)
port ns_vorticity3 to ns.F90 model (since it will save 1 z transpose)
move scaling notes from runs.doc to here?
**********************************************************************************
0 BUG: Energy, when computed in spectral space: the cos(N/2) mode
is weighted to heavily. Not a problem for any dealiased case
since that mode will always be zero. This effects many routines
throughout the code.
0 BUG: 25x1x1 decomp calling output1 on 250x250x250 grid
but 1x1x25 works fine.
(needs 3GB to run)
10/3/04: checked on kiwi with 5x1x1 and 1x1x5, both produced
output files w/o any errors (didn't check output)
1. ns_ghost:
benchmark parallel decompostions
1000^3 4th and spectral
2000^3 on 512?
2. construct convergence test for KH problem
use E&Liu KH problem? what value of mu?
mu=0: at t=1, E is down .25e-6. dk/dt = -1.25e-6
mu=1.5e-6 at T=1 E is down 1.2e-4 and dk/dt = -1.26e-4
1e-5: noisy
1e-4: a touch noisy
2e-4: looks good, a little too smoothed
compare with spectral, 4th, 4th+1st order grad, or 1st order div
3. if run stops because u>1000, timings are incorrect
10. use a mpi defined type for the buffer->2d array copy in transpose.F90
100. post more messages at once in tranpose.F90?
**********************************************************************************
to get files of HPSS with stripe set:
xpsi get -pfsComp 32 dns/decay2048/\*4026.u
**********************************************************************************
Angle averging: 153 calls to transpose_z()
1 stage RK: 9 transpose_z(). = 36 transposes per timestep
angle averaging: 4.25 timesteps.
True cost: 150 timesteps.
**********************************************************************************
memory use:
Q: 512x512x256: formula 288Mb per cpu
observed: 334M
1 cpu: formula observed
514x514x10 362 385
130x130x130 301 326
**********************************************************************************
sizes
Nirvana: 32Gb per box. optimal is 225M per CPU.
1 box: 29Gb, max size: 600
2 box: 58Gb. max size: 760
4 box: 116Gb max size: 950
QidSE: 16x4 cpus. 64Gb memory
Qid: 128x4 512Gb memory
Linux box sizes: 3D array size: 3*8*n**3. 5 big arrays per run.
total size about 6 arrays
64^3 38M 6M per array
96^3 120M 20M per array, 100M total.
128^3 281M 48M per array
extrapolated:
256^3 2.25Gb .375M per variable
384^3 7.8GB
480^3: 15Gb
512^3: 18Gb 3Gb per variable (9GB per timestep. 4.5gb 130m gridpoints
640^3 29Gb
768^3 51Gb
1024^3 144Gb 24Gb per variable 1b gridpoints
1536^3 486Gb 81Gb per variable
2048^3 1152Gb 192Gb per variable
3072^3 3240Gb needs at least 1620 cpus. 6x1x512?
4096^3 9216Gb 1536Gb per variable
needs 4.6K cpus at 2GB per cpu
NEC ESS 10Tb
with tracers: 5 big arrays + 13D:
(6*3 + 5*NTRACERS)*8 = 144 + 40*NTRACERS
10 Tracers: 3.8x more memory
1000^3 10 tracers: 550GB = 256 cpus NO.
256^3 10 tracers: 8.6GB
**********************************************************************************
analytic CFL: for RK4, norm(G)<2.4
dealiased: k=KMAX*(2/3) = N/3 = 1/(3delx)
advection: G = i 2pi (u,v,w)dot(kx,ky,kz) delt
|G| <= 2pi/3 delt ( U/dx + V/dy + W/dz)
(U/dx+V/dy+W/dz) delt <= 2.4 3 / 2pi = 1.15
diffusion: G = mu 4 pi^2 (kx^2 +ky^2 + kz^2) delt
= mu (4/9) pi^2 delt (delx^-2 + dely^-2 + delz^-2)
mu delt (delx^-2+dely^-2+delz^-2) < 2.4 (9/4) / pi^2 = .55
.70: unstable
.60: seems stable. lets go with .55
**********************************************************************************