Merge pull request #593 from kevinstratford/ks-issue-472
Remove reference to non-existent optfile
juanfrh authored Apr 8, 2024
2 parents c45af6b + b6b515f, commit 28360b0
Showing 1 changed file, docs/research-software/mitgcm.md, with 37 additions and 45 deletions.
…flow of both the atmosphere and ocean.

MITgcm is not available via a module on ARCHER2 as users will build
their own executables specific to the problem they are working on.


You can obtain the MITgcm source code from the developers by cloning
from the GitHub repository with the command

    git clone https://github.com/MITgcm/MITgcm.git

You should then copy the ARCHER2 optfile into the MITgcm directories.

!!! warning
    A current ARCHER2 optfile is not available at the present time. Please
    contact `support@archer2.ac.uk` for help.


You should also set the following environment variables.
`MITGCM_ROOTDIR` is used to locate the source code and should point to
the top MITgcm directory. Optionally, adding the MITgcm tools directory …

… running

    genmake2 -help

Finally, you may then build your executable by running

    make depend
    make
…each for up to one hour.
```bash
#SBATCH --cpus-per-task=1
# Replace [budget code] below with your project code (e.g. t01)
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard
```
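
Only a few lines of this script are visible in the diff. For orientation, a complete script of the same shape might look like the sketch below; the job name, node count and walltime are illustrative and should be sized to your own domain decomposition.

```bash
#!/bin/bash
#SBATCH --job-name=mitgcm          # illustrative job name
#SBATCH --nodes=1                  # illustrative; match your domain decomposition
#SBATCH --ntasks-per-node=128      # ARCHER2 compute nodes have 128 cores
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00
# Replace [budget code] below with your project code (e.g. t01)
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard

srun --distribution=block:block --hint=nomultithread ./mitgcmuv
```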
…This can also sometimes lead to performance increases.
```bash
#SBATCH --cpus-per-task=4
# Replace [budget code] below with your project code (e.g. t01)
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard
```
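
With `--cpus-per-task=4`, each MPI process is allocated four cores, so a 128-core node carries only 32 processes. A sketch of the matching launch lines, assuming a pure-MPI build (the `SRUN_CPUS_PER_TASK` export mirrors the full ECCO script later on this page):

```bash
export OMP_NUM_THREADS=1                         # pure-MPI run: one thread per process
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK   # pass --cpus-per-task through to srun

srun --distribution=block:block --hint=nomultithread ./mitgcmuv
```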
…those requested in the job submission script.

## Reproducing the ECCO version 4 (release 4) state estimate on ARCHER2

The ECCO version 4 state estimate (ECCOv4-r4) is an observationally constrained numerical solution produced by the ECCO group at JPL. If you would like to reproduce the state estimate on ARCHER2 in order to create customised runs and experiments, follow the instructions below; they are the JPL instructions, slightly modified for ARCHER2.

For more information, see the ECCOv4-r4 website: <https://ecco-group.org/products-ECCO-V4r4.htm>.

First, navigate to your directory on the ``/work`` filesystem in order to get access…

    mkdir MYECCO
    cd MYECCO
In order to reproduce ECCOv4-r4, we need a specific checkpoint of the MITgcm source code.

    git clone https://github.com/MITgcm/MITgcm.git -b checkpoint66g

Next, get the ECCOv4-r4 specific code from GitHub:

    cd MITgcm
    # … (some intermediate lines are collapsed in this diff) …
    git clone https://github.com/ECCO-GROUP/ECCO-v4-Configurations.git
    mv ECCO-v4-Configurations/ECCOv4\ Release\ 4/code .
    rm -rf ECCO-v4-Configurations

### Get the ECCOv4-r4 forcing files

The surface forcing and other input files that are too large to be stored on GitHub are available via NASA data servers. In total, these files are about 200 GB in size. You must register for an Earthdata account and connect to a WebDAV server in order to access these files. For more detailed instructions, read the help page <https://ecco.jpl.nasa.gov/drive/help>.
Next, acquire your WebDAV credentials: <https://ecco.jpl.nasa.gov/drive> (second…
Now, you can use wget to download the required forcing and input files:

    wget -r --no-parent --user YOURUSERNAME --ask-password https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_forcing
    wget -r --no-parent --user YOURUSERNAME --ask-password https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_init
    wget -r --no-parent --user YOURUSERNAME --ask-password https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_ecco

After using `wget`, you will notice that the `input*` directories are, by default, several levels deep in the directory structure. Use the `mv` command to move the `input*` directories to the directory where you executed the `wget` command. Specifically,

…
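
Alternatively (an optional sketch using standard `wget` flags, not part of the official JPL instructions) you can strip the leading path components at download time so that the `input*` directories land directly in the current directory:

```bash
# -nH drops the host directory; --cut-dirs=4 strips drive/files/Version4/Release4
wget -r --no-parent -nH --cut-dirs=4 --user YOURUSERNAME --ask-password \
    https://ecco.jpl.nasa.gov/drive/files/Version4/Release4/input_forcing
```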
If you haven't already, set your environment variables:
    export MITGCM_ROOTDIR=../../../../MITgcm
    export PATH=$MITGCM_ROOTDIR/tools:$PATH
    export MITGCM_OPT=$MITGCM_ROOTDIR/tools/build_options/dev_linux_amd64_cray_archer2

Next, compile the executable:

    genmake2 -mods ../code -mpi -optfile $MITGCM_OPT
    make depend
    make
Once you have compiled the model, you will have the `mitgcmuv` executable for ECCOv4-r4.

#### Create run directory and link files

In order to run the model, you need to create a run directory and link/copy the appropriate files. First, navigate to your directory on the ``/work`` filesystem. From the ``MITgcm/ECCOV4/release4`` directory:

    mkdir run
    cd run

    # link the data files
    ln -s ../input_init/NAMELIST/* .
    ln -s ../input_init/error_weight/ctrl_weight/* .
    # … (further ln -s lines are collapsed in this diff) …
    ln -s ../input_forcing/eccov4r4* .

    python mkdir_subdir_diags.py

    # manually copy the mitgcmuv executable
    cp -p ../build/mitgcmuv .

For a short test run, edit the ``nTimeSteps`` variable in the file ``data``. Comment out the default value and uncomment the line reading ``nTimeSteps=8``. This is a useful test to make sure that the model can at least start up.
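
The relevant lines in ``data`` (the ``PARM03`` namelist) look something like the sketch below; the commented-out default step count shown here is illustrative rather than the exact value in the file. Lines beginning with ``#`` are treated as comments in MITgcm namelist files.

```
 &PARM03
# nTimeSteps=227903,
 nTimeSteps=8,
 ...
 &
```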

To run on ARCHER2, submit a batch script to the Slurm scheduler. Here is an example submission script:

```bash
# … (the top of the script is collapsed in this diff) …
#SBATCH --cpus-per-task=1
# Replace [budget code] below with your project code (e.g. t01)
#SBATCH --account=[budget code]
#SBATCH --partition=standard
#SBATCH --qos=standard

# …
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK

srun --distribution=block:block --hint=nomultithread ./mitgcmuv
```

This configuration uses 96 MPI processes at 12 MPI processes per node (eight nodes in total). Once the run has finished, check the end of one of the standard output files to confirm that the run completed successfully.

    tail STDOUT.0000

It should read

    PROGRAM MAIN: Execution ended Normally
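
To check every rank at once rather than just `STDOUT.0000`, a one-liner such as the following (a simple sketch using standard tools) counts how many processes finished cleanly; for the 96-process configuration above it should print 96:

```bash
grep -l "Execution ended Normally" STDOUT.* | wc -l
```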
The files named `STDOUT.*` contain diagnostic information that you can use to check your results. As a first pass, check the printed statistics for any clear signs of trouble (e.g. NaN values, extremely large values).

#### ECCOv4-r4 in adjoint mode

If you have access to the commercial TAF software produced by <http://FastOpt.de>…
    cd ..
    mkdir build_ad
    cd build_ad

In this instance, the ``code_ad`` and ``code`` directories are identical, although this does not have to be the case. Make sure that you have the ``staf`` script in your path or in the ``build_ad`` directory itself. To make sure that you have the most up-to-date script, run:

    ./staf -get staf

To test your connection to the FastOpt servers, try:

    ./staf -test

You should receive the following message:

    Your access to the TAF server is enabled.

The compilation commands are similar to those used to build the forward case.

    # load relevant modules
    # … (some intermediate lines are collapsed in this diff) …
    make depend
    make adtaf
    make adall

The source code will be packaged and forwarded to the FastOpt servers, where it will undergo source-to-source translation via the TAF algorithmic differentiation software. If the compilation is successful, you will have an executable named ``mitgcmuv_ad``. This will run the ECCOv4-r4 configuration of MITgcm in adjoint mode. As before, create a run directory and copy in the relevant files. The procedure is the same as for the forward model, with the following modifications:

    cd ..
    mkdir run_ad
    cd run_ad
    # manually copy the mitgcmuv_ad executable
    cp -p ../build_ad/mitgcmuv_ad .
To run the model, change the name of the executable in the Slurm submission script; everything else should be the same as in the forward case. As above, at the end of the run you should have a set of `STDOUT.*` files that you can examine for any obvious problems.


##### Compile-time errors
Expand All @@ -361,14 +353,14 @@ relink with --no-relax` then add the following line to the FFLAGS options: `-Wl,

##### Checkpointing for adjoint runs

In an adjoint run, there is a balance between storage (i.e. saving the model state to disk) and recomputation (i.e. integrating the model forward from a stored state). Changing the `nchklev` parameters in the `tamc.h` file at compile time is how you control the relative balance between storage and recomputation.

A suggested strategy that has been used on a variety of HPC platforms is as follows:

1. Set `nchklev_1` as large as possible, up to the size allowed by memory on your machine. Use the `size` command to estimate the memory per process; this should be just a little below the per-process maximum, which on ARCHER2 is 2 GB on standard nodes and 4 GB on high-memory nodes.
2. Next, set `nchklev_2` and `nchklev_3` to be large enough to accommodate the entire run. A common strategy is to set `nchklev_2 = nchklev_3 = sqrt(numsteps/nchklev_1) + 1` (see the worked example below).
3. If the `nchklev_2` files get too big, then you may have to add a fourth level (i.e. `nchklev_4`), but this is unlikely.
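
As a concrete illustration, with made-up numbers: for a run of `numsteps = 72000` with `nchklev_1 = 30`, the rule gives `nchklev_2 = nchklev_3 = sqrt(72000/30) + 1 = sqrt(2400) + 1 ≈ 50`, and `30 * 50 * 50 = 75000 >= 72000`, so the three checkpoint levels together cover the whole run.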

This strategy allows you to keep as much in memory as possible, minimising the I/O requirements for the disk. This is useful, as I/O is often the bottleneck for MITgcm runs on HPC.

Another way to adjust performance is to change how tape-level I/O is handled. The following settings perform well for most configurations:

…
