Running the model

The standard way to run modelE is to start it from the initial conditions, run it for one hour of model time, and let it write a checkpoint (restart) file and stop. You should then check the log file to make sure that the model behaves properly, after which you can restart it for a longer run. While running, the model periodically writes a checkpoint file, so that if it is stopped or killed by the system, it can be restarted from that checkpoint to continue execution. Both ending times (for the initial short run and for the longer run) are specified in the rundeck, in the section with the INPUTZ namelist parameters. The ending time for the short run is specified on the line that starts with ISTART=; the ending time for the longer run is specified on the line above it.

The modelE source repository provides a simple script for starting model runs: modelE/exec/runE. This script provides enough functionality to run the model on personal computers (desktops and laptops), but when working on a supercomputer one typically has to submit the runs as batch jobs (since MPI jobs are not allowed to run interactively). This functionality depends on the particular architecture, and a special script has to be written for each such computer. Check the local documentation for your computer to see which scripts are available there.

Using the `runE` script

The command to start a model run is

    runE <RunID> [-np NP] [-t time] [-cold-restart] [-d] [-q] [-l logfile] [-s tag]

Here <RunID> is the name of your run (rundeck name without .R). The script accepts the following options:

-np NP: Run the model in parallel with NP MPI threads.
-t time: Pass time as the execution time to be used in QSUB_STRING (see below). Has no effect otherwise.
-cold-restart: Start the run from the initial conditions. If not specified, the run is restarted from the latest checkpoint.
-d: Start the run in a debugger (you should compile the code with the -g flag). By default gdb is used, with each MPI thread started in a separate xterm window. To use a different debugger, assign your own debugger command to the environment variable DEBUG_COMMAND.
-q: Do not write output to the log file. By default all standard output is written to the file <RunID>.PRT.
-l logfile: Use logfile as the log file instead of <RunID>.PRT.
-s tag: Use QSUB_STRING_tag instead of the default QSUB_STRING (see below).
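
As an illustration of how these options combine, here is a minimal dry-run sketch. The run name my_run and the value NP=4 are hypothetical, and the RUNE variable is only a device for printing the commands instead of executing them; set RUNE=runE (with modelE/exec on your PATH) to launch the runs for real.

```shell
#!/bin/sh
RUNID=my_run        # hypothetical run name (rundeck name without .R)
NP=4                # hypothetical number of MPI threads
RUNE="echo runE"    # dry run: print each command instead of executing it

# First start, from the initial conditions.
$RUNE "$RUNID" -np "$NP" -cold-restart

# Continue from the latest checkpoint, writing the log to a custom file.
$RUNE "$RUNID" -np "$NP" -l "$RUNID.log"

# Continue from the latest checkpoint under the debugger.
$RUNE "$RUNID" -np "$NP" -d
```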

In most cases you can also use this script to start model runs as batch jobs on a supercomputer. Just add a variable QSUB_STRING to your ~/.modelErc file and set it to a command that starts an appropriate batch job. You can define several such strings (with different settings) by giving each one a "tag", and choose which one to use by specifying the tag with the -s flag. For example, for the Slurm resource manager you could use

    QSUB_STRING="sbatch -A account_name -n %np -t %t"
    QSUB_STRING_debug="sbatch --qos=debug -A account_name -n %np -t %t"

Here %np and %t are the number of MPI threads and execution time passed from the command line.
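
To make the substitution concrete, the following sketch expands the placeholders by hand with sed. It only illustrates the idea and is not taken from runE, whose actual implementation may differ. The values 8 and 01:00:00 are assumptions, and the time format accepted by -t depends on your batch system.

```shell
#!/bin/sh
# Template as it might appear in ~/.modelErc (same as the Slurm example above).
QSUB_STRING="sbatch -A account_name -n %np -t %t"

# Hypothetical values, as if passed via "runE <RunID> -np 8 -t 01:00:00".
np=8
t="01:00:00"

# Replace each placeholder with its value (illustration only, not runE's code).
cmd=$(printf '%s\n' "$QSUB_STRING" | sed -e "s/%np/$np/g" -e "s/%t/$t/g")

echo "$cmd"
# prints: sbatch -A account_name -n 8 -t 01:00:00
```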

To start a run which was set up in the previous section, you can just execute

    ../exec/runE <RunID> -np <NP>

Because -cold-restart is not given, this command always restarts your <RunID> run from the latest saved checkpoint file.

Stopping the model

The standard way to stop the model is to use the sswE command (also located in modelE/exec). To stop the run <RunID>, just execute the command

    sswE <RunID>

This will let the model know that you want to interrupt the execution. The model will finish the current time step, write the checkpoint file and then it will stop.