The standard way to run modelE is to start it from the initial
conditions, run it for one hour of model time, and let it write the
checkpoint (restart) file and stop. You should then check the log file
to make sure that the model behaves properly, after which you can
restart it for a longer run. While running, the model periodically
writes a checkpoint file, so that if it is stopped or killed by the
system, it can be restarted from this checkpoint to continue the
execution. Both ending times (for the initial short run and for the
longer run) are specified in the rundeck, in the section with the
INPUTZ namelist parameters. The ending time for the short run is
specified on the line which starts with ISTART=; the ending time for
the longer run is specified on the line above it.
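
Since the two ending times sit on adjacent lines, a quick way to review them before starting a run is to print the ISTART= line together with the line above it. The following is a minimal sketch: show_end_times is a hypothetical helper (it is not part of modelE), and it assumes the rundeck follows the layout described above.

```sh
# Hypothetical helper, not part of modelE: print the line of a rundeck
# that starts with ISTART= (ending time of the short run), preceded by
# the line directly above it (ending time of the longer run).
show_end_times() {
    # awk remembers the previous line so that it can be printed
    # when the ISTART= line is found.
    awk '/^[[:space:]]*ISTART=/ { print prev; print } { prev = $0 }' "$1"
}

# Usage (assumes the rundeck <RunID>.R is in the current directory):
#   show_end_times <RunID>.R
```
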
The modelE source repository provides a simple script for starting
model runs: modelE/exec/runE. This script provides enough
functionality to run the model on personal computers (desktops and
laptops), but on a supercomputer one typically has to submit runs as
batch jobs (since MPI jobs are not allowed to run interactively).
Batch submission depends on the particular architecture, and a special
script has to be written for each such computer. Check the local
documentation for your computer to see which scripts are available
there.

runE script

The command to start a model run is

    runE <RunID> [-np NP] [-t time] [-cold-restart] [-d] [-q] [-l logfile] [-s tag]

Here <RunID> is the name of your run (rundeck name without .R). The
script accepts the following options:

-np NP
    Run with NP MPI threads.
-t time
    Execution time, time, to be used in QSUB_STRING (see below).
    Otherwise has no effect.
-cold-restart
-d
    Run the model in a debugger (... -g flag). By default gdb is
    used, starting each MPI thread in a separate xterm window. If you
    want to use a different debugger, you can specify your own
    debugger command by assigning it to the environment variable
    DEBUG_COMMAND.
-q
    ... <RunID>.PRT file.
-l logfile
    Instead of <RunID>.PRT, use logfile as a log file.
-s tag
    Instead of QSUB_STRING, use QSUB_STRING_tag (see below).

If you want to use this script to start model runs as batch jobs on a
supercomputer, you can do so in most cases. Just add a variable
QSUB_STRING to your ~/.modelErc file and set it to a command which
starts an appropriate batch job. You can set several such strings
(with different settings), giving each one a "tag", and choose which
string to use by specifying the tag with the -s flag. For example,
for the Slurm resource manager you could use

    QSUB_STRING="sbatch -A account_name -n %np -t %t"
    QSUB_STRING_debug="sbatch --qos=debug -A account_name -n %np -t %t"

Here %np and %t are the number of MPI threads and the execution time
passed from the command line.
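
To make the roles of the tag and of the %np and %t placeholders concrete, the sketch below picks a template by tag and fills in the two values. It is an illustration only: expand_qsub is a hypothetical helper, and the way runE itself performs the selection and substitution may differ.

```sh
# The two example settings from ~/.modelErc shown above.
QSUB_STRING="sbatch -A account_name -n %np -t %t"
QSUB_STRING_debug="sbatch --qos=debug -A account_name -n %np -t %t"

# Hypothetical helper (not taken from runE): choose the template for an
# optional tag, then replace %np and %t with the given values.
expand_qsub() {
    np=$1; walltime=$2; tag=$3
    if [ -n "$tag" ]; then
        # Indirect lookup of QSUB_STRING_<tag>
        eval "template=\$QSUB_STRING_$tag"
    else
        template=$QSUB_STRING
    fi
    printf '%s\n' "$template" | sed -e "s/%np/$np/g" -e "s/%t/$walltime/g"
}

expand_qsub 8 1:00:00      # prints: sbatch -A account_name -n 8 -t 1:00:00
expand_qsub 4 30 debug     # prints: sbatch --qos=debug -A account_name -n 4 -t 30
```

The first call corresponds to passing -np 8 -t 1:00:00 on the command line, the second to passing -np 4 -t 30 -s debug.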

To start a run which was set up in the previous section, you can just
execute

    ../exec/runE <RunID> -np <NP>

Actually, this command will always restart your <RunID> run from the
latest saved checkpoint file.

The standard way to stop the model is to use the command sswE (also
located in modelE/exec). To stop the run <RunID> just execute the
command

    sswE <RunID>

This will let the model know that you want to interrupt the
execution. The model will finish the current time step, write the
checkpoint file and then stop.