NEW_IO

M. Kelley, November 2010

Table of Contents

HOW-TO configure your rundeck and set up your environment
HOW-TO examine a quantity in the restart file
HOW-TO compare restart files
HOW-TO save a quantity in the restart file
HOW-TO obtain scaled diagnostics (short version)
HOW-TO obtain scaled diagnostics
HOW-TO do time averages
HOW-TO obtain lat-lon outputs from a cubed-sphere run
HOW-TO print the diagnostics tables
HOW-TO convert netcdf to GISS-binary format
HOW-TO print a netcdf variable in ASCII format
HOW-TO extract hemispheric/global means
HOW-TO write "subdaily" diagnostics in netcdf format
HOW-TO add a new diagnostic to an existing category
HOW-TO add a new category of diagnostic
Special diagnostic calculations
Troubleshooting
Plotting options
How it works
Local information

HOW-TO configure your rundeck and set up your environment

Cubed-sphere rundecks function in NEW_IO mode only and no changes are necessary. To reconfigure a rundeck for NEW_IO, change the following:

#define NEW_IO in the Preprocessor Options section
in the Object Modules: section, replace IORSF by IO_DRV If doing a tracer run, add TRDIAG somewhere.
in the Components: section, add dd2d
If you wish to use parallel-netcdf, add a line OPTS_dd2d = NC_IO=PNETCDF in the Component Options: section of the rundeck, and check that PNETCDFHOME is set in your .modelErc (see Local information for installation locations). If your run writes restart files larger than 1-2 GB, this option is highly recommended.
If ISTART=2: in the Data input files: section, set GIC=name_of_a_netcdf_gic.nc (The Local information section will point you to some existing netcdf GIC files)

Most systems running modelE have a netcdf installation which is probably already specified in your .modelErc file. However, your PATH environment variable may not include the location of two important helper programs: ncdump and ncgen. If it doesn't, add NETCDFHOME/bin to your PATH (NETCDFHOME denoting whatever is present in your .modelErc).

Unless indicated otherwise, standalone programs/scripts referenced below are installed in a location noted in the Local information section. Most are built from the code in the model/mk_diags directory. Though not necessary to run the model, the NCO package is a prerequisite for the ncksprt script. Again, see Local information

HOW-TO examine a quantity in the restart file

To view a plot of the instantaneous state of particular model variables, one can open a restart file using a netcdf-aware plotting package. However, note that restart files are not currently written with coordinate data and other metadata, which causes some packages to throw up their hands and refuse to plot anything, period. Thankfully there exist packages like the CDAT GUI vcdat that accept just about any input and plot it in gridpoint space.

To simply print the (local) values of one or model variables to the screen, use the ncksprt utility described here:

HOW-TO compare restart files

Say you have two restart files fort.2.8proc.nc and fort.2.1proc.nc whose comparison will tell you whether the model produced the same result on 8 processors versus 1. To print a report of which variables differ, and their maximum absolute and relative differences, use the diffreport utility (which can be applied to any two netcdf files from any source, actually):

   diffreport fort.2.8proc.nc fort.2.1proc.nc

To suppress reports for certain variables, an optional third command-line argument can be passed to diffreport specifying a dummy netcdf variable whose attributes contain a list of on/off switches. One use of this feature in the modelE example above would be:

   diffreport fort.2.8proc.nc fort.2.1proc.nc is_npes_reproducible

Here, the is_npes_reproducible variable is defined by modelE to contain a list of (diagnostic) arrays known to have roundoff differences on different processor counts.

For a broader look at a potential problem, the NCO utility ncdiff can be used to generate a file containing the differences which can then be viewed (or printed with ncksprt):

   ncdiff fort.2.8proc.nc fort.2.1proc.nc diff.nc

HOW-TO save a quantity to the restart file

As of March 2010, NEW_IO versions of existing model input/output routines exist alongside the default io_XYZ versions, but with a new_ prefix. With a few exceptions described below, these function analogously to the io_XYZ versions, but call special routines which handle the netcdf and parallelization details. A single call to one of these routines reads/writes a single model variable, which is associated with the corresponding variable in the file using its netcdf name. There is no requirement that the Fortran name in the code match the netcdf name in the file. The ordering of variables in a netcdf file is arbitrary since they are always read/written using their netcdf names.

! write distributed array u to the netcdf variable 'u':
    call write_dist_data(grid,fid,'u',u) ! grid is a dist_grid object, fid is file ID

! read distributed array t from the netcdf variable 't':
    call read_dist_data(grid,fid,'t',t)

! root processor writes its copy of non-distributed array idacc to the netcdf variable 'idacc'
    call write_data(grid,fid,'idacc',idacc)

! read non-distributed variable s0 from netcdf variable s0, and broadcast to all processors
    call read_data(grid,fid,'s0',s0,bcast_all=.true.) ! bcast_all is an optional argument

Subroutines write_dist_data,read_dist_data take an optional argument jdim which specifies which dimension is the LAST horizontal dimension; if jdim is not specified it is assumed to be 2, which is the case for model arrays like temperature T(I,J,L). To write an array dimensioned L,I,J, set jdim=3

Before calling one of these I/O routines, the shapes of model variables and their netcdf names must have been declared already, via a call to defvar in one of the def_rsf_XYZ subroutines. The sizes of dimensions are inferred from those of Fortran arrays passed to defvar, and the netcdf names of variables and their dimensions are taken from string arguments.

! declare a scalar 's0'
    call defvar(grid,fid,s0,'s0')
! declare a 1-D array 'idacc' and a dimension 'nsampl'
    call defvar(grid,fid,idacc,'idacc(nsampl)')

For distributed arrays, a prefix dist_ must be added to dimension names. The sizes of distributed dimensions are taken from the grid object.

! declare a 3-D array 't' whose first two dimensions are distributed
    call defvar(grid,fid,t,'t(dist_im,dist_jm,lm)')  ! grid is a dist_grid object, fid is file ID

Obviously many model variables share dimensions; it is not necessary to declare separate dimension names for each variable. If a dimension is ever redeclared with a different size than previously, defvar will abort.

If a call to one of the write routines is made for a variable that does not exist in the output file, they will abort. If a read routine is called for a nonexistent variable, a warning message is printed but execution will continue; this behavior is useful when restarting from older model versions for example.

HOW-TO obtain scaled diagnostics

   scaleacc acc-file acc-array-name[,name2,name3...]

# Examples:
# scale the JAN1901 "aj" and "aij" accumulations of run E001xyz:
   scaleacc JAN1901.accE001xyz.nc aj,aij # output files will be JAN1901.{aj,aij}E001xyz.nc
# scale all available accumulations using the "all" request
   scaleacc JAN1901.accE001xyz.nc all

The standalone (i.e. run-independent) scaleacc utility converts the contents of an accumulation array to final scaled form in much the same way as the pdE command. Accumulations are divided by the number of times they were accumulated, scale factors are applied, ratios are calculated, and so forth. The name of the output file produced is the name of the accumulation file with the "acc" substring replaced by the name of the accumulation array (this differs from the filename choices of pdE). From the end user's point of view, there are a few other minor procedural differences compared to "standard" pdE:

Final scaled outputs are written in netcdf format, but these are easily convertible to GISS-binary using one of the utilities documented here.
Individual diagnostics categories are requested by name rather than via the KDIAG namelist array. Several categories can be specfied with a comma-separated list, or this collection can be built up one category at a time. For simplicity, remapped output must be generated using the latter approach.
scaleacc does NOT perform time averaging - time averages are constructed by first applying the sumfiles utility to the set of acc-files constituting a particular time period (see the next section), and then applying scaleacc to the resulting "sum of files".
scaleacc does not calculate diagnostics defined using nontrivial operations on time-mean output; special standalone programs have been created for this purpose.

Model E diagnostics categories configured for scaleacc include

aj: atmospheric model, budget page (budget latitude bands)
areg: atmospheric model, aj diagnostics for predefined regions
ajl: atmospheric model, latitude-height (budget latitude bands)
aij: atmospheric model, longitude-latitude (or on the cubed-sphere grid if applicable)
aijl: atmospheric model, longitude-latitude-height (or on the cubed-sphere grid if applicable)
consrv: atmospheric model, conservation quantities on the budget grid
agc: atmospheric model, latitude-height general circulation diagnostics on constant-pressure levels (winds, temperature, eddy fluxes etc.)
aijk: atmospheric model, longitude-latitude-height general circulation diagnostics on constant-pressure levels (winds, temperature, eddy fluxes etc.)
tconsrv: atmospheric model, tracer conservation quantities on the budget grid
tajl: atmospheric model, latitude-height tracer fields (budget latitude bands)
taij: atmospheric model, longitude-latitude tracer fields (or on the cubed-sphere grid if applicable)
taijl: atmospheric model, longitude-latitude-height tracer fields (or on the cubed-sphere grid if applicable)
adiurn: atmospheric model, diurnal cycles at selected gridpoints
hdiurn: atmospheric model, hourly timeseries at selected gridpoints
otj: Ocean R, northward transports
ojl: Ocean R, latitude-depth
oij: Ocean R, longitude-latitude
oijl: Ocean R, longitude-latitude-depth
olnst: Ocean R, straits
toijl: Ocean R, longitude-latitude-depth tracer fields
icij: Viscous-plastic ice dynamics, longitude-latitude fields

HOW-TO do time averages

   sumfiles acc-files-to-be-summed

# Example: produce the JJA1901 accumulations of run E001xyz:
   sumfiles {JUN,JUL,AUG}1901.accE001xyz.nc

The scaled diagnostics for a multi-month averaging period are generated by applying scaleacc to an acc-file generated by sumfiles which contains sums over the months in this averaging period. sumfiles attempts to guess an appropriate name for the file it produces, but is not foolproof. Caution should be used when applying regular expressions like *1901.accE001xyz.nc ; this will cause problems if one has already created seasonal sums like JJA1901.accE001xyz.nc for example. Multi-year sums can be calculated by applying sumfiles to single-year sums.

The sumfiles program can do more than the computation of the sums used to define time averages. In fact, the only accumulation arrays it sums are those having a netcdf attribute reduction = "sum". In addition to "sum", it currently understands "min" and "max"; other operations can easily be added.

HOW-TO obtain lat-lon outputs from a cubed-sphere run

For a cubed-sphere run whose so-called "native grid" is not a latitude-longitude mesh, scaled diagnostics from accumulation arrays such as aij can be output on the native grid or remapped to a user-specified latitude-longitude grid. Native-grid accumulation arrays remain native in acc-files; remapping to a different grid is requested during the scaleacc step by adding the name of a "remap" file to the command line:

# Example: scale the JAN1901 "aij" accumulations of run E001xyz whose native
# grid is C90, and output them on a 288x180 latitude-longitude grid defined
# in the file remap_C90_288x180.nc
   scaleacc JAN1901.accE001xyz.nc aij remap_C90_288x180.nc

The name of the remap file is arbitrary since the resolutions of the cubed-sphere and latitude-longitude grids are taken from its contents. If no remap file is specified, scaleacc outputs quantities on their native grids. Outputs which are ratios are defined as the ratio of remapped numerators and denominators. At some point the calculation of remapping information may be moved into the scaleacc process, but most users will find it convenient to use one of several precalculated files available on systems where modelE runs or acc-files are archived (see Local information concerning locations of remap files). The choice of remapping method for a particular diagnostic (first- versus second-order accuracy, area-weighted versus interpolation) is made during the declaration of the metadata for that diagnostic (conventions for this are currently being decided). For more information, contact the author.

HOW-TO print the diagnostics tables

To allow a large amount of output to be perused quickly, modelE provides routines which print certain categories of results in tabular form (e.g. zonal means). Most of these routines have been transplanted to standalone programs and modified to read the files produced by scaleacc. Fortran format statements, powers of 10, and other information needed to replicate the look and feel of existing output are retrieved from the metadata for each quantity. Each standalone program is designed to print one and only one category of output (although tracer quantities can be printed using the same programs as non-tracer quantities). Tables are printed to standard output which can be redirected to user-specified output files. Currently available print programs and their syntax for January 1901 of a run E001xyz are:

#  program          input file or argument
   prtaj            JAN1901.ajE001xyz.nc
   prtareg          JAN1901.aregE001xyz.nc
   prtconsrv        JAN1901.consrvE001xyz.nc
   prtconsrv_tracer JAN1901.tconsrvE001xyz.nc
   prtajl           JAN1901.ajlE001xyz.nc     # also works for tajl
   prtadiurn        JAN1901.adiurnE001xyz.nc
   prtrvr           JAN1901.accE001xyz.nc     # river flow; acc file is read
   prtisccp         JAN1901.accE001xyz.nc     # ISCCP clouds; acc file is read
   prtotj           JAN1901.otjE001xyz.nc
   prtolnst         JAN1901.olnstE001xyz.nc
   prtostat E001xyz JAN1901                   # reads ojl and oij output files

HOW-TO convert netcdf to GISS-binary format

GISS-binary files can be created from netcdf files using one of these programs:

write_giss2d       infile.nc outfile [ varname OR dimname1 dimname2 ]
write_2d_as_giss4d infile.nc outfile [ varname OR dimname1 dimname2 ]

which behave identically save for the coordinate information written to "GISS 4D" files. If the optional argument varname is specified, only that netcdf variable is written to the GISS-binary file; otherwise all dimension-matched variables are written. If the optional arguments dimname1 and dimname2 are specified, dimension-matched variables are defined as those whose dimensions include dimname1 and dimname2; otherwise all variables having two or more dimensions are written. If dimname1 and dimname2 are not specified, they are assumed to correspond to the first two dimensions of 3D+ variables. A two-dimensional variable is written as a single fortran record, and each record of a 3D+ variable contains a two-dimensional "slab" of data spanning the two dimensions dimname1 and dimname2. The 80-byte title of each record is created from the long_name and units attributes of the variable being written, and contains the index information along the dimensions other than the slab dimensions dimname1 dimname2. The slab dimensions need not be the first two of a given variable, nor consecutive. If long_name is absent, the netcdf variable name is used in the title. Examples:

write_giss2d JAN1901.aijE001xyz.nc JAN1901.aijE001xyz.giss2d            # convert all variables in the input file
write_giss2d JAN1901.aijE001xyz.nc JAN1901.aijE001xyz.giss2d lon lat    # extract only the aij variables dimensioned by lon and lat
write_giss2d JAN1901.aijE001xyz.nc JAN1901.tsurfE001xyz.giss2d tsurf    # extract only the "tsurf" variable
write_giss2d JAN1901.aijlE001xyz.nc JAN1901.aijlE001xyz.giss2d lon plm  # aijl variables split along the lat dimension to make "IL" records
write_giss2d JAN1901.aijlE001xyz.nc JAN1901.aijlE001xyz.giss2d          # all aijl variables will be split along their 3rd dimension
write_2d_as_giss4d JAN1901.ajlE001xyz.nc JAN1901.ajlE001xyz.giss4d      # extract all AJL fields

HOW-TO obtain scaled diagnostics (short version)

R. Ruedy has created a NEW_IO version of the pdE script which executes the commands described in the previous sections. It collects the printed tables for all diagnostics categories into a single text file, and offers the option to convert netcdf to GISS-format files. Execute pdE without arguments to see the syntax of its usage.

HOW-TO print a netcdf variable in ASCII format

The script ncksprt employs the NCO utility ncks to print the values of variables in ASCII format, with further formatting of the output done by UNIX sed, awk, and paste. Coordinate values are printed along with the requested variables, and multiple variables can be printed simultaneously (in column format). The syntax for specifying dimension bounds is that of NCO: integers correspond to (1-based) dimension indices, while floating-point numbers correspond to coordinate values. Although it can be used to print multidimensional hyperslabs of data, this tool was intended for point or one-dimensional reports. Examples:

# surface air temperature and wind speed as a function of longitude, at the latitude closest to 30 degrees north
   ncksprt -v tsurf,wsurf -d lat,30. JAN1901.aijE001xyz.nc

# model variables t(i,j,l),q(i,j,l) at i=20 and j=10 for l=1 to l=5 
   ncksprt -v t,q -d im,10 -d jm,10 -d lm,1,5 fort.2.nc  # netcdf dimension names are im,jm,lm in the restart file

HOW-TO extract hemispheric/global means

To avoid making standalone programs understand the details of horizontal grids and area weightings, the model computes hemispheric and global means of accumulation quantities, saves these means in auxiliary arrays in acc-files, and defines auxiliary scaled output quantities having the suffix _hemis. These auxiliary outputs have a dimension name shnhgm of size 3 in addition to whatever other relevant dimensions they may possess. The first position in the shnhgm dimension contains the southern hemisphere mean, the second the northern hemisphere, and the third the global mean. From an aij output file for example, the hemispheric and/or global means of surface air temperature can be printed using ncksprt

    ncksprt -v tsurf_hemis JAN1901.aijE001xyz.nc
    ncksprt -v tsurf_hemis -d shnhgm,3 JAN1901.aijE001xyz.nc # print global mean only

The scaleacc program defines the means of diagnostics which are ratios using the ratio of the means of the respective accumulations.

HOW-TO write "subdaily" diagnostics in netcdf format

(The reader is assumed to be familiar with the workings of the subdaily diagnostics code.)

A line #define NEW_IO_SUBDD in the Preprocessor Options section of a rundeck is currently required to override the default output format for these diagnostics (one lat-lon slice per Fortran binary sequential-access record). Note that this option currently requires that the model be built with parallel netcdf.

The default routine write_data outputs only one lat-lon slice per call; for simplicity/efficiency, an alternate interface write_subdd was introduced to allow output of 3- and higher-dimensional arrays in one call. The coding for the default format will soon be changed to use this interface.

HOW-TO add a new diagnostic to an existing category

For the most part, postprocessing by standalone programs does not change the procedure. However, it does require attention to a few details that are sometimes overlooked. Firstly, the short names of output quantities must be accepted by the netcdf library: they must begin with an alphabetic character followed by zero or more alphanumeric characters (including underscores). Secondly, for outputs that are ratios, a denominator must be stored somewhere in the accumulation array since it cannot be computed "on the fly" during postprocessing. The metadata for the numerator should contain the index of the denominator. Some categories of diagnostics in modelE (e.g. aij,ajl) are already scaled online using this denominator system, so there are examples to follow.

More challenging for standalone postprocessing are diagnostics that are declared locally within modelE print programs, or those which having some meta-metadata not registered anywhere (e.g. which tracer outputs need division by gridcell area when most others don't, or vice versa). The number of such special cases has declined recently and hopefully the trend will continue.

Finally, for outputs that are simple functions of already-existing outputs, it may be more expedient to calculate/extract them using generic tools rather than adding new code to modelE.

Special diagnostic calculations

Special-purpose programs for outputs not fitting the generic-postprocessing mold include:

agcstat: calculates standing-eddy statistics, refractive indices, etc. from time-mean output, and places the results into the agc output file
prtostat: prints selected ocean circulation statistics from time-mean output

HOW-TO add a new category of diagnostic

Recipe to be written.

Troubleshooting

Known causes of aberrant behavior:

Application of sumfiles/scaleacc to acc-files residing in a mass-storage facility. Some facilities have commands to create temporary copies of slow-media (tape) files on higher-performance media. The coding of sumfiles/scaleacc assumes that once a file has been successfully opened, its contents will remain available during execution. Unfortunately, files can sometimes "disappear" before execution completes.

Plotting options

To be written.

How it works

Move mk_diags/conventions.txt here.

Local information


                       On Discover:
                       ------------

modelE standalone programs: /discover/nobackup/projects/giss/exec

netcdf GIC files in /discover/nobackup/projects/giss/prod_input_files:
                     GIC.E046D3M20A.1DEC1955.ext.nc  for 4x5    lat-lon resolution
                     GIC.144X90.DEC01.1.ext.nc           2x2.5  lat-lon resolution
                     GIC.288X180.DEC01.1.ext.nc          1x1.25 lat-lon resolution


PNETCDFHOME=/discover/nobackup/mkelley5/pnetcdf-1.2.0  (ifort 10.1.017, impi 3.2.011)

remap files: /discover/nobackup/mkelley5/remap_files

NCO programs: /usr/local/other/NCO/3.9.9_gcc/bin