CESM Naming Conventions

April 2018


CESM Experiment Casenames

Standardized CESM experiment case names should be used for any experiments that:

  • are added to the CESM experiments database, or
  • have caseroots in a shared CESM runs dir (at NCAR: /glade/p/cseg/runs), or
  • archive output in a shared CESM archive location on disk

The convention is: <compset char>.<code base>.<compset shortname>.<res shortname>[.opt_desc_string].<nnn>[opt_char]

Important Note: none of the strings delimited by a "." in the casename should contain any of the special wildcard characters +,*,?,{,} that are used by the CESM scripts in pattern matching.

Details:

  • <compset char> is a single character, the first letter of the compset, lower case
  • <code base> is "e20" for cesm2.0, "e10" for cesm1.0, "c40" for ccsm4.0, "c35" for ccsm3.5
  • <compset shortname> is the compset shortname recognized by create_newcase
  • <res shortname> is the resolution shortname recognized by create_newcase
  • [.opt_desc_string] is an OPTIONAL descriptive string. The only constraint is that it must not contain a "." (as this is reserved for delineating and parsing components of the case name). While this string is largely unconstrained, CSEG strongly encourages users to either keep it very short (e.g., eight characters or fewer) or avoid using it altogether.
  • <nnn> is a 3 digit number, the smallest number that results in a unique case name
  • [opt_char] is an OPTIONAL single lower-case letter {a,b,...,z}, normally not used, but is allowed to denote a group of cases that are very closely related. Those cases would then have identical case names except for this single character suffix.

Examples:

      b.e20.B1850.f09_g17.pi_control.all.294
      b.e20.BHIST.f09_g17.20thC.294_01
      e.e20.E1850.f09_g17.pi_control.2xco2.125
      f.e20.F2000.f09_f09.293_001
      f.e20.FW1850.f09_f09_mg17.295
      g.e20.G.TL319_t13.startup.001
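
The convention can be checked mechanically. The following Python sketch is one strict reading of the rules above, not the logic used by the CESM scripts; the names parse_casename and CaseName are hypothetical, and the <code base> and shortname fields are not checked against what create_newcase actually recognizes.

```python
import re
from typing import NamedTuple, Optional

# Characters reserved for pattern matching by the CESM scripts.
WILDCARDS = set("+*?{}")


class CaseName(NamedTuple):
    compset_char: str
    code_base: str
    compset: str
    res: str
    desc: Optional[str]      # optional descriptive string
    nnn: str                 # 3-digit number
    opt_char: Optional[str]  # optional single lower-case letter


def parse_casename(name: str) -> Optional[CaseName]:
    """Return the parsed fields, or None if the name does not fit the convention."""
    if WILDCARDS & set(name):
        return None
    parts = name.split(".")
    # Four mandatory leading fields, an optional description, and the number.
    if len(parts) not in (5, 6) or any(p == "" for p in parts):
        return None
    compset_char, code_base, compset, res = parts[:4]
    desc = parts[4] if len(parts) == 6 else None
    tail = re.fullmatch(r"(\d{3})([a-z]?)", parts[-1])
    if tail is None:
        return None
    # <compset char> is the lower-case first letter of the compset.
    if len(compset_char) != 1 or compset_char != compset[0].lower():
        return None
    return CaseName(compset_char, code_base, compset, res, desc,
                    tail.group(1), tail.group(2) or None)
```

Under this strict reading, a name whose descriptive string contains a "." or whose final field is not exactly three digits (optionally followed by one lower-case letter) is rejected.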


CESM Output Filename Conventions

The naming conventions for CESM output files fall into two broad categories: those files generated by the CESM component models at run-time ("model output data") and those created by post-processing the run-time files ("post-processed data").

CESM Model Output Data Filenames

The general filename formats for output files generated at run-time by the CESM component models are:

$output = $CASE.$scomp.[$type.][$string.]$date[$ending]
$log = $CASE.$gcomp.$ltype.$logdate

where:

  • [] denotes an optional filename element
  • $output denotes any model history, restart, initial, or diagnostic output file
  • $log denotes any model log file
  • $CASE = (A-Z,a-z,0-9), "." (dot), and/or "_" (underscore).
    $CASE is the case name character string, which must be 80 or fewer characters long. See the CESM2 Quick Start User's Guide and CESM Experiment Naming Conventions for more information on the definition of $CASE.
  • $scomp = (cam,cice,cism,clm2,cpl,dart,datm,desp,dice,dlnd,docn,drof,dwav,mosart,pop,rtm)
    $scomp is the string that indicates the specific component-model name.
  • $gcomp = (atm,cpl,esp,glc,ice,lnd,ocn,rof,wav)
    $gcomp is the string that indicates the generic component-model name.
  • [$type] = (h*,r*,i*,d*,e*)
    $type is a one-, two-, or three-character string which denotes the output file type. In the future, more characters may be required. $type must begin with h (history), r (restart), i (initial), d (diagnostic) or e (External System Processing) and may be followed by up to two optional characters (a-z,0-9). Certain restart $type strings have special meanings: rs (cam surface restart), rh (restart history), and rd (restart diagnostic).
    One known exception to this is that the cice model sometimes generates history files with a $type string of unlimited length and which may include the "_" (underscore) character or other characters.

    For h (history) files, the second and third characters of the pattern h* signify a unique data stream that corresponds to a namelist value defining the frequency at which the data variables were written to the file. In order to make the netCDF files fully self-describing and satisfy post-processing requirements to distinguish unique history streams, a new metadata field called time_period_freq is now written to every history file header with one of the corresponding values:

    • second_N... N ∈ {1..3599} - data are written every N seconds
    • minute_N... N ∈ {1..59} - data are written every N minutes
    • hour_N..... N ∈ {1..23} - data are written every N hours
    • day_N...... N ∈ {1..31} - data are written every N days
    • month_N.... N ∈ {1..11} - data are written every N months
    • year_N..... N ∈ {1..99999} - data are written every N years

    Important Note: Even though $type is optional, the default settings for the short-term archiver depend on filename patterns that adhere to these conventions. Consequently, if the filenames do not include a $type identifier then the user will need to modify the rules in the $CASEROOT/env_archive.xml file to set the filename pattern matching rules correctly.

  • $string
    $string is an optional, typically short character string consisting of (A-Z,a-z,0-9), "." (dot), "_" (underscore), and/or "-" (dash), which is used to further identify the history file type. The ESP (External System Processing) component relies on this $string value to identify different output file types required by DART (Data Assimilation Research Testbed). For more information regarding ESP and DART, please refer to the DART web site.
  • $ltype = log
    $ltype is the character string log which denotes the model output log type.
  • $date = (yyyy-mm-dd-sssss, yyyy-mm-dd, yyyy-mm, yyyy)
    $date is the model date string which is based on a yyyy-mm-dd-sssss convention where:
    • yyyy (0000,9999) is the year string (if necessary, yyyy can be increased to more than four digits, but four is the minimum and anything greater than four requires script and code modifications)
    • mm (01,02,...,12) is the month string
    • dd (01,02,...,31) is the day string
    • sssss (00000,...,86399) is the second string

    In date-conforming output files, the following are established $date strings and their meanings:

    • yyyy - annual average (history file)
    • yyyy-mm - monthly average (history file)
    • yyyy-mm-dd - daily average (history file)
    • yyyy-mm-dd-sssss - instantaneous or timeseries. This is the stamp used for restart files, instantaneous history files ("snapshot" files), and timeseries history files (the date stamp indicates the timestamp of the first data written to the file).

  • $logdate = yymmdd-hhmmss
    $logdate is the log-file date string, which is a real-world date string of the form indicated in the definition.
  • $ending = (.nc)
    $ending denotes the file format. The following optional endings are supported: .nc (netCDF3, netCDF4, and netCDF4c).
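
The $output format and the time_period_freq values defined above can be expressed as a regular expression and a small range check. This is a minimal sketch, not the pattern matching used by the short-term archiver: the function names are hypothetical, the known cice exception for long $type strings is not handled, and month and day values in $date are not range-checked.

```python
import re
from typing import Optional

# Specific component names listed for $scomp above.
SCOMPS = ("cam|cice|cism|clm2|cpl|dart|datm|desp|dice|dlnd|docn|drof|dwav|"
          "mosart|pop|rtm")

# $CASE.$scomp.[$type.][$string.]$date[$ending]
OUTPUT_RE = re.compile(
    r"^(?P<case>[A-Za-z0-9._]+)"
    r"\.(?P<scomp>" + SCOMPS + r")"
    r"(?:\.(?P<type>[hride][a-z0-9]{0,2}))?"
    r"(?:\.(?P<string>[A-Za-z0-9._-]+))?"
    r"\.(?P<date>\d{4,}(?:-\d{2}(?:-\d{2}(?:-\d{5})?)?)?)"
    r"(?P<ending>\.nc)?$"
)

# Meaning of $date, keyed by the number of dash-separated fields.
DATE_KINDS = {1: "annual average", 2: "monthly average",
              3: "daily average", 4: "instantaneous or timeseries"}

# Upper bounds on N for each time_period_freq unit.
FREQ_LIMITS = {"second": 3599, "minute": 59, "hour": 23,
               "day": 31, "month": 11, "year": 99999}


def parse_output_name(name: str) -> Optional[dict]:
    """Split an $output filename into its elements, or return None."""
    m = OUTPUT_RE.match(name)
    if m is None or len(m.group("case")) > 80:
        return None
    return m.groupdict()


def date_kind(date: str) -> str:
    """Describe what a $date string of the given form indicates."""
    return DATE_KINDS[date.count("-") + 1]


def valid_time_period_freq(value: str) -> bool:
    """Check a time_period_freq value such as 'month_1'."""
    m = re.fullmatch(r"(second|minute|hour|day|month|year)_([1-9]\d*)", value)
    return m is not None and int(m.group(2)) <= FREQ_LIMITS[m.group(1)]
```

Because $CASE may itself contain dots, the sketch relies on the fixed list of $scomp names to find where the case name ends.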


CESM Model Output File Locations

Depending on the stage of the model execution and the options selected in the CESM case, run-time output files will reside in different locations. See the CESM2 Quick Start Guide for the definitions of $RUNDIR and $DOUT_S_ROOT and how they are set.

  • During execution, files in the $RUNDIR directory are:
    • $RUNDIR/$output
    • $RUNDIR/$log

  • If the short-term archiving option is active, $DOUT_S = TRUE, files will be moved to the $DOUT_S_ROOT directory following execution. In this case, the files are segregated by component and type (history, restart or log):
    • $DOUT_S_ROOT/$CASE/$gcomp/hist/$output - history and diagnostics files
    • $DOUT_S_ROOT/$CASE/rest/$rdate/$output - all files needed for restart including initialization files
    • $DOUT_S_ROOT/$CASE/logs/$log - log files

    where
    • $DOUT_S_ROOT is the short-term archiving root directory
    • $CASE is the case name
    • $gcomp = [atm,cpl,esp,glc,ice,lnd,ocn,rof,wav] is a subdirectory with history time-slice files and POP diagnostics files
    • $rdate = yyyy-mm-dd-sssss is the restart date for all files in the subdirectory

Example of Short-Term Archive Filenames:

The * below represents "optional" characters and YYYY = one year prior to yyyy.

[...] = $DOUT_S_ROOT

[...]/$CASE/atm/hist/$CASE.cam.h*.yyyy-mm.nc
[...]/........./hist/$CASE.cam.h*.yyyy-mm-dd-sssss.nc
[...]/...../cpl/hist/$CASE.cpl.h*.yyyy.nc
[...]/........./hist/$CASE.cpl.h*.yyyy-mm.nc
[...]/........./hist/$CASE.cpl.h*.yyyy-mm-dd.nc
[...]/........./hist/$CASE.cpl.h*.yyyy-mm-dd-sssss.nc
[...]/...../esp/hist/* (External System Processing files created by DART)
[...]/...../glc/hist/$CASE.cism.h*.yyyy-mm-dd-sssss.nc
[...]/........./hist/$CASE.cism.initial_hist.yyyy-mm-dd-sssss.nc
[...]/...../ice/hist/$CASE.cice.h.yyyy-mm.nc
[...]/...../lnd/hist/$CASE.clm2.h*.yyyy-mm.nc
[...]/........./hist/$CASE.clm2.h*.yyyy-mm-dd-sssss.nc
[...]/...../logs/[atm,cesm,cpl,glc,ice,lnd,ocn,rof,wav].log.*.[gz]
[...]/...../ocn/hist/$CASE.pop.h*.yyyy-mm.nc
[...]/........./hist/$CASE.pop.h.nday1.yyyy-mm-dd.nc
[...]/........./hist/$CASE.pop.h.ecosys.nday1.yyyy-mm-dd.nc
[...]/........./hist/$CASE.pop.d*.yyyy-mm-dd-sssss (diagnostics files)
[...]/........./hist/$CASE.pop.hm.yyyy-mm-dd-sssss.nc (movie stream)
[...]/........./hist/$CASE.pop.hs.yyyy-mm-dd-sssss.nc (snapshot stream)
[...]/........./hist/$CASE.pop.h.once.nc (diagnostics and prognostics variables)
[...]/........./hist/$CASE.pop.hv.nc (viscosity variables)
[...]/...../rest/yyyy-mm-dd-sssss/$CASE.cam.h0.YYYY-12.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.cam.i.yyyy-01-01-00000.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.cam.r.yyyy-01-01-00000.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.cam.rs.yyyy-01-01-00000.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.cice.r.yyyy-01-01-00000.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.cism.r.yyyy-01-01-00000.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.clm2.h0.YYYY-12.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.clm2.r.yyyy-01-01-00000
[...]/........../yyyy-mm-dd-sssss/$CASE.clm2.rh0.yyyy-01-01-00000.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.cpl.r.yyyy-01-01-00000.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.mosart.h0.YYYY-12.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.mosart.r.yyyy-01-01-00000.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.mosart.rh0.yyyy-01-01-00000.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.pop.r.yyyy-01-01-00000.nc
[...]/........../yyyy-mm-dd-sssss/$CASE.pop.ro.yyyy-01-01-00000
[...]/........../yyyy-mm-dd-sssss/$CASE.ww3.r.yyyy-01-01-00000
[...]/........../yyyy-mm-dd-sssss/$CASE.rpointer.atm
[...]/........../yyyy-mm-dd-sssss/$CASE.rpointer.drv
[...]/........../yyyy-mm-dd-sssss/$CASE.rpointer.glc
[...]/........../yyyy-mm-dd-sssss/$CASE.rpointer.ice
[...]/........../yyyy-mm-dd-sssss/$CASE.rpointer.lnd
[...]/........../yyyy-mm-dd-sssss/$CASE.rpointer.ocn.ovf
[...]/........../yyyy-mm-dd-sssss/$CASE.rpointer.ocn.restart
[...]/........../yyyy-mm-dd-sssss/$CASE.rpointer.rof
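
The directory layout listed above can be sketched as a small path builder. This is an illustration of the layout only, not the short-term archiver itself. The function name archive_path is hypothetical, the caller is assumed to already know whether a file is a history, restart, or log file, and the specific-to-generic component pairs are taken from the examples above except for mosart, whose mapping to rof is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional

# Specific ($scomp) to generic ($gcomp) component names. All pairs except
# mosart -> rof appear in the example listing; that one is an assumption.
SCOMP_TO_GCOMP = {
    "cam": "atm", "cpl": "cpl", "cism": "glc", "cice": "ice",
    "clm2": "lnd", "pop": "ocn", "mosart": "rof",
}


def archive_path(dout_s_root: str, case: str, filename: str, kind: str,
                 rdate: Optional[str] = None) -> PurePosixPath:
    """Return the archive location of a file; kind is 'hist', 'rest', or 'logs'."""
    base = PurePosixPath(dout_s_root) / case
    if kind == "hist":
        if not filename.startswith(case + "."):
            raise ValueError("history filename must begin with $CASE")
        # The element right after "$CASE." is the specific component name.
        scomp = filename[len(case) + 1:].split(".", 1)[0]
        return base / SCOMP_TO_GCOMP[scomp] / "hist" / filename
    if kind == "rest":
        if rdate is None:
            raise ValueError("restart files need a restart date ($rdate)")
        return base / "rest" / rdate / filename
    if kind == "logs":
        return base / "logs" / filename
    raise ValueError("kind must be 'hist', 'rest', or 'logs'")
```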


Post-processing Data

CESM post-processing tools work with the files stored in the short-term archive locations defined in the previous section. Post-processing tools include:

  • History time-slice file conversion to history variable time-series files.
  • Climatology files containing computed averages over specified time-periods spanned by the run.
  • Climate Model Output Rewriter (CMOR) compatible output filenames and sub-directories
  • Diagnostics plot sets

The CESM python-based parallel post-processing tools are available from the public https://github.com/NCAR/CESM_postprocessing repository with support currently limited to NCAR machines.

CESM conforms to the following points in order to allow consistent handling of model output data by the post-processing tools:

  • Model output data streams should conform to these standards as much as possible. Unique diagnostic files or other output files could have some non-standard naming conventions if that makes them easier to identify.
  • Restart dates are for the current timestep to run, so would be, for example, 0002-01-01-00000 if written at the end of year 1, month 12.
  • All restart filenames have the format $CASE.$scomp.r*.yyyy-mm-dd-sssss, unless information is in the rpointer files about what auxiliary files are required.
  • All history filenames have the format $CASE.$scomp.h?[.$string].[$date][$ending]. The second character of the type and the optional string are left up to the developer. There are no requirements, although optional string names must be coordinated with the CESM archiving scripts.
  • All monthly average history files look like $CASE.$scomp.h?[.$string].yyyy-mm[$ending]. The date string, NOT THE TYPE, tells you whether the file is yearly, monthly, daily averaged or other. Starting with CESM2.0, the output netCDF history file for each component contains the metadata variable time_period_freq, which defines the frequency at which the data are written. See the $type above for conventions used for this variable.
  • History files and restart history files are connected via $CASE.$scomp.h* and $CASE.$scomp.rh* even though the dates might be quite different. The dates for restart history files must conform with the restart date. The date for history files is set by the averaging strategy or the last date data were written.
  • Care must be used when not using the optional history character. For instance, for a case where you might have $CASE.$scomp.h.yyyy.nc and $CASE.$scomp.h.yyyy-mm.nc for an annual average and monthly average outputs, the date makes the files unique. However, their restart equivalents will have identical names because the date string is the yyyy-mm-dd-sssss of the restart. There will be a conflict. It is up to the model developer, and generally not the user, to make this robust.
  • In short:
    • All restart-file types start with r.
    • All history-file types start with h.
    • All initial-file types start with i.
    • All restart-file dates are of the form yyyy-mm-dd-sssss.
    • All restarts of history files use the history type prepended with an r.
    • The history-file date is the only indication of the type of average contained in the file. For example, yyyy-mm indicates the history file contains monthly averaged fields.
    • Some models have multiple restarts, some have multiple histories, etc. Model developers should decide whether to attach an optional character to the r, h or i file-type designator and what that character would be.
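
The relationship between history and restart-history names, and the conflict described above when the optional history character is omitted, can be illustrated with a short sketch. The function names and the default ".nc" ending are illustrative assumptions, not part of any CESM component.

```python
def restart_history_type(history_type: str) -> str:
    """Restarts of history files use the history type prepended with an r."""
    if not history_type.startswith("h"):
        raise ValueError("history-file types start with h")
    return "r" + history_type


def restart_history_name(case: str, scomp: str, history_type: str,
                         rdate: str, ending: str = ".nc") -> str:
    """Build $CASE.$scomp.rh*.yyyy-mm-dd-sssss[$ending] for one history stream."""
    return f"{case}.{scomp}.{restart_history_type(history_type)}.{rdate}{ending}"


def restart_history_conflicts(case: str, scomp: str, history_types: list,
                              rdate: str) -> bool:
    """True if two history streams would map to the same restart-history name."""
    names = [restart_history_name(case, scomp, t, rdate) for t in history_types]
    return len(set(names)) < len(names)
```

For example, an annual and a monthly stream that both use the bare type h are distinguished only by their history dates, so restart_history_conflicts reports a conflict for ["h", "h"], while ["h0", "h1"] yields distinct names.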

Filename Requirements for Post-Processed CESM Data

In the preceding section, naming conventions were established for model output data files generated by the CESM model as it executes. In this section, the conventions are extended in order to define rules for data files that result from the post-processing of CESM model output data.

Post-processed data files may include temporal averages (e.g., seasonal, annual, or decadal), spatial averages (e.g., zonal, meridional, global), timeseries, or other diagnosed quantities (e.g., meridional overturning streamfunction, barotropic streamfunction). The following rules are intended to provide a consistent structure for naming each of these types of files.

The development of these naming conventions was guided by the desire to:

  • maintain a close and logical connection to the original model output filenames
  • allow for the easy identification of the processed files' contents from the filename itself
  • separate at a high level the standard CESM processed data, which are available to the general CESM community, from the CESM model output data or the more specialized processed data, such as data intended for the IPCC inter-comparison project
  • allow for the creation of unique filenames

Users are free to use their own naming conventions in their own personal directories, but filenames of all data files that are written to NCAR common disk directories and mass store must follow the conventions described in this document.

If there are particular post-processing circumstances that are not addressed in this document, it is important to discuss them with CSEG first, prior to creating non-standard filenames. Only after the issues have been resolved and the documentation updated should the files be created with new naming conventions.

The value of the netCDF history file metadata variable, time_period_freq, defines the sub-directory locations where files are stored during post-processing. For example, if the $CASE.cam.h0.* history files correspond to time_period_freq = month_1, then the corresponding post-processing sub-directory in the archive location is:

[...]/$CASE/atm/proc/tseries/month_1/$CASE.cam.h0.$VARNAME.yyyymm-yyyymm.nc

All post-processed filenames conform to the following general format:

  • $DIRNAME/$FILENAME
where
  • $DIRNAME = $DOUT_S_ROOT/$CASE/$gcomp/$subdir/$tdir/[$tperiod]
  • $FILENAME = $CASE.$scomp.$type.$SSTRING.$TSTRING[.$ending]
quantities within square brackets [] are optional, and $CASE, $gcomp, $scomp, and $type are as defined above.

The following are definitions of the various components of $DIRNAME and $FILENAME; note that several examples follow below, to illustrate the use of these options.

  • $subdir = (proc)
    $subdir differentiates broad classifications of processed data files at a high level. proc is used for the standard suite of CESM post-processing.
  • $tdir = (climo[.start-year.stop-year],tseries,diag)
    $tdir is used to distinguish time-averages from time series from diagnostic plot sets; note that the tseries directory can include timeseries of time-averaged quantities; see examples below
  • $tperiod = [hour_${N},day_${N},month_${N},year_${N}]
    $tperiod denotes the time period over which the data were processed. ${N} = 1,2,3,...
  • $SSTRING
    $SSTRING provides a flexible means to describe non-temporal aspects of the file contents; the rules governing $SSTRING are listed below
  • $TSTRING
    $TSTRING provides a means to describe temporal aspects of the file contents, either a specific time period or a range of time periods represented in the processed file; the rules governing $TSTRING are listed below
  • $ending = [nc]
    $ending is an optional filename suffix used to describe the file format
    • $ending = .nc indicates the file is in netCDF format
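
The $DIRNAME/$FILENAME format can be assembled from its components as in the following sketch. The function name postproc_path and its argument names are hypothetical. $subdir is fixed to proc, $tperiod is checked only against the pattern given above, and $SSTRING is simply left out of the filename when it is empty.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def postproc_path(dout_s_root: str, case: str, gcomp: str, scomp: str,
                  type_: str, sstring: str, tstring: str,
                  tdir: str = "tseries", tperiod: Optional[str] = None,
                  ending: str = ".nc") -> PurePosixPath:
    """Build $DIRNAME/$FILENAME for a post-processed file."""
    if tperiod is not None and not re.fullmatch(
            r"(hour|day|month|year)_[1-9]\d*", tperiod):
        raise ValueError("tperiod must look like hour_N, day_N, month_N or year_N")
    # $DIRNAME = $DOUT_S_ROOT/$CASE/$gcomp/$subdir/$tdir/[$tperiod]
    dirname = PurePosixPath(dout_s_root) / case / gcomp / "proc" / tdir
    if tperiod is not None:
        dirname = dirname / tperiod
    # $FILENAME = $CASE.$scomp.$type.$SSTRING.$TSTRING[.$ending]
    parts = [case, scomp, type_]
    if sstring:
        parts.append(sstring)
    parts.append(tstring)
    return dirname / (".".join(parts) + ending)
```

For instance, calling postproc_path with gcomp "atm", scomp "cam", type "h0", $SSTRING "UU", a $TSTRING of "040001-049912", and tperiod "month_1" gives a path ending in atm/proc/tseries/month_1/$CASE.cam.h0.UU.040001-049912.nc.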

Both $SSTRING and $TSTRING have additional rules that are intended to allow for the creation of a unique filename that helps to unambiguously identify the contents of the file:

  • $SSTRING Format: substring1[_substring2[_substring3...]]
  • $SSTRING Rules:
    1. The complete absence of $SSTRING has a particular meaning: there have been no spatial operations done on the original model history-file contents, and all of the original history-file variables are contained within the post-processed data file
    2. $SSTRING may contain one or more "descriptors," each of which is denoted by substring${N}, ${N}=1,2,... in the format above. A descriptor identifies important aspects of the file contents, such as the names of fields that have been extracted from the original history files, or spatial operations that have been performed. Certain standard descriptor names have been established and are cataloged below
    3. If there are multiple descriptors in $SSTRING, each is separated from the other by the underscore character ("_")
    4. $SSTRING may contain field information. For netCDF files, use the short_name value(s), such as UU, VVEL, or UVEL_VVEL
    5. The absence of a field name in $SSTRING indicates that all fields from the original history file are included in the processed file

The format and rules for $TSTRING, which is intended to indicate the time or time periods of the original data files which were processed in order to create this file, are as follows:

  • $TSTRING Format: datestring1[-datestring2]
  • $TSTRING Rules:
    1. $TSTRING datestrings must follow the conventions for model output files, eg
      • yyyymmddhhZ   -- instantaneous - "Z" is mandatory to indicate GMT
      • yyyymmdd   -- daily average
      • yyyymm   -- monthly average
      • yyyy   -- annual average
    2. If present, the $TSTRING temporal operator is separated from datestring1 and datestring2 by the hyphen character ("-")
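
The $TSTRING format can be checked and classified with a short sketch such as the following. The function name parse_tstring is hypothetical, and the requirement that both datestrings of a range use the same form is an assumption rather than a stated rule.

```python
import re
from typing import Optional, Tuple

# Meaning of each datestring form, keyed by its length in characters.
DATESTRING_KINDS = {4: "annual average", 6: "monthly average",
                    8: "daily average", 11: "instantaneous"}

# yyyy, yyyymm, yyyymmdd, or yyyymmddhhZ
DATESTRING_RE = re.compile(r"\d{4}(?:\d{2}(?:\d{2}(?:\d{2}Z)?)?)?")


def parse_tstring(tstring: str) -> Tuple[str, str, Optional[str]]:
    """Return (kind, datestring1, datestring2); datestring2 may be None."""
    start, sep, stop = tstring.partition("-")
    dates = [start] + ([stop] if sep else [])
    if not all(DATESTRING_RE.fullmatch(d) for d in dates):
        raise ValueError(f"not a valid $TSTRING: {tstring!r}")
    # Assumption: both ends of a range are written in the same form.
    if sep and len(start) != len(stop):
        raise ValueError("datestring1 and datestring2 differ in form")
    return DATESTRING_KINDS[len(start)], start, (stop if sep else None)
```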

Examples of Post-processed Filenames

Combinations of the various string names are intended to provide sufficient flexibility to describe the file contents. The following examples are used to illustrate standard practices for post-processed data filenames and locations.
([...] = $DOUT_S_ROOT):

  • Directory names ($DIRNAME)
    [...]/b.e20.B1850.f09_g17.294/atm/proc/tseries/day_1
    [...]/b.e20.B1850.f09_g17.294/lnd/proc/tseries/month_1
    [...]/b.e20.B1850.f09_g17.294/ice/proc/climo/b.e20.B1850.f09_g17.294/b.e20.B1850.f09_g17.294.50-100
    [...]/b.e20.B1850.f09_g17.294/ocn/proc/climo.1.10
    [...]/b.e20.B1850.f09_g17.294/lnd/proc/diag

  • Single variable time series for 100 years - time series of 100 years of monthly average atmospheric UU velocities, for the period January 400 through December 499, inclusive:
    [...]/b.e20.B1850.f09_g17.294/atm/proc/tseries/month_1/b.e20.B1850.f09_g17.294.cam.h0.UU.040001-049912.nc

  • All fields in a 5 year average - 5-year time average of all atmospheric model h0 history fields, averaged over years 2 through 6, inclusive (note the absence of $SSTRING)
    [...]/b.e20.B1850.f09_g17.294/atm/proc/climo/b.e20.B1850.f09_g17.294/b.e20.B1850.f09_g17.294.2-6/b.e20.B1850.f09_g17.294.cam.h0.0002-0006._ANN_climo.nc

  • All fields in an annual average - annual average of all ocean-model monthly history fields, averaged over year 0010
    [...]/b.e20.B1850.f09_g17.294/ocn/proc/climo.1-10/b.e20.B1850.f09_g17.294.pop.h.0010.nc

  • 50 year average for a single variable - 50-year average of the pop temperature field (TEMP) and salinity (SALT) averaged over years 200 through 249, inclusive, by month (mavg) and total (tavg).
    [...]/b.e20.B1850.f09_g17.294/ocn/proc/climo.200.249/mavg.0200-0249.nc
    [...]/b.e20.B1850.f09_g17.294/ocn/proc/climo.200.249/tavg.0200-0249.nc