This section presents a general overview of how CCSM3 operates. Setting up a full production run is described in section 6. In what follows we assume that the user has already run create_newcase and configure (see section 4).
CCSM3 input data are provided as part of the release via several input data tar files. The tar files are typically broken down by components and/or resolutions. These files should be downloaded and untarred into a single input data root directory (see section 3.3). Each tar file will place files under a common directory named inputdata/. The inputdata/ directory contains numerous subdirectories, and CCSM3 assumes that the directory structure and filenames will be preserved.
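As a sketch of this step, the commands below unpack two mock release tar files into a single root, showing how the subtrees merge under one inputdata/ directory. The tar filenames and subdirectory names here are invented for illustration; they are not the actual release filenames.

```shell
# Sketch: unpacking release input data tar files into one root.
# All filenames and subdirectories below are mock stand-ins.
root=$(mktemp -d)                      # stand-in for the download area

# Fabricate two small "release" tar files just for this demonstration.
mkdir -p "$root/stage/inputdata/atm" "$root/stage2/inputdata/ocn"
echo a > "$root/stage/inputdata/atm/a.nc"
echo o > "$root/stage2/inputdata/ocn/o.nc"
tar -C "$root/stage"  -cf "$root/ccsm3_atm.tar" inputdata
tar -C "$root/stage2" -cf "$root/ccsm3_ocn.tar" inputdata

# Untar every release file in the same place: the subtrees merge under
# a single inputdata/ directory whose layout must be preserved.
for f in "$root"/ccsm3_*.tar; do tar -C "$root" -xf "$f"; done
ls "$root/inputdata"
```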
The officially released input data root directory name is set in the env_mach.$MACH file via the environment variable $DIN_LOC_ROOT. A default setting of $DIN_LOC_ROOT is provided for each machine in env_mach.$MACH.
The user should edit this value if it does not correspond to their inputdata/ root. Multiple users can share the same inputdata/ directory. The files existing in the various subdirectories of inputdata/ should not have Unix write permission on them.
An empty input data root directory tree is also provided as a future place holder for custom user-generated input datasets. This is set in the env_mach.$MACH file via the environment variable $DIN_LOC_ROOT_USER. If the user wishes to use any user-modified input datasets in place of the officially released version, these should be placed in the appropriate subdirectory of $DIN_LOC_ROOT_USER/.
The appropriate CCSM resolved component scripts (in $CASEROOT/Buildnml_Prestage/) must then also be modified to use the new filenames. Any datasets placed in $DIN_LOC_ROOT_USER/ should have unique names that do not correspond to any datasets in $DIN_LOC_ROOT/. The contents of $DIN_LOC_ROOT/ should not be modified. The user should be careful to preserve these changes, since invoking configure -cleanall will remove all user-made changes.
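For illustration, staging a user-modified dataset might look like the following sketch. The subdirectory and filename are invented, and a temporary directory stands in for the real $DIN_LOC_ROOT_USER.

```shell
# Sketch: staging a user-modified input dataset. The subdirectory "atm"
# and the filename are illustrative, not part of the release.
DIN_LOC_ROOT_USER=$(mktemp -d)/inputdata   # stand-in for the real user root

# Place the modified file in the appropriate subdirectory, under a name
# that does not collide with any official $DIN_LOC_ROOT dataset.
mkdir -p "$DIN_LOC_ROOT_USER/atm"
echo "modified dataset" > "$DIN_LOC_ROOT_USER/atm/mydata_custom.nc"
ls "$DIN_LOC_ROOT_USER/atm"
```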
CCSM3 can be built by either interactively running $CASE.$MACH.build or by batch submission of $CASE.$MACH.run (since $CASE.$MACH.build is executed automatically from $CASE.$MACH.run). We recommend that CCSM3 be built interactively. There are several reasons for this. First, building interactively allows the user to immediately detect build related problems without waiting in the batch queueing system. Second, the build process normally occurs on a much smaller set of processors than is used to run CCSM. Consequently, an interactive build saves computing cycles.
The $CASE.$MACH.build script does the following:
Input data prestaging is carried out as part of the build procedure via calls to ccsm_getinput, ccsm_getfile and ccsm_getrestart. These scripts reside in $CCSMROOT/ccsm3/scripts/ccsm_utils/Tools/. The script, $CASE.$MACH.build, always calls ccsm_getrestart, which attempts to copy each component's restart files and associated restart pointer file from the directory, $DOUT_S_ROOT/restart/, to the component's executable directory. If the copy is not successful, a warning message will be printed and the $CASE.$MACH.build script will continue. We note that successfully obtaining restart files using ccsm_getrestart depends on the activation of short-term archiving (see section 5.5) in order to populate the short-term archive restart directory. We also note that a CCSM3 restart run is produced by setting the environment variable $CONTINUE_RUN to TRUE in env_run. If $CONTINUE_RUN is set to TRUE, each component's restart files and associated restart pointer file must either be in the directory $DOUT_S_ROOT/restart/, be in that component's executable directory, or be available from the long-term archiving area.
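The prestaging logic described above can be sketched as follows. The get_restart function below is a simplified stand-in for ccsm_getrestart, not the released script, and all paths are mock temporary directories.

```shell
# Simplified stand-in for ccsm_getrestart: copy restart files from the
# short-term archive to a component's executable directory, warning
# (rather than failing) when nothing is found.
DOUT_S_ROOT=$(mktemp -d)     # mock short-term archive root
EXEROOT=$(mktemp -d)         # mock executable root

mkdir -p "$DOUT_S_ROOT/restart" "$EXEROOT/ocn"
echo "restart data" > "$DOUT_S_ROOT/restart/ocn.r.0001-01-01"

get_restart () {             # $1 = component name
    if cp "$DOUT_S_ROOT"/restart/$1.* "$EXEROOT/$1/" 2>/dev/null; then
        echo "prestaged restart files for $1"
    else
        echo "WARNING: no restart files found for $1 -- continuing"
    fi
}

get_restart ocn              # succeeds: a restart file was archived above
get_restart ice              # warns: nothing was archived for ice
```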
If the build has occurred successfully, the user will see the following message:
-------------------------------------------------------------
- CCSM BUILD HAS FINISHED SUCCESSFULLY
-------------------------------------------------------------
If this message is not seen, a compiler problem has probably occurred. The user should carefully review the build output to determine the source of the problem.
Each CCSM component generates its executable by invoking gmake. The makefile and corresponding machine-specific makefile macros are found in the directory $CCSMROOT/ccsm3/models/bld/:
            $CCSMROOT/ccsm3/models/bld/
                        |
            +-----------+--------------+
            |           |              |
        makdep.c     Makefile       Macros.*
The Macros.* files contain machine-specific makefile directives. In the current release, the Macros have been divided into different platform-dependent files each containing site/machine specific options. The site and operating system characteristics are set in the machine-specific file, env_mach.$MACH via the environment variables $OS and $SITE. In addition, machine dependent options are also included in the Macros files for specific platforms that have been tested. The machine-specific options are set in the Macros files by use of the environment variable $MACH.
If a user needs to modify compiler options for a specific machine, only the machine-specific Macros files need to be edited. Similarly, if a user wants to add a new machine to the CCSM3 scripts, they will need to add or modify the Macros files appropriately to support that new machine. More information about porting CCSM3 to a new machine is available in section 6.10.
For most CCSM components, the specific files used to build each component are defined in a file called ``Filepath''. This file is generated by the scripts, $CASEROOT/Buildexe/*.buildexe.csh, and contains a list of directories specifying the search path for component source code. The directories listed in Filepath appear in order of importance, from most important to least important. If a piece of code appears in two of the listed directories, the version in the directory appearing first will be used and all other versions will be ignored. No error is generated if a directory listed in Filepath does not exist or does not contain any code.
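The first-match rule can be illustrated with a small sketch. The directory names, the file physics.F90, and the resolve helper below are all invented for the demonstration; the build itself uses gmake rather than a helper like this.

```shell
# Sketch of Filepath resolution: for each source file, the first listed
# directory containing it wins. Directories and filenames are mock.
work=$(mktemp -d)
mkdir -p "$work/SourceMods" "$work/src.orig"
echo "modified"  > "$work/SourceMods/physics.F90"
echo "original"  > "$work/src.orig/physics.F90"

# A Filepath listing, most important directory first.
printf '%s\n' "$work/SourceMods" "$work/src.orig" > "$work/Filepath"

# Resolve a filename against the search path; return the first hit.
resolve () {                 # $1 = filename
    while read -r dir; do
        [ -f "$dir/$1" ] && { echo "$dir/$1"; return; }
    done < "$work/Filepath"
}

cat "$(resolve physics.F90)"   # prints "modified": SourceMods shadows src.orig
```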
The CCSM3 make system generates file dependencies automatically. Users do not need to maintain these dependencies manually. The makdep.c code is compiled and run by most components prior to building.
The CCSM3 model is built in the directory specified by $EXEROOT (in env_mach.$MACH). $EXEROOT/ contains a set of subdirectories where each component executable will be built.
                               $EXEROOT
                                   |
   +------+------+------+------+---+--+------+------+------+------+
   |      |      |      |      |      |      |      |      |      |
  cpl/   atm/   ocn/   ice/   lnd/  esmf/   mct/   mph/   lib/   all/
   |      |      |      |      |
  obj/   obj/   obj/   obj/   obj/
Each subdirectory in $EXEROOT/ contains the component executable, input datasets, and namelist needed to run that specific CCSM component. For each component, the $obj/ directory contains all files created during the compilation of that component. This includes the dependency files, cpp products and object files. Component output data, such as standard out logs, history and restart datasets will also be written into that component's $EXEROOT/ subdirectory. Some of the components, such as POP and CSIM, have separate subdirectories for input, restart and history data, while CAM and CLM output all of these into one directory.
Each component *.buildexe.csh script has a directory, $CASEROOT/SourceMods/src.xxx/ (where xxx is the component name, e.g. cam) as the first Filepath directory. This allows user modified code to be easily introduced into the model by placing the modified code into the appropriate $CASEROOT/SourceMods/src.xxx/ directory.
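For example, introducing a locally modified CAM file might look like the following sketch; the filename radiation.F90 is invented, and a temporary directory stands in for the real $CASEROOT.

```shell
# Sketch: introducing locally modified CAM source code (names illustrative).
CASEROOT=$(mktemp -d)                  # stand-in for the real $CASEROOT
mkdir -p "$CASEROOT/SourceMods/src.cam"

# A locally edited copy of a model source file; because SourceMods is
# the first Filepath directory, it shadows the released version at
# build time.
echo "! locally modified" > "$CASEROOT/SourceMods/src.cam/radiation.F90"
ls "$CASEROOT/SourceMods/src.cam"
```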
The CCSM3 run script, $CASE.$MACH.run, is generated as a result of invoking configure and is normally submitted in batch mode after the model executables have been built interactively. The specific command required to submit this script is machine dependent. Common batch submission commands are ``llsubmit $CASE.$MACH.run'' and ``qsub $CASE.$MACH.run''. It is worthwhile to note that CCSM can be run interactively if appropriate resources are available.
Upon batch submission, the script, $CASE.$MACH.run, does the following as part of a CCSM3 run:
The environment variable, $RUN_TYPE in env_conf determines the way in which a new CCSM run will be initialized. $RUN_TYPE can have values of 'startup', 'hybrid' or 'branch'.
In a startup run, each component's initialization occurs from some arbitrary baseline state. In a branch run, each component is initialized from restart files. In a hybrid run initialization occurs via a combination of existing CCSM restart files for some components (e.g. POP and CSIM) and initial files for other components (e.g. for CAM and CLM).
The value of $START_DATE in env_conf is ignored for a branch run, since each model component will obtain the $START_DATE from its own restart dataset. The coupler will then validate at run time that all the models are coordinated in time and therefore have the same $START_DATE. This is the same mechanism that is used for performing a restart run (where $CONTINUE_RUN is set to TRUE). In a hybrid or startup run, $START_DATE is obtained from env_conf and not from component restart or initial files. Therefore, inconsistent restart and/or initial files may be used for hybrid runs, whereas they may not be used for branch runs.
All CCSM components produce "restart" files containing data necessary to describe the exact state of the CCSM run when it was halted. Restart files allow the CCSM to be continued or branched to produce exactly the same answer (bit-for-bit) as if it had never stopped. A restart run is not associated with a new $RUN_TYPE setting (as was the case in CCSM2), but rather is determined by the setting of the environment variable $CONTINUE_RUN in env_run.
In addition to the periodic generation of restart files, some CCSM components (e.g. CAM and CLM) also periodically produce netCDF initial files. These files are smaller and more flexible than the component's binary restart files and are used in cases where it is not crucial for the new run to be bit-for-bit the same as the run which produced the initial files.
The following provides a summary of the different initialization options for running CCSM.
Types of Files Used Under Various Runtype parameters:
              atm       lnd          ocn           ice        cpl
            -------  ---------  -------------   -------  --------------
  startup :   nc     internal   internal+file   binary   internal/delay
  hybrid  :   nc        nc         binary       binary   internal/delay
  branch  :  binary   binary       binary       binary   binary
In delay mode, the ocean model starts running on the second day of the run rather than the first. In this mode, the coupler also starts without a restart file and uses whatever fields the other components give it at startup. The result is generally climate-continuous, but the initial changes are much larger than roundoff.
A detailed summary of each $RUN_TYPE setting is provided in the following sections.
When the environment variable $RUN_TYPE is set to 'startup', a new CCSM run will be initialized using arbitrary baseline states for each component. These baseline states are set independently by each component and will include the use of restart files, initial files, external observed data files or internal initialization (i.e. a ``cold start''). By default, the CCSM3.0 scripts will produce a startup run.
Under a startup run, the coupler will start-up using "delay" capabilities in which the ocean model starts running on the second day of the run, not the first. In this mode, the coupler also starts without a restart file and uses whatever fields the other components give it for startup.
The following environment variables in env_conf define a startup run:
The following holds for a startup run:
A hybrid run indicates that the CCSM is to be initialized using datasets from a previous CCSM run. A hybrid run allows the user to bring together combinations of initial/restart files from a previous CCSM run (specified $RUN_REFCASE) at a given model output date (specified by $RUN_REFDATE) and change the start date ($RUN_STARTDATE) of the hybrid run relative to that used for the reference run. In a branch run the start date for the run cannot be changed relative to that used for the reference case since the start date is obtained from each component's restart file. Therefore, inconsistent restart and/or initial files may be used for hybrid runs, whereas they may not be used for branch runs. For a hybrid run using the fully active component set (B) (see section 1.3.1), CAM and CLM will start from the netCDF initial files of a previous CCSM run, whereas POP and CSIM will start from binary restart files of that same CCSM run.
The model will not continue in a bit-for-bit fashion with respect to the reference case under this scenario. The resulting climate, however, should be continuous as long as no namelists or model source code are changed in the hybrid run. The variables $RUN_REFCASE and $RUN_REFDATE in env_conf are used to specify the previous (reference) case and starting date of the initial/restart files to be used. In a hybrid run, the coupler will start-up using the "delay" capabilities.
The following environment variables in env_conf define a hybrid run:
Note that the combination of $RUN_REFCASE and $RUN_REFDATE specify the initial/restart reference case data needed to initialize the hybrid run. The following holds for a hybrid run:
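For illustration only, a hybrid run's env_conf settings might look like the following sketch. The variable names are those discussed above; the values are invented examples.

```csh
# Illustrative env_conf settings for a hybrid run (values are examples).
setenv RUN_TYPE      hybrid
setenv RUN_REFCASE   Test1          # reference case supplying the data
setenv RUN_REFDATE   0001-01-01     # model date of the reference files
setenv RUN_STARTDATE 0005-01-01     # start date of the new hybrid run
```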
A branch run is initialized using binary restart files from a previous run for each model component. The case name is generally changed for a branch run, although it does not have to be.
In the case of a branch run, the setting of $RUN_STARTDATE in env_conf is ignored since each model component will obtain the start date from its own restart dataset. At run time, the coupler validates that all the models are coordinated in time and therefore have the same start date. This is the same mechanism that is used for performing a restart run (where $CONTINUE_RUN is set to TRUE).
Branch runs are typically used when sensitivity or parameter studies are required or when settings for history file output streams need to be modified. Under this scenario, the new case must be able to produce bit-for-bit exact restart in the same manner as a continuation run if no source code or namelist inputs are modified. All models must use full bit-for-bit restart files to carry out this type of run. Only the case name changes.
The following environment variables in env_conf define a branch run:
The following holds for a branch run:
To start up a branch or hybrid run, restart and/or initial data from a previous run must be made available to each model component. As is discussed below, restart tar files of the form
where id corresponds to a unique creation time stamp, are periodically generated. The restart tar files contain data that is required to start up either a hybrid or branch run.
The simplest way to make this data available to the hybrid or branch run at initialization is to untar the appropriate reference-case restart tar file in the $DOUT_S_ROOT/restart/ short-term archiving directory of the branch or hybrid run case. For example, assume that a new hybrid case, Test2, is to be run on machine blackforest, using restart and initial data from case Test1, at date yyyy-mm-dd-sssss. Also assume that the short-term archiving directory $DOUT_S_ROOT (in env_mach.blackforest) is set to /ptmp/$LOGNAME/archive/Test2. Then the restart tar file
should be untarred in
The script, $CCSMROOT/scripts/ccsm_utils/Tools/ccsm_getrestart, will then prestage this data to the Test2 component executable directories at run time.
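The staging step can be sketched as follows. The tar filename, case names, and all paths below are mocks invented for the demonstration; they do not reflect the actual restart tar naming convention of the release.

```shell
# Sketch of staging reference-case restart data for a hybrid/branch run.
# The tar filename, case names, and paths are illustrative mocks.
work=$(mktemp -d)
DOUT_S_ROOT=$work/archive/Test2              # new case's short-term archive

# Fabricate a mock reference-case restart tar (normally produced by the
# reference run's archiving).
mkdir -p "$work/stage"
echo "restart data" > "$work/stage/ocn.r.0001-01-01"
tar -C "$work/stage" -cf "$work/Test1.restart.tar" .

# Untar it into the new case's short-term archive restart directory;
# ccsm_getrestart will prestage from here at run time.
mkdir -p "$DOUT_S_ROOT/restart"
tar -C "$DOUT_S_ROOT/restart" -xf "$work/Test1.restart.tar"
ls "$DOUT_S_ROOT/restart"
```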
CCSM3 is comprised of a collection of distinct models optimized for a very high-speed, parallel multi-processor computing environment. Each component produces its own output stream consisting of history, restart and output log files. Component history files are in netCDF format whereas component restart files are in binary format and are used to either exactly restart the model or to serve as initial conditions for other model cases.
Standard output generated from each CCSM component is saved in a "log file" located in each component's subdirectory under $EXEROOT/. Each time the CCSM is run, a single coordinated timestamp is incorporated in the filenames of all output log files associated with that run. This common timestamp is generated by the run script and is of the form YYMMDD-hhmmss, where YYMMDD are the year, month, and day and hhmmss are the hour, minute, and second that the run began (e.g. ocn.log.040526-082714). Log files can also be copied to a user-specified directory using the variable $LOGDIR in env_run. The default is an empty string (``''), in which case no extra copy of the log files occurs.
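The timestamp construction can be sketched as follows; the component name ocn is just an example, and this is a stand-in for what the run script does, not the script itself.

```shell
# Build a coordinated YYMMDD-hhmmss timestamp and a log filename from it.
tstamp=$(date +%y%m%d-%H%M%S)
logfile="ocn.log.$tstamp"     # e.g. ocn.log.040526-082714
echo "$logfile"
```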
By default, each component writes monthly averaged history files in netCDF format and also writes binary restart files. The history and log files are controlled independently by each component. Restart files, on the other hand, are written by each component at regular intervals dictated by the flux coupler via the setting of $REST_OPTION and $REST_N in env_run. Restart files are also known as "checkpoint" files. They allow the model to stop and then start again with bit-for-bit exact capability (i.e. the model output is exactly the same as if it had never been stopped). The coupler coordinates the writing of restart files as well as the model execution time. All components receive information from the coupler and write restarts or stop as specified by the coupler. Coupler namelist input in env_run sets the run length and restart frequency via the settings of the environment variables $STOP_OPTION, $STOP_N, $REST_OPTION and $REST_N. Each component's log, diagnostic, history, and restart files can be saved to the local mass store system using the CCSM3 long-term archiver.
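As an illustrative sketch (the variable names are those named above; the values are invented examples), the relevant env_run settings might look like:

```csh
# Illustrative env_run settings: run for 30 days and write restart
# files every 30 days (values are examples only).
setenv STOP_OPTION  ndays
setenv STOP_N       30
setenv REST_OPTION  ndays
setenv REST_N       30
```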
The raw history data does not lend itself well to easy time-series analysis. For example, CAM writes one large netCDF history file (containing all the requested variables) at each requested output period. While this allows for very fast model execution, it makes it difficult to analyze time series of individual variables without accessing the entire data volume. Thus, the raw data from major CCSM integrations is usually postprocessed into more user-friendly configurations, such as single files containing long time series of each output field, and made available to the community (see section 10).
Archiving is a phase of the CCSM production process where model output is moved from each component's executable directory to a local disk area (short-term archiving) and subsequently to a long-term storage system (long-term archiving). It has no impact on the production run except to clean up disk space and help manage user quotas.
Short-term and long-term archiving environment variables are set in the env_mach.$MACH file. Although short-term and long-term archiving are implemented independently in the scripts, there is a dependence between the two, since the short-term archiver must be turned on in order for the long-term archiver to be activated.
By default, short-term archiving is enabled and long-term archiving is disabled. Several important points need to be made about archiving:
Short-term archiving is executed as part of running the $CASE.$MACH.run script. The short-term archiving script, ccsm_s_archive, resides in the ccsm_utils/Tools ($UTILROOT/Tools) directory. Short-term archiving is executed after the CCSM run is completed if $DOUT_S is set to TRUE in env_mach.$MACH. The short-term archiving area is determined by the setting of $DOUT_S_ROOT in env_mach.$MACH.
The short-term archiver does the following:
The ccsm_s_archive script is written quite generally. However, there may be certain user cases where it needs to be modified for a production run because different sets of files need to be stored. If this is the case, ccsm_s_archive should be copied to the user's $CASEROOT/ directory and modified there since in general this file is shared among different production runs. In addition, the path to ccsm_s_archive in the $CASE.$MACH.run file also must be modified.
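That customization workflow can be sketched as follows, with mock temporary directories standing in for $UTILROOT and $CASEROOT.

```shell
# Sketch of customizing the short-term archiver; all paths are mock
# temporary directories, and the script content is a placeholder.
UTILROOT=$(mktemp -d)
CASEROOT=$(mktemp -d)
mkdir -p "$UTILROOT/Tools"
echo "# released ccsm_s_archive" > "$UTILROOT/Tools/ccsm_s_archive"

# Copy the shared script into this case's root and modify the copy,
# leaving the released version intact for other production runs.
cp "$UTILROOT/Tools/ccsm_s_archive" "$CASEROOT/"

# $CASE.$MACH.run must then be edited to call $CASEROOT/ccsm_s_archive
# instead of $UTILROOT/Tools/ccsm_s_archive.
ls "$CASEROOT"
```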
Long-term archiving is done via a separate CCSM script that can be run interactively or submitted in batch mode. Long-term archiving saves files onto the local mass store system. It can also copy data files to another machine via scp. Normally, the long-term archiver is submitted via batch automatically at the end of every CCSM production run. The long-term archive script is generated by configure and is a machine-dependent batch script called $CASE.$MACH.l_archive.
The environment variables which control the behavior of long-term archiving are set in the file, env_mach.$MACH (see section 4.8.5) and correspond to:
Not all of these parameters are used for all mass store systems. The long-term archiver calls ccsm_l_archive, which in turn calls ccsm_mswrite to actually execute the mass store writes. The script ccsm_mswrite is configured to test the local mass store and execute the appropriate command to move data onto it. Both the ccsm_l_archive and ccsm_mswrite scripts reside in the ccsm_utils/Tools ($UTILROOT/Tools/) directory.
The long-term archiver is also capable of copying files to another machine or site via scp. This requires that scp passwords be set up transparently between the two machines and will also likely require modification to the ccsm_l_archive script to specify which files should be moved. The parameters in env_mach.$MACH that turn this on are:
The above feature is not currently supported.
Although the ccsm_l_archive script is written quite generally, there may be cases where it needs to be modified for a given production run because different sets of files need to be stored. If this is the case, ccsm_l_archive should be copied to the user's $CASEROOT/ directory and modified there, and the path to ccsm_l_archive in $CASE.$MACH.run must also be changed accordingly.