Community Climate System Model
Software Engineering Plan 2000-2005

November 2000

Component Model Plans:  updated June 2001

Prepared by:

Cecelia DeLuca (NCAR)
J. Walter Larson (Argonne National Laboratory)
Lawrence Buja (NCAR)
Anthony Craig (NCAR)
John Drake (Oak Ridge National Laboratory)


 

Table of Contents

Executive Summary
1 Introduction
2 Software Description
3 Software Goals and Requirements
4 Software Strategy
    4.1 Software Management
    4.2 Software Restructuring
    4.3 Software Practices
    4.4 User Support
    4.5 Timeline
5 CCSM-Related Software Projects and Initiatives
    5.1 DOE ACPI "Avant Garde" Project
    5.2 NASA HPCC Earth System Modeling Framework
    5.3 I/O Library
    5.4 Parallel netCDF
6 Summary

Appendices
Appendix A:  Summary of CCSM-Related Projects and Collaborations
Appendix B:  Summary of Proposed CCSM Software Engineering Documents
Appendix C:  The Software Best Practices Initiative
Appendix D:  The Capability Maturity Model for Software
 

Executive Summary

NCAR's Community Climate System Model (CCSM) is critical to the nation, both as a climate change assessment tool and as a research tool for studying wide-ranging aspects of the Earth system.  The CCSM software must manage the complexity of the intertwined physical and dynamical processes that it simulates, and it must extract efficient performance from today's complicated and rapidly changing supercomputer architectures.  The software engineering challenges will only increase as scientists press for greater resolution, longer runs, and ensemble runs to achieve the goals outlined in the Community Climate System Model Plan 2000-2005.

In this document we present a plan to make the CCSM more efficient, extensible, robust, and easy to use over the next five years.  To accomplish these objectives, the areas of software management, code structure, software engineering practices, and user support will require coordinated changes.  Some of these changes will occur through CCSM involvement in two large initiatives.  The first of these, part of the Accelerated Climate Prediction Initiative (ACPI), is an ongoing 18-month NCAR/DOE collaboration that focuses on reworking the CCSM atmospheric component and coupling component.  The second initiative has been motivated by a still-to-be-released NASA High Performance Computing and Communications Cooperative Agreement Notice (NASA HPCC CAN) that calls for the creation of an Earth System Modeling Framework (ESMF).  The ESMF is intended to increase the reuse and interoperability of climate and weather codes by identifying a common non-scientific software infrastructure and developing it jointly through a multi-institutional collaboration.  The CCSM is a proposed testbed for the ESMF project.  One of the current challenges of the CCSM project is the coordination of the ACPI and ESMF initiatives.

Some shifts in perspective will be needed to achieve the CCSM's software goals.  To date, CCSM software development has been largely an ad hoc, loosely organized, and scientist-managed process.  It is essential to the future success of the CCSM project that software development mature into a more professional activity.  This will require dedicated software engineering management and resource allocations for functions such as code maintenance, software management tools, and technical support.  Like other professionally maintained software projects, CCSM software engineering must keep pace with technological change through continued staff development and associated research efforts to evaluate and adapt new technologies.  CCSM-related pilot projects that engage computer scientists, specialists in numerical methods, and software engineering experts must be viewed as a critical aspect of ongoing code development.

A sensible approach to software management and the introduction of systematic practices should not create software development roadblocks or discourage scientific experimentation with the code.  However, the long-term success of the CCSM project will require an initial investment in infrastructure development.  The effort required to create this infrastructure is likely to pay off in coordinated, rapidly developed, and robust code.  It will also provide a basis for the coordinated multi-institutional development that is needed for the ACPI and ESMF initiatives.

Another shift in perspective that will reinvigorate CCSM software development is an alignment with commercial software technologies.  The technological boom of the 1990s drew many first-rate software engineers and computer scientists into commercial software development.  The CCSM project must exploit the richness of talent and development resources driving the world-dominant U.S. computer and software industries, not merely to match foreign climate modeling efforts but to surpass them.  Part of this alignment process will involve increasing the use of mainstream C-based languages in the CCSM.  By doing this it will be possible to use compilers strongly backed by industry, utilize a broader range of software support tools, and hire staff from a larger applicant pool.  This does not necessarily mean that physics parameterizations will need to be recoded; it should be possible to introduce C-based code somewhat unobtrusively through the development and use of non-scientific C-based utilities (netCDF is a good example).

The final shift in perspective is a focus on a modular, layered software design that allows code to run reasonably efficiently on a range of platforms, rather than on data structures that are optimized for a single architecture type.  This flexible approach is necessitated by significant differences in optimization strategies on various platforms, the complexity of current platforms, the need for scalability on distributed memory architectures, and the volatility of the current high performance computing environment.  It does not preclude the development or use within the CCSM of high-performance codes optimized for specific architectures.

The key software-related issues facing the CCSM project may be summarized as follows:

The strategies proposed in this document are as follows:

We suggest that updated versions of this plan be released on a yearly basis.

We strongly encourage a review of the CCSM software engineering status and this document by independent outside consultants such as those available through the Software Engineering Institute (SEI).

1 Introduction

This document is a software engineering plan for the Community Climate System Model (CCSM).  The primary motivation for this plan is to identify a strategy for the CCSM to meet the wide variety of goals outlined in the Community Climate System Model Plan 2000-2005.

The CCSM Plan 2000-2005 (hereafter referred to simply as the CCSM Plan) was released to the climate modeling community at the Fifth Annual CCSM Workshop in June 2000.  The plan discusses the history of the Community Climate System Model, the current status of the modeling system (including perceived shortcomings of the component models), and research and development goals for the CCSM during the period 2000-2005.

According to the CCSM Plan, the broad, long-term goals for the development of the CCSM are:

  1. to develop and to work continually to improve a comprehensive CCSM that is at the forefront of international efforts in modeling the climate system, including the best possible component models coupled together in a balanced, harmonious modeling framework;
  2. to make the CCSM readily available to, and usable by, the climate research community, and to engage that community in the ongoing process of model development;
  3. to use the CCSM to address important scientific questions about the climate system, including global change and interdecadal and interannual variability; and
  4. to use appropriate versions of the CCSM for calculations in support of national and international policy decisions.
All of the goals listed above either explicitly or implicitly state the need for the CCSM to run very efficiently on computers currently available to the U.S. climate community.  The relevant measure of efficiency is throughput, that is, the number of model years of integration that can be performed per calendar day.  The desire for active participation by the climate research community in model development implies the need for carefully documented, reliable, and lucid source code that can be easily configured, compiled, run, modified, and extended by scientists.  Attaining a high-throughput, highly extensible model will require significant code restructuring.
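Because throughput is the efficiency measure used throughout this plan, a minimal C sketch of the calculation may help.  The function name is hypothetical, and the sketch assumes a 365-day model year and a measured elapsed wall-clock time for the run.

```c
/* Hypothetical helper: throughput in simulated model years per
 * wall-clock (calendar) day, given the number of model days integrated
 * and the elapsed wall-clock seconds for the run.
 * Assumes a 365-day model year (no leap years). */
double throughput_years_per_day(double model_days, double wall_seconds)
{
    const double seconds_per_day = 86400.0;
    const double days_per_model_year = 365.0;

    if (wall_seconds <= 0.0) {
        return 0.0; /* guard against a missing or invalid timer value */
    }
    double model_years = model_days / days_per_model_year;
    double wall_days = wall_seconds / seconds_per_day;
    return model_years / wall_days;
}
```

For example, integrating 365 model days in 17,280 wall-clock seconds (4.8 hours) corresponds to 5 model years per calendar day.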

Much of the code rework described in this document will be undertaken via two initiatives.  The first is the Accelerated Climate Prediction Initiative (ACPI) Avant Garde project, an ongoing 18-month NCAR/DOE collaboration that focuses on redesign of the CCSM atmospheric component and flux coupler.  The second initiative is a response to a still-to-be-released NASA High Performance Computing and Communications Cooperative Agreement Notice (NASA HPCC CAN) that calls for the creation of an Earth System Modeling Framework (ESMF).  The CCSM is a proposed testbed for the ESMF project.  These initiatives are described in more detail in Section 5.

The CCSM Software Engineering Plan we have developed to implement the CCSM Plan is consistent with many of the recommendations in the Report of the NCAR Code Assessment Panel and the subsequent NCAR Strategic Plan for Scientific Simulation.

2 Software Description

The CCSM is a continually evolving comprehensive model of the climate system, including both physical and biogeochemical aspects.  It is composed of four independent software components that model geophysical systems: atmosphere, ocean, land surface, and sea ice.  Each pair of interacting components communicates through the flux coupler, using message passing.  The flux coupler must interpolate and average between the different grids of the component models while conserving local and integral properties.
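The conservation requirement can be illustrated in one dimension.  The following is a minimal sketch of first-order conservative remapping, in which each destination cell receives the overlap-weighted average of the source cells it intersects.  The function name and the 1D setting are assumptions for illustration only; the actual coupler works with two-dimensional grids on the sphere.

```c
#include <stddef.h>

/* First-order conservative remapping in one dimension (illustrative).
 * src_edges: nsrc+1 increasing cell edges; src: nsrc cell means.
 * dst_edges: ndst+1 increasing edges spanning the same interval.
 * Each destination mean is the overlap-length-weighted average of the
 * source means it intersects, so sum(mean * width) is preserved. */
void remap_conservative_1d(const double *src_edges, const double *src, size_t nsrc,
                           const double *dst_edges, double *dst, size_t ndst)
{
    size_t i = 0; /* first source cell that may overlap the current target */
    for (size_t j = 0; j < ndst; ++j) {
        double lo = dst_edges[j], hi = dst_edges[j + 1];
        double acc = 0.0;
        while (i < nsrc && src_edges[i + 1] <= lo)
            ++i; /* skip source cells entirely left of this target cell */
        for (size_t k = i; k < nsrc && src_edges[k] < hi; ++k) {
            double a = src_edges[k] > lo ? src_edges[k] : lo;
            double b = src_edges[k + 1] < hi ? src_edges[k + 1] : hi;
            if (b > a)
                acc += src[k] * (b - a); /* value times overlap length */
        }
        dst[j] = acc / (hi - lo);
    }
}
```

Because each source contribution is weighted by its overlap length, the sum of value times cell width is the same on both grids, which is the discrete analogue of conserving an integral property such as a total flux.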

The current version of the CCSM was originally called the Climate System Model (CSM-1).  The atmosphere component is NCAR's Community Climate Model version 3 (CCM3).  The current production ocean model is the NCAR CSM Ocean Model (NCOM).  Within the next six months, the Los Alamos National Laboratory (LANL) Parallel Ocean Program (POP) model will be moved out of development and into production. The sea ice component, the CSM Sea Ice Model (CSIM1), will be replaced by an extension of the Los Alamos CICE model. The land component is currently the Land Surface Model (LSM), developed at NCAR.

The CSM-1 code consists of 180,000 lines of Fortran code, divided as shown in Table 1.

Table 1.  Composition of the CCSM

Relative Size   Model Component   Version
49%             Atmosphere        CCM 3.6.6
27%             Ocean             NCOM 1.5.0
11%             Land Surface      LSM 3.6.6
7%              Sea Ice           CSIM 2.2.9
6%              Coupler           CPL 4.0.5

A new version of the model, called the CCSM-2, is being assembled during the fall of 2000, with a target algorithm freeze of December 2000. Fully coupled tuning runs will begin in January 2001, and the CCSM-2 will be released to the scientific community at the CCSM workshop in June 2001. The CCSM-2 will run on distributed-memory machines such as the IBM SP and Compaq ES clusters, as well as on shared-memory SGI Origin platforms.

CCSM code development at NCAR is supported by seven software engineers and associate scientists. The current CSM-1 version is targeted at Cray and SGI Origin 2000 platforms.

3 Software Goals

The CCSM Plan outlines a number of scientific objectives for the CCSM.  These scientific objectives imply the following software goals: performance portability, reusability, extensibility, interoperability with component models other than those in the standard distribution, robustness, and ease of use.  Below we explain each of these desirable properties further.

Performance Portability
Past CCSM software development has targeted vector platforms and shared memory platforms with relatively modest attention devoted to parallel performance issues.  As the availability of vector platforms to U.S. researchers has waned, a different approach is in order.  During the period covered by this plan, the main hardware options available to NCAR and the U.S. climate community are likely to be microprocessor-based, distributed-memory computers and hybrid systems incorporating message passing between shared-memory multiprocessor nodes.  Such platforms have complex memory hierarchies, and memory bandwidth and latency issues dominate performance.  For the CCSM to meet the goals of the 2000-2005 CCSM Plan, first priority must be placed on making the CCSM run efficiently on these platforms.

Through overseas collaborations, CCSM code is in fact likely to be run on both vector and RISC-based machines.  Ideally, the code would have the flexibility to run efficiently on both.  In practice, this can be difficult to achieve, since the choice of optimal data structures, loop ordering, and other significant design decisions may differ depending on whether code is intended for vector or RISC systems.

Since CCSM software engineers do not have access to vector machines as development platforms, it is problematic to treat efficient performance on these platforms as a requirement in the near future.  However, it is a priority to write flexible code and, whenever possible, to use data structures that are likely to achieve acceptable performance on either architecture.  Achieving excellent performance on both architectures from a single code base can add considerable complexity, and for some portions of the model, developing two code versions, one scalar and one vector, may be the most practical solution.  The availability of development resources may in part dictate the extent to which efficient performance on both types of architectures can be realized.
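One common technique for writing a single source that can lean toward either architecture is to process the horizontal dimension in blocks whose length is a tunable compile-time parameter: long blocks give long vectorizable inner loops, while short blocks keep each block's partial results resident in cache.  The sketch below assumes a level-major array layout and a hypothetical vertical-sum kernel; `BLOCK_LEN` is an illustrative name, not an existing CCSM parameter.

```c
#include <stddef.h>

/* Tunable inner-loop length (hypothetical name). Large values give long,
 * vectorizable inner loops; small values keep a block's partial sums in
 * cache or registers on microprocessor-based systems. */
#ifndef BLOCK_LEN
#define BLOCK_LEN 16
#endif

/* Layer-weighted vertical sum out[c] = sum_k q[k][c] * dp[k][c], with the
 * arrays stored level-major (q[k * ncol + c]) so that columns are
 * contiguous within each level. */
void column_integral(const double *q, const double *dp, double *out,
                     size_t ncol, size_t nlev)
{
    for (size_t c0 = 0; c0 < ncol; c0 += BLOCK_LEN) {
        size_t c1 = (c0 + BLOCK_LEN < ncol) ? c0 + BLOCK_LEN : ncol;
        for (size_t c = c0; c < c1; ++c)
            out[c] = 0.0;
        for (size_t k = 0; k < nlev; ++k) {
            const double *qk = q + k * ncol;
            const double *dpk = dp + k * ncol;
            for (size_t c = c0; c < c1; ++c) /* unit stride, vectorizable */
                out[c] += qk[c] * dpk[c];
        }
    }
}
```

The summation order for each column (over levels, top to bottom) does not depend on `BLOCK_LEN`, so the parameter can be tuned per platform, for example by compiling with `-DBLOCK_LEN=256` on a vector machine, without changing the arithmetic performed on any column.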

Reusability
The CCSM component models and coupler currently share little code for utilities, such as error handling, performance timing, and input/output (I/O).  Reusing code is a goal because it saves development and maintenance time, and it can improve code quality through collective testing and optimization.
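As an illustration of the kind of utility the components could share rather than each maintaining its own, here is a minimal named-timer sketch.  It is hypothetical (the names are ours), uses the standard C `clock()` function and therefore measures processor time rather than wall-clock time, and is not thread-safe; a shared CCSM version would likely need wall-clock timing and support for parallel execution.

```c
#include <string.h>
#include <time.h>

#define MAX_TIMERS 32
#define NAME_LEN 32

/* One accumulating timer: processor time used and number of calls. */
typedef struct {
    char name[NAME_LEN];
    clock_t started;
    double total_seconds;
    int calls;
    int running;
} util_timer;

static util_timer timers[MAX_TIMERS]; /* zero-initialized */
static int ntimers = 0;

/* Return the timer with this name, or NULL if it does not exist. */
static util_timer *timer_find(const char *name)
{
    for (int i = 0; i < ntimers; ++i)
        if (strncmp(timers[i].name, name, NAME_LEN - 1) == 0)
            return &timers[i];
    return NULL;
}

/* Start (creating if necessary) a named timer. Returns 0 on success,
 * -1 if the timer table is full or the timer is already running. */
int timer_start(const char *name)
{
    util_timer *t = timer_find(name);
    if (t == NULL) {
        if (ntimers == MAX_TIMERS)
            return -1;
        t = &timers[ntimers++];
        strncpy(t->name, name, NAME_LEN - 1);
        t->name[NAME_LEN - 1] = '\0';
    }
    if (t->running)
        return -1;
    t->running = 1;
    t->started = clock();
    return 0;
}

/* Stop a running timer and accumulate the elapsed processor time. */
int timer_stop(const char *name)
{
    util_timer *t = timer_find(name);
    if (t == NULL || !t->running)
        return -1;
    t->total_seconds += (double)(clock() - t->started) / CLOCKS_PER_SEC;
    t->calls += 1;
    t->running = 0;
    return 0;
}

/* Number of completed start/stop pairs, or -1 for an unknown timer. */
int timer_calls(const char *name)
{
    const util_timer *t = timer_find(name);
    return t ? t->calls : -1;
}

/* Accumulated processor seconds, or -1.0 for an unknown timer. */
double timer_seconds(const char *name)
{
    const util_timer *t = timer_find(name);
    return t ? t->total_seconds : -1.0;
}
```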

Extensibility
Future applications of the CCSM, such as the "Flying Leap Experiment," will involve incorporation of biogeochemistry and new component models, and future versions of the CCSM will certainly include new process parameterizations.  These applications and enhancements will proceed much more rapidly if the CCSM software is extensible.  Extensibility is largely accomplished by developing modular code with well-defined interfaces.
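A minimal sketch of what a well-defined interface buys: if every process parameterization shares one calling signature, adding a new process means writing one function and registering it, with no change to the driver loop.  The `column_state` structure, the registry, and the two toy processes below are hypothetical and are not CCSM physics.

```c
#include <stddef.h>

/* Hypothetical column state passed to every parameterization. */
typedef struct {
    double temperature; /* K */
    double humidity;    /* kg/kg */
} column_state;

/* Common interface: update the state in place over a time step dt
 * (seconds) and return 0 on success, nonzero on failure. */
typedef int (*param_fn)(column_state *state, double dt);

#define MAX_PARAMS 16
static param_fn registry[MAX_PARAMS];
static size_t nparams = 0;

/* Append a parameterization to the calling sequence. */
int register_param(param_fn fn)
{
    if (fn == NULL || nparams == MAX_PARAMS)
        return -1;
    registry[nparams++] = fn;
    return 0;
}

/* Apply all registered parameterizations in registration order,
 * stopping at (and returning) the first nonzero status. */
int run_params(column_state *state, double dt)
{
    for (size_t i = 0; i < nparams; ++i) {
        int rc = registry[i](state, dt);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Two toy processes used only to exercise the interface. */
int toy_cooling(column_state *s, double dt)
{
    s->temperature -= 1.0e-5 * dt;
    return 0;
}

int toy_drying(column_state *s, double dt)
{
    s->humidity -= 1.0e-8 * dt;
    if (s->humidity < 0.0)
        s->humidity = 0.0;
    return 0;
}
```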

Interoperability
Projects like the Atmospheric Model Intercomparison Project (AMIP) and the Coupled Model Intercomparison Project (CMIP) create widespread interest in the effect of using different dynamical cores and different component models.  The CCSM should provide the means to use different dynamical cores or component models with relative ease.

Robustness
We use the term "robustness" to indicate two things:  1) high quality and dependability and 2) support for error handling.  The CCSM's role as a community model makes both of these vitally important.  Many members of the climate community use the CCSM as a framework in which to test process parameterizations, while others use the CCSM with relatively little modification.  In both cases, the CCSM user has every right to expect that if a model run fails in a portion of the core CCSM code, the CCSM will (when possible) identify the failure condition and location.
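The second aspect of robustness, identifying the failure condition and location, can be supported by a small shared error-reporting routine.  The sketch below is hypothetical; it uses the standard C `__FILE__` and `__LINE__` macros and the C99 `__func__` identifier to record where a check failed, and returns a nonzero status for the caller to propagate.

```c
#include <stdio.h>

/* Hypothetical shared error reporter: writes the routine, source file,
 * line, and message to the given stream and returns a nonzero status
 * that the caller can propagate upward. */
int report_error(FILE *log, const char *file, int line,
                 const char *routine, const char *message)
{
    fprintf(log, "ERROR in %s (%s:%d): %s\n", routine, file, line, message);
    fflush(log);
    return 1;
}

/* Evaluates to 0 if cond holds; otherwise reports the failure with the
 * caller's file, line, and function name and evaluates to 1. */
#define CHECK(cond, msg) \
    ((cond) ? 0 : report_error(stderr, __FILE__, __LINE__, __func__, (msg)))

/* Example use: validate an input before a parameterization uses it.
 * The 0-400 K range is an arbitrary illustrative bound. */
int check_temperature(double t_kelvin)
{
    return CHECK(t_kelvin > 0.0 && t_kelvin < 400.0,
                 "temperature outside plausible range");
}
```

A failing call such as `check_temperature(-5.0)` writes a line of the form `ERROR in check_temperature (file.c:NN): temperature outside plausible range` to standard error and returns 1.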

Ease of Use
In the interest of accelerating the research process, the CCSM should be easy to configure and compile on supported platforms, and it should be easy to initialize and run.  Developers and users who wish to modify or add code to the CCSM need to do so with relative ease.

4 Software Strategy

In outlining this plan we rely on common sense, past experience with large software projects, and, primarily, commercial and government sources.  We consider greater alignment with these last two sectors to be a key element in revamping CCSM software for two reasons.  First, many of the best-established, proven methods for managing complex software projects originated in industry and the military (e.g., the classic Code Complete (1993) and the NASA Software Engineering Laboratory Series).  An investment in adopting standard development practices such as peer reviews, defect tracking, and regular system builds is probably of more immediate benefit to the CCSM project than research into cutting-edge computer science, and in any case is likely to be a prerequisite for successfully integrating advanced research into the model.

Second, the technological boom of the 1990s has drawn much of the talent and innovation in computer science and software engineering into the commercial sector.  The activities of industry have become increasingly relevant to efforts like the CCSM in a number of ways.  Getting efficient performance on microprocessor-based systems is a commercially valuable skill.  Teams who formerly worked on distributed scientific computing have moved wholesale to related problems in distributed web technologies.  Linux clusters are viable high-performance platforms.  Commercial tools such as CORBA are being emulated in a scientific context.  The need for a flow of ideas and technologies up and down the Branscomb pyramid, which has at its base the multitude of desktop computers and at its tip the relatively few supercomputers, drives current initiatives such as the Common Component Architecture project.

We note that the correspondence between military and commercial strategies is deliberate.  The Report of the Defense Science Board Task Force on Acquiring Defense Software Commercially (1994) identified the following as the primary causes of trouble in DoD software projects:

These items could also describe the software environment of the CCSM project.  The primary recommendations of the report were:

There are some indications that this has been a successful strategy for the DoD community.  The 1996 Air Force Guidelines for Successful Acquisition and Management of Software-Intensive Systems describe projects that have adopted more systematic development approaches and achieved time and cost savings.  Efforts to develop high-performance, reusable, C-based software infrastructures for applications of military interest like signal processing are maturing (see www.vsip.org).  The outcome of the proposed recommendations should be researched further, but this may be difficult due to the closed nature of many military projects.

Assuming that this would be a sensible strategy for the CCSM project, how could one go about implementing it without disrupting scientific productivity?

A significant opportunity for achieving greater alignment with the commercial sector is the formation of a focused software engineering team to develop infrastructure usable by the CCSM.  Separating non-scientific code from research code allows a software engineering team to work on a relatively stable problem that can be solved in the manner of their choosing (as long as it meets scientific requirements).  Therefore, such a team can be a testbed for new practices, tools, design strategies, and implementation languages.  A team of this sort has been proposed for the ESMF effort.  Since the infrastructure software is intended to serve a wide variety of projects, the software team does not need to be part of the CCSM group itself.

4.1 Software Management Plan

Currently the CCSM project has no software engineering manager.  Coordination and management of software is accomplished partly through the efforts of the project's lead scientists and partly through the efforts and interaction of individual software developers.  This approach has sustained the project so far, but it has very serious drawbacks.  It requires scientists to manage software processes that are outside their interests and areas of training, in addition to their research workload.  The competing demands on scientists' time sometimes mean that software management is justifiably overlooked.  The result is that software procedures are not well established or documented, that information related to software issues propagates erratically throughout the project, and that coordination of the multiple, geographically distributed development teams working on the CCSM is difficult.  The newly formed CCSM Software Engineering Working Group (SEWG) is an important step towards increasing the visibility of software issues and helping to coordinate multiple development efforts, but it does not perform day-to-day project management.  A day-to-day software manager or coordinator is needed.

Studies reflect what common sense suggests.  In the book Patterns of Software Failure and Success (1996), Jones analyzed thousands of software projects in multiple domains (systems software, military software, commercial software, and others) and came to the following conclusion:

"It is both interesting and significant that the first six out of sixteen technology factors associated with software disasters are specific failures in the domain of project management, and three of the other technology deficiencies can be indirectly assigned to poor management practices."

Even if the CCSM's non-scientific infrastructure code is developed by a focused, separately managed software team, the need for clearly identified software engineering management within the CCSM project remains.

Adequate software engineering management extends beyond simply hiring a software engineering manager.   Complex software projects such as the CCSM may also require management support positions, such as a "gatekeeper" or "librarian" who oversees configuration management of the project.  Duties may include monitoring check-ins, ensuring that the appropriate testing and validation occurs, setting up automated tests, and coordinating new releases.  Support for such a position is requested in the ACPI Avant Garde proposal.

Software management milestones:

4.2 Software Restructuring (updated June 2001)

4.2.1 Atmosphere

The CSM-1 atmospheric model is currently undergoing major code rework as part of the ACPI Avant Garde initiative (see Section 5.1). Code restructuring will be performed incrementally in three phases.  The first phase is a design study, remapping of data structures, and restructuring of the code that calculates the physical parameterizations.  Second, the dynamical core interface will be formalized and implemented.  This work will make it possible for parallel implementations of different dynamical cores to proceed independently.  The third phase will be the parallel development of three high-performance dynamical cores and their integration into the atmospheric model.  The dynamical cores will be Eulerian-Spectral, Semi-Lagrangian-Spectral, and Lin-Rood.

Other work on the atmospheric component includes short-term scientific improvements over the CSM-1 atmosphere, such as explicit treatment of sulfate aerosols and cloud liquid water, as well as time-split physics and significant upgrades to the shortwave and longwave radiation codes. The vertical resolution will be increased from 18 to 26 levels, and the capability for reduced horizontal grids has been added. The history output will be in netCDF format.

Column physics: flexible data decomposition
The flexible parallel decomposition of the physics parameterizations is key to performance portability of the atmospheric model.  Since each vertical column of the atmosphere can be computed independently, there is an opportunity to exploit natural parallelism.  The more computationally intensive the physical parameterizations included in the model, the more this parallelism is likely to offset the communication costs that are associated with data distribution.

The data structure currently used in physics calculations is an entire latitude slice, with a variable end index to accommodate a reduced grid.  This choice performs excellently on vector machines but poorly on machines with modest caches.  A rewrite of the parameterization package will change this data structure to an arbitrary collection of columns.  In addition to making the code more cache-friendly, this increases its scalability.  The decomposition will be specified in terms of both data distribution and shared-memory multitasking, so that scientists do not need to multitask individual parameterizations.  Since the decomposition allows column sets that are not spatially contiguous, static load balancing for radiation calculations will be supported; this is accomplished by grouping columns that lie opposite each other on great circles.
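The load-balancing idea can be sketched for a regular longitude-latitude grid.  A column and its antipode always lie on a common great circle, and when one of the pair is sunlit the other is, except near the terminator, in darkness, so placing both in the same chunk evens out the cost of shortwave radiation.  The sketch assumes even numbers of longitudes and latitudes, with latitudes symmetric about the equator; the function names are hypothetical.

```c
#include <stddef.h>

/* Column index on an nlon x nlat grid, longitude varying fastest. */
static size_t col_id(size_t i, size_t j, size_t nlon)
{
    return j * nlon + i;
}

/* Antipodal partner of column id: shift longitude by half a circle and
 * mirror latitude about the equator (assumes nlon even and latitudes
 * symmetric about the equator). */
size_t antipode(size_t id, size_t nlon, size_t nlat)
{
    size_t i = id % nlon, j = id / nlon;
    return col_id((i + nlon / 2) % nlon, nlat - 1 - j, nlon);
}

/* Fill order[] (length nlon * nlat) with column ids so that every column
 * is immediately followed by its antipode. Visiting only the northern
 * half (j < nlat / 2) covers every column exactly once when nlat is even.
 * Returns the number of ids written. */
size_t build_paired_order(size_t nlon, size_t nlat, size_t *order)
{
    size_t n = 0;
    for (size_t j = 0; j < nlat / 2; ++j) {
        for (size_t i = 0; i < nlon; ++i) {
            size_t c = col_id(i, j, nlon);
            order[n++] = c;
            order[n++] = antipode(c, nlon, nlat);
        }
    }
    return n;
}
```

Chunk k can then be taken as `order[k * chunk_size]` through `order[(k + 1) * chunk_size - 1]`; any even chunk size keeps every pair together, since pairs start at even positions in the ordering.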

Swappable dynamics
This work will increase the modularity of the atmospheric model by clearly encapsulating the dynamical cores.  Incorporating this strategy in the initial design is important for three reasons.  First, with a clearly defined dynamics interface, code optimizations and parallel decomposition strategies for the dynamics can be developed and tuned independently of other components (e.g., physical parameterizations).  Second, parallel constructs will be more isolated, so that the model can be readily ported and adapted to new platforms.  Third, research into new dynamical cores can be accelerated, since multiple groups can develop codes simultaneously.  The set of dynamical cores developed (Eulerian-Spectral, Semi-Lagrangian-Spectral, and Lin-Rood) will use hybrid OpenMP/MPI parallelism.  Both the column physics rework and the swappable dynamics capability will be implemented within the 18-month time frame of the ACPI proposal.
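A C analogue of the swappable-dynamics idea is a table of function pointers that every dynamical core fills in, with the driver calling only those entry points.  This is a sketch under assumptions: the `dycore_ops` structure, the stub cores, and selection by name are hypothetical, and the actual cores are Fortran codes using hybrid OpenMP/MPI parallelism.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface every dynamical core must implement. */
typedef struct {
    const char *name;
    int (*init)(void);
    int (*step)(double dt); /* advance the dynamics by dt seconds */
    int (*finalize)(void);
} dycore_ops;

/* Stub implementations standing in for real cores (illustrative). */
static int steps_taken = 0;
static int stub_init(void) { steps_taken = 0; return 0; }
static int stub_step(double dt) { (void)dt; ++steps_taken; return 0; }
static int stub_final(void) { return 0; }

static const dycore_ops available[] = {
    { "eulerian-spectral",        stub_init, stub_step, stub_final },
    { "semi-lagrangian-spectral", stub_init, stub_step, stub_final },
    { "lin-rood",                 stub_init, stub_step, stub_final },
};

/* Select a core by name (e.g., from a run-time configuration setting);
 * returns NULL if no core has that name. */
const dycore_ops *select_dycore(const char *name)
{
    for (size_t i = 0; i < sizeof available / sizeof available[0]; ++i)
        if (strcmp(available[i].name, name) == 0)
            return &available[i];
    return NULL;
}

/* Driver: identical regardless of which core was selected. */
int run_dynamics(const dycore_ops *core, int nsteps, double dt)
{
    if (core == NULL || core->init() != 0)
        return -1;
    for (int n = 0; n < nsteps; ++n)
        if (core->step(dt) != 0)
            return -1;
    return core->finalize();
}

int dycore_steps_taken(void) { return steps_taken; }
```

Adding a fourth core under this design would mean supplying three functions and one table entry; `run_dynamics` would not change.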

In the longer term, we plan to modify the interface of component models so that they can use the ESMF coupling mechanism.  I/O and other utilities will be replaced as appropriate by those supplied by ESMF.  We anticipate that the increased code modularity provided by the ACPI team will make it easier to adapt to the future framework, both at the levels of basic utilities and component coupling.

4.2.2 Ocean

The CCSM-2 ocean component uses the POP 1.4 code on a displaced Greenland pole grid. All the physics packages from the previous ocean component have been coded into POP, including an optimized version of the Gent-McWilliams (GM) parameterization of mesoscale eddy effects and the Large et al. k-profile parameterization vertical mixing scheme. An improved treatment of the equation of state has been added, as well as variable surface layer thickness options, a marginal-seas balancing scheme, and a new time manager. Additional options have been added to the GM eddy parameterization: the eddy advection and isopycnal diffusion parameters can take different values, and these parameters can vary spatially according to the specification of Visbeck et al. Development of partial bottom cells for complex topography is entering the testing phase, and a bottom boundary transport scheme will also be tested in the near future. The model supports two resolutions: a coarse 3.6° x 2° x 25-level grid for paleoclimate applications and a higher-resolution 1° x 1/2° x 40-level grid for current and future climate studies. This work is being shared with the POP developers at Los Alamos National Laboratory and should be completed in the next year.

4.2.3 Land

The CCSM2-land model will contain the essential biogeophysics of the Common Land Model (CLM) with some modifications such as use of the LSM vegetation albedos, the burial of vegetation vertically rather than horizontally and the linking of photosynthesis and stomatal conductance. Biogeochemistry is not currently implemented. The near term goals are to incorporate treatments of biogeochemistry, dust emission, vegetation dynamics and carbon cycling. Vegetation dynamics implies that the land cover is no longer prescribed but rather predicted. This will involve significant changes to the data structures of the code.

4.2.4 Sea Ice

The CCSM2 Sea Ice Model development is essentially complete, based on goals set by the CCSM Polar Climate Working Group. Several improvements have been made to the Sea Ice Model (CSIM4) in the past year. Cleaning up the code was a high priority and has been completed; this included removing and combining redundant subroutines and converting the code to use F90 modules. It was decided that Hunke's Elastic-Viscous-Plastic (EVP) rheology and Bitz's ice thickness distribution and thermodynamics would be included as part of the default physics. Lipscomb's linear remapping scheme has been added to better resolve the thin end of the ice thickness distribution (ITD). A prescribed ice model was added for atmospheric tuning. An ocean mixed layer model is currently being added to CSIM4. Further testing is needed to find remaining bugs and resolve coupling issues.

Future plans include the addition of metric terms to the EVP dynamics and improvements to the subgrid scale processes such as ice/ocean interaction, melt ponds and surface albedo. A new horizontal remapping scheme is being evaluated at LANL as a possible replacement for MPDATA; Lipscomb is currently testing the Dukowicz-Baumgardner scheme in CSIM4. Model sensitivity to atmospheric forcing, model physics and parameters needs to be examined. Code documentation will be updated and a User's Guide will be written.

4.2.5 Coupler

Coupler development will proceed on two tracks. The first, with a shorter-term focus, will deliver CCSM coupler version 5. This involves a DOE/NCAR team optimizing the existing CCSM coupler (version 4) as needed to support the performance requirements for a 2001 release of the CCSM2.0 coupled system. The desired overall throughput is a minimum of 5 years/day, with 10 years/day preferred. On the second, longer-term track, the DOE/NCAR team will design and implement a next-generation coupler, CCSM coupler version 6. A version 6 coupler that is functional enough to replace the version 5 coupler will be available by the end of 2001. The DOE/NCAR team will collaborate with the Earth System Modeling Framework (ESMF) initiative as appropriate.
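The throughput targets translate directly into a wall-clock budget for the whole coupled system, not for the coupler alone.  Assuming a 365-day model year and 86,400 seconds per calendar day, the 5 years/day minimum gives:

```latex
t_{\mathrm{year}} \le \frac{86\,400\ \mathrm{s/day}}{5\ \mathrm{years/day}} = 17\,280\ \mathrm{s} = 4.8\ \mathrm{h},
\qquad
t_{\mathrm{day}} \le \frac{17\,280\ \mathrm{s}}{365} \approx 47.3\ \mathrm{s}.
```

At the preferred 10 years/day the budgets halve, to 8,640 s (2.4 h) per model year and about 23.7 s per model day, shared among all components and the coupler.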

The version 6 CCSM coupler is intended to have the following properties:

The DOE/NCAR effort is scheduled for completion at about the same time that the ESMF team is finishing its requirements and design phase. Ideally, the DOE/NCAR effort can accelerate the design and implementation of the ESMF coupling strategy by providing an early prototype that meets many of the ESMF requirements. To effect this coordination, key members of the coupler team will participate in ESMF design activities.

4.2.6 Utilities

Communication
We plan to coordinate the development of communication libraries with multiple development teams.  Both the ACPI team and the ESMF collaboration have proposed similar schemes.  As with the coupler, the ACPI activity can be used to develop early prototypes of the code desired for the future ESMF framework.

I/O
The ACPI effort will investigate various ways to introduce parallel I/O into the CCSM in the short term.  A joint CCSM/Unidata proposal (see Section 5.4) would, if funded, use the CCSM as a testbed for a new, parallel version of netCDF layered on HDF-5.

Higher-level I/O functions will be provided by the ESMF framework.  A current project is investigating the C-based coding strategy proposed for the ESMF utilities (see Section 5.3), and a set of basic utilities has been implemented and partly tested in the CCM and LSM codes.
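One form the C-based strategy can take is a C routine written to be callable from Fortran 90.  Since Fortran passes arguments by reference, every argument is a pointer.  The routine name and its trailing underscore are assumptions: external-name conventions vary by Fortran compiler, and the routine itself is a hypothetical example rather than one of the utilities mentioned above.

```c
/* Hypothetical C utility intended to be called from Fortran 90 as
 *   call util_weighted_mean(field, weights, n, mean, ierr)
 * Fortran passes arguments by reference, so every argument is a pointer.
 * The trailing underscore follows a common, but compiler-dependent,
 * Fortran external-name convention. */
void util_weighted_mean_(const double *field, const double *weights,
                         const int *n, double *mean, int *ierr)
{
    double sum = 0.0, wsum = 0.0;
    for (int i = 0; i < *n; ++i) {
        sum += field[i] * weights[i];
        wsum += weights[i];
    }
    if (wsum <= 0.0) { /* no usable weights: return an error code */
        *mean = 0.0;
        *ierr = 1;
        return;
    }
    *mean = sum / wsum;
    *ierr = 0;
}
```

On the Fortran side, `n` and `ierr` would be default integers, which this sketch assumes correspond to C `int`; the status argument mirrors the Fortran habit of returning an error code rather than aborting.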

Calendar
A calendar that supports a leap year option was implemented and tested in the CCM and LSM codes.  Its incorporation into the CCSM source code is pending testing on additional platforms.
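A minimal sketch of a calendar with a leap-year option, assuming the Gregorian rule when leap years are enabled and a fixed 365-day year otherwise.  The function names are hypothetical and need not match the CCM/LSM implementation.

```c
/* Returns 1 if year is a leap year under the Gregorian rule, 0 otherwise.
 * When use_leap is 0, every year has 365 days. */
int is_leap_year(int year, int use_leap)
{
    if (!use_leap)
        return 0;
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days in a month (month = 1..12), or -1 for an invalid month. */
int days_in_month(int year, int month, int use_leap)
{
    static const int days[12] = {31, 28, 31, 30, 31, 30,
                                 31, 31, 30, 31, 30, 31};
    if (month < 1 || month > 12)
        return -1;
    if (month == 2 && is_leap_year(year, use_leap))
        return 29;
    return days[month - 1];
}

int days_in_year(int year, int use_leap)
{
    return is_leap_year(year, use_leap) ? 366 : 365;
}

/* Day of year (1-based) for a valid date. */
int day_of_year(int year, int month, int day, int use_leap)
{
    int doy = day;
    for (int m = 1; m < month; ++m)
        doy += days_in_month(year, m, use_leap);
    return doy;
}
```

For example, with leap years enabled, 1 March 2000 is day 61 of the year; with the option off, it is day 60.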

Software restructuring milestones

4.3 Software Practices

The CCSM has recently begun to increase the formality of its software development process.  In the work undertaken by the DOE/NCAR team for the ACPI project, coding has been preceded by requirements definition and review, and several design documents are underway.  There is a proposed set of coding standards for CCSM Fortran 90 development.  However, the CCSM group lacks systematic, documented practices for many aspects of software development.  For example, the project does not have a documented software design process, a unified strategy for code repositories for distributed development groups, a standard set of code testing and build procedures, or an overall defect-tracking system.  The CCSM software process lacks many of the nine most effective practices identified by a large community of software experts as a result of the 1994 Software Best Practices Initiative, such as risk analysis, metric tracking for project planning, and project-wide visibility of a software plan (see Appendix C).

We can roughly characterize the CCSM software process using the Capability Maturity Model (CMM(tm)) for Software developed by the Software Engineering Institute at Carnegie Mellon University.  The CMM assesses the maturity of an organization's software processes and identifies the key practices required to advance that maturity.  The defined maturity levels are described in Appendix D.  The absence of defined project management and of many formal procedures indicates that the CCSM process is still at the "initial" (ad hoc, occasionally chaotic) level, though the recent introduction of peer reviews and coding standards suggests that it is acquiring characteristics of the second, "repeatable" level.

Adopting more systematic processes is associated with gains in productivity and software quality, and the CCSM will continue to move in that direction.  However, practices must be carefully adapted to the working habits of scientists and to the distinctive, research-driven NCAR environment.  There are limits to how much process formality it is practical to adopt.  For general, non-scientific utilities developed by software engineers, it makes sense to introduce rigor into the specification, design, and testing process; this helps ensure that software engineers deliver robust software that meets requirements.  Efficiency can be encouraged through practices such as producing design documentation that can be readily converted to user documentation and generating portions of this documentation automatically from source code.

Coding scientific parameterizations is a different matter.  It is probably unreasonable to demand requirements definition and reviews for this type of code; it changes rapidly and scientists should not be encumbered by the software engineering process.  However, it is reasonable that scientific code be checked by software engineers for appropriate error handling, code format, etc., and, if appropriate, optimized for performance before being integrated into the CCSM production code.

In the short term, we plan to focus the effort to systematize software development by using the CCSM Software Engineering Working Group (CCSM SEWG) to foster discussion and agreement on accepted practices.  We plan to document procedural conventions in a CCSM Software Developer's Guide, and work is already underway.  We have posted sample contents for comment on the SEWG website (http://www.cesm.ucar.edu/working_groups/Software), and we are in the process of collecting references and information on current practices.  Sources for software practices include the classic Code Complete (1993) and the Manager's Handbook for Software Development and other documents from the NASA Software Engineering Laboratory.

Once a CCSM software engineering manager position is established, procedures can be reassessed.  The manager can help to encourage adherence to guidelines.

Training will help staff adopt new practices and techniques, and opportunities for staff development should be increased.

Software practices milestones:

4.4 User Support

4.4.1 Technical support
Since the CCSM is a community model, it is essential that the project identify a designated primary contact for software engineering technical support and a trouble-ticket mechanism to track reported problems.  Once this contact is established, the CCSM documentation and website must clearly indicate how to report technical problems.

4.4.2 User's Manual
There is currently minimal documentation on how to run the overall CCSM.  We propose the creation of a User's Guide that explains the installation procedure and summarizes options for configuring the model.  The team preparing the User's Guide should be a mix of junior and senior software engineers and support personnel, so that the time senior software engineers spend on the effort is minimized.

4.4.3 Graphical User Interface
As options for configuring the CCSM increase along with scientific capabilities, the initialization procedure for the model will grow more complex.  A web-based Graphical User Interface (GUI) can help users understand and select options for running the model and track model progress while it is running.  It could eventually be used to interactively steer model runs.  Any GUI developed must be highly extensible, so that it does not need to be substantially modified when changes are made to CCSM code.

A pilot project to develop such an interface for the CCSM would be of enormous value to its community of users.  It is probably not appropriate to focus development resources on such a pilot project until some of the major current code restructuring activities are completed.  We note that basic GUI technology for technical applications is well-established in such areas as real-time systems and that expertise already exists at NCAR in such groups as the Atmospheric Technology Division (ATD).

User support milestones:

4.5 Timeline

The following is a tentative timetable for the milestones listed in this section, together with other major events related to CCSM software development.
 
Date
Milestone
Fall/Winter 2000
  • Release of NASA HPCC CAN
  • CCSM algorithm freeze December 2000
  • Creation of a software engineering manager position and management support positions for the CCSM project
  • Coupler:  optimized version of current CSM-1 coupler
  • Ocean: optimized barotropic solve
  • Identification of a primary contact for CCSM technical support
  • Software Engineering Working Group discussion of improved, more systematic practices
  • Documentation of better practices in a CCSM Software Developer's Guide
  • Refinement and enforcement of practices by software engineering manager
  • Increased opportunities for staff development (ongoing)
Spring 2001
  • Testing of CCSM in preparation for summer 2001 code release
Summer 2001
  • Release of CCSM-2
  • Atmosphere: column physics implemented with flexible decomposition
  • Atmosphere:  three swappable dynamical cores developed
  • Introduction, via the ACPI project, of common utility libraries for functions such as communication, I/O, timing, and error handling
Fall/Winter 2001
  • ACPI Avant Garde project ends December 2001
  • "Next generation coupler" implementation
  • Incorporation of enhanced parallel I/O
  • Creation of an overall CCSM User's Guide
  • Introduction of a GUI development pilot project 
Summer 2003
  • Projected first implementation milestone for NASA HPCC CAN
  • Adoption of prototype ESMF coupling software in CCSM
  • Adoption of prototype ESMF common utility software in CCSM

5 CCSM-Related Software Projects and Initiatives

A summary of initiatives and collaborating institutions is provided in Appendix A.

5.1 DOE "ACPI Avant Garde" Project

The Accelerated Climate Prediction Initiative (ACPI) Avant Garde project is a joint Department of Energy/National Science Foundation project.  The goals of the project are: 1) where readily possible, improve performance of the existing CCSM and 2) work with the CCSM development team to design and build a more performance-portable, modular CCSM.  This 18-month project commenced June 1, 2000.

The ACPI work focuses on two components of the CCSM that offer ample opportunities for performance optimization:  the flux coupler and the atmospheric component.  In the atmospheric component, work will focus on improving on-node performance, developing a more modular structure that permits the exchange of alternative dynamics and physics components, and developing high-performance communication libraries.  Work on the coupler will address issues of on-node performance, scalability, and configurability.  The DOE/NCAR team also will work on improving the ocean component and support for high-performance, parallel I/O.

In addition to code restructuring, the ACPI project emphasizes a structured approach to code development.  The five basic elements of the ACPI strategy are:

5.2 NASA HPCC Round 3 Earth System Modeling Framework Project

The third phase of the NASA High Performance Computing and Communications (HPCC) program focuses on high-performance frameworks.  A Cooperative Agreement Notice (CAN) has not yet been released, but a draft version has been available since December 1999.  The CAN is expected to call for the creation of an Earth System Modeling Framework (ESMF) by the Earth science community.  NCAR has been designated the lead institution, and the CCSM will be a primary testbed for the new framework.  Several preliminary meetings, the most recent on August 7-9, 2000 at NCAR, have established the collaborating institutions, a set of application requirements, and a preliminary architecture.

The ESMF will offer at minimum the following functionality: gridded data decomposition and load balancing, communication including synchronization, I/O, and grid transforms. These capabilities can be divided into two main functional areas.

The ESMF will be modeled on existing efforts, such as GEMS from NASA/NSIPP and FMS from NOAA/GFDL.  The collaborating institutions plan to use the robust, low-level infrastructure developed at Argonne National Laboratory for the PETSc project.
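The gridded data decomposition listed among the ESMF capabilities can be illustrated with the simplest static scheme: a one-dimensional block decomposition that splits grid rows (for example, latitudes) across processes so that no process holds more than one row more than any other.  This is a minimal sketch under that assumption; block_decompose is a hypothetical name, and a real framework would also need two-dimensional decompositions and cost-weighted load balancing, since column physics costs can vary from row to row.

```c
#include <assert.h>

/* Hypothetical helper: split n grid rows among nprocs processes so that
   block sizes differ by at most one row.  Process 'rank' owns rows
   [*start, *start + *count). */
void block_decompose(int n, int nprocs, int rank, int *start, int *count)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int base  = n / nprocs;   /* rows every process receives          */
    int extra = n % nprocs;   /* the first 'extra' ranks get one more */
    *count = base + (rank < extra ? 1 : 0);
    /* Each earlier rank contributed 'base' rows, plus one row for each
       earlier rank that received an extra row. */
    *start = rank * base + (rank < extra ? rank : extra);
}
```

For 10 rows on 4 processes, ranks 0 and 1 own rows 0-2 and 3-5, and ranks 2 and 3 own rows 6-7 and 8-9.  Because the mapping is computed rather than stored, every process can find the owner of any row without communication.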

5.3 I/O Library

Currently, each CCSM component, including the flux coupler, implements its own I/O handlers.  Much of this code is redundant and could be consolidated into a single library.  A collaboration between NCAR SCD/CSS and the CCSM project began in winter 1999 with the objective of developing an I/O library that all CCSM components could share.  The longer-term objective is a general-purpose I/O library usable by multiple models.

Requirements were completed in March 2000, and a preliminary design specification was released in July 2000.  Development is proceeding from low-level utilities up to higher-level functions such as field abstractions and time averaging.  The library is being implemented in object-based C and uses the infrastructure of the PETSc library developed at Argonne National Laboratory.  The PETSc infrastructure offers signal and error handling, C-to-f90 pointer conversion, C/f90 character string conversion, and other useful features.  A set of basic classes representing data sets, files, storage, and case descriptions was built with these tools and integrated into test versions of the CCM and LSM codes.
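The C/f90 character string conversion mentioned above is needed because a Fortran 90 CHARACTER argument typically reaches C as a blank-padded buffer with no terminating NUL, while its length travels separately (many compilers pass it as a hidden extra argument).  The sketch below shows the conversion in both directions under that assumption.  The function names are hypothetical and are not the PETSc routines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a blank-padded Fortran CHARACTER(len=flen)
   value into the C buffer 'out' (capacity outlen), dropping trailing
   blanks and appending a NUL.  Returns the length of the resulting C
   string, or -1 if 'out' is too small. */
int f90_to_cstring(const char *fstr, size_t flen, char *out, size_t outlen)
{
    assert(fstr != NULL && out != NULL);
    size_t n = flen;
    while (n > 0 && fstr[n - 1] == ' ')
        --n;                        /* strip Fortran blank padding */
    if (n + 1 > outlen)
        return -1;                  /* no room for text plus NUL   */
    memcpy(out, fstr, n);
    out[n] = '\0';
    return (int)n;
}

/* Hypothetical helper for the reverse direction: copy a NUL-terminated C
   string into a Fortran buffer of length flen, truncating if necessary
   and blank-padding the remainder, as Fortran expects. */
void cstring_to_f90(const char *cstr, char *fstr, size_t flen)
{
    assert(cstr != NULL && fstr != NULL);
    size_t n = strlen(cstr);
    if (n > flen)
        n = flen;
    memcpy(fstr, cstr, n);
    memset(fstr + n, ' ', flen - n);
}
```

Centralizing this conversion in the shared library, rather than repeating it in each component's I/O handler, is one example of the redundancy the library is meant to remove.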

A calendar manager was written as an early prototype of this coding technique, and it has also been integrated into the test versions of the CCM and LSM codes.  Integration into the CCSM source is pending validation on additional platforms (to date, the code has been validated only on the IBM platform).

This I/O library effort will be a prototype for the I/O and utilities in the Earth System Modeling Framework.  The continued development of the I/O library will be strongly influenced by the preliminary design of the ESMF.

5.4 Parallel netCDF

A joint Unidata, NCSA, NCAR/SCD, NCAR/MMM, and NCAR/CGD proposal entitled "Merging the NetCDF and HDF-5 Libraries to Achieve Gains in Performance and Functionality" was submitted to the NSF ITR program in February 2000.  The goal of the project is to develop a high-performance version of netCDF that is based on the HDF-5 library.  The new version would have a parallel write capability and features for packing and extending a data set in multiple dimensions.  The CCSM is a proposed testbed, and, if the project is funded, staff will be provided to help integrate parallel netCDF into the CCSM source code.
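The proposal summary does not say exactly what "packing" will mean in the new library.  In existing netCDF practice, packing usually refers to the scale_factor/add_offset attribute convention, in which floating-point values are stored as small integers and recovered as packed * scale_factor + add_offset.  The C sketch below computes these parameters for 16-bit packing over a known data range, reserving -32768 for a fill value.  The helper names are hypothetical, and the proposed HDF-5-based library might instead implement packing through compression or other means.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two netCDF packing attributes. */
typedef struct {
    double scale_factor;
    double add_offset;
} pack_params;

/* Choose parameters that map [vmin, vmax] onto [-32767, 32767],
   leaving -32768 free for use as a fill value. */
pack_params pack_params_for_range(double vmin, double vmax)
{
    assert(vmax >= vmin);
    pack_params p;
    p.add_offset = 0.5 * (vmax + vmin);
    /* Guard against a constant field, which would give a zero scale. */
    p.scale_factor = (vmax > vmin) ? (vmax - vmin) / 65534.0 : 1.0;
    return p;
}

/* Pack one value, clamping to the valid range and rounding to nearest. */
int16_t pack_value(pack_params p, double x)
{
    double q = (x - p.add_offset) / p.scale_factor;
    if (q > 32767.0)
        q = 32767.0;
    if (q < -32767.0)
        q = -32767.0;
    /* Round half away from zero without needing the math library. */
    return (int16_t)(q >= 0.0 ? q + 0.5 : q - 0.5);
}

/* Unpack following the convention: value = packed * scale_factor + add_offset. */
double unpack_value(pack_params p, int16_t s)
{
    return s * p.scale_factor + p.add_offset;
}
```

Packing halves the storage of a 32-bit field at the cost of a quantization error of at most half of scale_factor; for a 200-320 K temperature range that is under 0.001 K.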

6 Summary

In this plan we have identified several shifts in perspective necessary to achieve the software goals implied by the CCSM Plan.

These are:

We have described some concrete examples of how the CCSM has started to move in these directions and how it plans to proceed.  A projected timeline charts the anticipated progress to be made during the next few years.  A critical next step will be evaluating the resources or resource redistribution that will be necessary for carrying out this plan and ensuring that they are available to the CCSM project.

Appendix A

Summary of CCSM-Related Projects and Collaborations


ACPI Avant Garde Project
Description: 18-month project focused on the CCSM coupler and atmosphere model, with some development of parallel I/O, ocean model optimization, and communication
Collaborators:
  • NCAR
  • Argonne National Laboratory
  • Lawrence Berkeley National Laboratory
  • Los Alamos National Laboratory
  • Lawrence Livermore National Laboratory
  • Oak Ridge National Laboratory
Status: Began June 1, 2000

Earth System Modeling Framework (ESMF)/NASA HPCC Cooperative Agreement Notice (CAN)
Description: 3-year project that will focus on a flexible, general coupling strategy and the development of a utility toolkit for modeling
Collaborators:
  • NCAR
  • Argonne National Laboratory
  • NASA/GSFC
  • NOAA/NCEP
  • NOAA/GFDL
  • MIT
  • University of Michigan
  • Los Alamos National Laboratory
Status: Cooperative Agreement Notice release is expected mid-September 2000.  Initial draft of ESMF design document is scheduled for late September 2000.

I/O Library
Description: Ongoing project that is currently prototyping an I/O handler similar to that proposed for the ESMF
Collaborators:
  • NCAR/SCD
  • NCAR/CGD
Status: Work will merge with ESMF project


Appendix B

Summary of Proposed CCSM Software Engineering Documents


CCSM Software Engineering Plan
Description: Outlines a technical strategy for achieving the goals of the CCSM Science Plan; describes the coordination of multiple development activities
Status: Draft available

CCSM Developer's Guide
Description: Describes the conventions associated with specifying, designing, implementing, testing, and maintaining code for the CCSM
Status: In progress; sample contents proposed and information being collected

CCSM User's Guide
Description: Describes how to install and run the CCSM and summarizes usage options
Status: Newly proposed

Requirements and Design Documents
Description: Specify in detail the requirements, interfaces, and functions of individual CCSM components
Status: Several requirements documents completed and reviewed


Appendix C

The Software Best Practices Initiative

"The Software Best Practices Initiative (1994) represents the collective efforts of nearly 200 development and maintenance expert practitioners from the commercial and government world, industry leaders, software visionaries and methodologists ... Seven panels studied successful software programs in the public and private sectors to determine those practices characteristic to all programs and significant leverage items for success." (Air Force Guidelines for Successful Acquisition and Management of Software Intensive Systems, June, 1996)

The nine "best practices" are:

Formal risk management
Risk management involves continuously updating and monitoring risk plans to account for new, potential and manifest risks.

Agreement on interfaces
To address the chronic problem of vague, inaccurate, and untestable specifications, a baseline user interface should be agreed upon across affected areas before beginning implementation activities.

Metric-based scheduling and tracking
Statistical quality control of costs and schedules should be maintained.

Defect tracking
Defects should be tracked formally during each project phase.

Project-wide visibility of project plan and progress versus plan
The core indicators of project health or dysfunction should be made available to all project participants.

Configuration management
Configuration management is considered essential to any software development project.

Inspections, reviews, and walkthroughs
Peer reviews should be conducted at all design levels (particularly detailed designs), on code prior to unit test, and on test plans.

Quality gates
Completion events should be in the form of "gates" that assess the quality of the product produced or the adequacy and completeness of the finished process.

People-aware management
Management must be accountable for staffing qualified people, as well as for fostering an environment conducive to low staff turnover.

(The list has since been expanded to 16 best practices, but these nine remain the core.)


Appendix D

Capability Maturity Model (CMM) for Software

(See http://www.sei.cmu.edu/cmm/)

Level 1 - Initial. The software process is characterized as ad hoc, and occasionally even chaotic.  Few processes are defined and success depends on individual effort and heroics.  Key challenges:  project management, project planning, configuration management, software quality assurance.

Level 2 - Repeatable. Basic program management processes are established to track cost, schedule, and functionality.  The necessary process discipline is in place to repeat earlier successes on programs with similar applications.  Key challenges:  training, technical practices (reviews, testing), process focus (standards, process groups).

Level 3 - Defined. The software process for both management and engineering activities is documented, standardized, and integrated into a standard software process for the organization.  All programs use an approved, tailored version of the organization's standard process for developing and maintaining software.  Key challenges:  process measurement, process analysis, quantitative quality plans.

Level 4 - Managed. Detailed measures of the software process and product quality are collected.  Both the software process and products are quantitatively understood and controlled.  Key challenges:  changing technology, problem analysis, problem prevention.

Level 5 - Optimizing. Continuous process improvement is enabled by quantitative feedback from the process and from piloting innovative ideas and technologies.  Key challenges:  still human-intensive process, maintain organization at optimizing level.