November 2000
Component Model Plans: updated June 2001
Prepared by:
Cecelia DeLuca (NCAR)
J. Walter Larson (Argonne National Laboratory)
Lawrence Buja (NCAR)
Anthony Craig (NCAR)
John Drake (Oak Ridge National Laboratory)
Table of Contents
Executive Summary
1 Introduction
2 Software Description
3 Software Goals and Requirements
4 Software Strategy
4.1 Software Management
4.2 Software Restructuring
4.3 Software Practices
4.4 User Support
4.5 Timeline
5 CCSM-Related Software Projects and Initiatives
5.1 DOE ACPI "Avant Garde" Project
5.2 NASA HPCC Earth System Modeling Framework
5.3 I/O Library
5.4 Parallel netCDF
6 Summary
Appendices
Appendix A: Summary of CCSM-Related Projects and Collaborations
Appendix B: Summary of Proposed CCSM Software Engineering Documents
Appendix C: The Software Best Practices Initiative
Appendix D: The Capability Maturity Model for Software
Executive Summary
NCAR's Community Climate System Model (CCSM) is critical to the nation, both as a climate change assessment tool and as a research tool for studying wide-ranging aspects of the Earth system. The CCSM software must manage the complexity of the intertwined physical and dynamical processes that it simulates, and it must extract efficient performance from today's complicated and transient supercomputer architectures. The software engineering challenges will only increase as scientists press for greater resolution, longer runs, and ensemble runs to achieve the goals outlined in the Community Climate System Model Plan 2000-2005.
In this document we present a plan to make the CCSM a more efficient, extensible, robust, and easy to use piece of software over the next five years. To accomplish these objectives, the areas of software management, code structure, software engineering practices, and user support will require coordinated changes. Some of these changes will occur through CCSM involvement in two large initiatives. The first of these, part of the Accelerated Climate Prediction Initiative (ACPI), is an ongoing 18-month NCAR/DOE collaboration that focuses on reworking the CCSM atmospheric component and coupling component. The second initiative has been motivated by a still-to-be-released NASA High Performance Computing and Communications Cooperative Agreement Notice (NASA HPCC CAN) that calls for the creation of an Earth System Modeling Framework (ESMF). The ESMF is intended to increase the reuse and interoperability of climate and weather codes by identifying a common non-scientific software infrastructure and developing it jointly through a multi-institutional collaboration. The CCSM is a proposed testbed for the ESMF project. One of the current challenges of the CCSM project is the coordination of the ACPI and ESMF initiatives.
Some shifts in perspective will be needed to achieve the CCSM's software goals. To date, CCSM software development has been largely an ad hoc, loosely organized, and scientist-managed process. It is essential to the future success of the CCSM project that software development mature into a more professional activity. This will require dedicated software engineering management and resource allocations for functions such as code maintenance, software management tools, and technical support. Like other professionally maintained software projects, CCSM software engineering must keep pace with technological change through continued staff development and associated research efforts to evaluate and adapt new technologies. CCSM-related pilot projects that engage computer scientists, specialists in numerical methods, and software engineering experts must be viewed as a critical aspect of ongoing code development.
A sensible approach to software management and the introduction of systematic practices should not create software development roadblocks or discourage scientific experimentation with the code. However, the long-term success of the CCSM project will require an initial investment into infrastructure development. The effort required to create this infrastructure is likely to pay off in coordinated, rapidly developed, and robust code. It will also provide a basis for the coordinated multi-institutional development that is needed for the ACPI and ESMF initiatives.
Another shift in perspective that will reinvigorate CCSM software development is an alignment with commercial software technologies. The technological boom of the 1990's drew many first-rate software engineers and computer scientists into commercial software development. The CCSM project must exploit the richness of talent and development resources driving the world-dominant U.S. computer and software industries to not merely match foreign climate modeling efforts but to surpass them. Part of this alignment process will involve increasing the usage of mainstream C-based languages in the CCSM. By doing this it will be possible to use compilers strongly backed by industry, utilize a broader range of software support tools, and hire staff from a larger applicant pool. This does not necessarily mean that physics parameterizations will need to be recoded; it should be possible to introduce C-based code somewhat unobtrusively through the development and use of non-scientific C-based utilities (netCDF is a good example).
The final shift in perspective is focusing overall on a modular, layered software design that allows code to be run reasonably efficiently on a range of platforms, rather than using data structures that are optimized for a single architecture type. This flexible approach is necessitated by significant differences in optimization strategies on various platforms, the complexity of current platforms, the need for scalability on distributed memory architectures, and the volatility of the current high performance computing environment. It does not preclude the development or use within the CCSM of high-performance codes optimized for specific architectures.
The key software-related issues facing the CCSM project may be summarized as:
We strongly encourage a review of the CCSM software engineering status and this document by independent outside consultants such as those available through the Software Engineering Institute (SEI).
1 Introduction
This document is a software engineering plan for the Community Climate System Model (CCSM). The primary motivation for this plan is to identify a strategy for the CCSM to meet the wide variety of goals outlined in the Community Climate System Model Plan 2000-2005.
The CCSM Plan 2000-2005 (hereafter referred to simply as the CCSM Plan) was released to the climate modeling community at the Fifth Annual CCSM Workshop in June, 2000. The plan discusses the history and current status of the Community Climate System Model, the current status of the modeling system (including perceived shortcomings of the component models), and stated research and development goals for the CCSM during the period 2000-2005.
According to the CCSM Plan, the broad, long-term goals for the development of the CCSM are:
Much of the code rework described in this document will be undertaken via two initiatives. The first is the Accelerated Climate Prediction Initiative (ACPI) Avant Garde project, an ongoing 18-month NCAR/DOE collaboration that focuses on redesign of the CCSM atmospheric component and flux coupler. The second initiative is a response to a still-to-be-released NASA High Performance Computing and Communications Cooperative Agreement Notice (NASA HPCC CAN) that calls for the creation of an Earth System Modeling Framework (ESMF). The CCSM is a proposed testbed for the ESMF project. These initiatives are described in more detail in Section 5.
The CCSM Software Engineering Plan we have developed to implement the CCSM Plan is consistent with many of the recommendations in the Report of the NCAR Code Assessment Panel and the subsequent NCAR Strategic Plan for Scientific Simulation.
2 Software Description
The CCSM is a continually evolving comprehensive model of the climate system, including both physical and biogeochemical aspects. It is composed of four independent software components that model geophysical systems: atmosphere, ocean, land surface, and sea ice. These communicate pairwise via a flux coupler using message passing. The flux coupler must interpolate and average between the different grids of the component models while conserving local and integral properties.
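The conservation requirement on the flux coupler can be illustrated with a minimal one-dimensional sketch. This is not the CCSM coupler's algorithm; the function names `conservative_remap` and `integral` and the cell-edge representation of the grids are assumptions made purely for illustration. Each destination cell receives the overlap-weighted average of the source cells it intersects, so the integral of the field over the shared domain is preserved.

```python
def conservative_remap(src_edges, src_values, dst_edges):
    """Remap cell-averaged values from one 1-D grid to another.

    Hypothetical helper: each destination cell gets the overlap-weighted
    mean of the source cells it intersects. Both grids are assumed to
    span the same interval.
    """
    dst_values = []
    for j in range(len(dst_edges) - 1):
        d_lo, d_hi = dst_edges[j], dst_edges[j + 1]
        total = 0.0
        for i in range(len(src_edges) - 1):
            s_lo, s_hi = src_edges[i], src_edges[i + 1]
            overlap = min(d_hi, s_hi) - max(d_lo, s_lo)
            if overlap > 0.0:
                total += src_values[i] * overlap
        dst_values.append(total / (d_hi - d_lo))
    return dst_values


def integral(edges, values):
    """Integrate a cell-averaged field over its grid."""
    return sum(v * (edges[k + 1] - edges[k]) for k, v in enumerate(values))


# Four uniform source cells remapped onto two unequal destination cells.
src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
src_flux = [10.0, 20.0, 30.0, 40.0]
dst_edges = [0.0, 1.5, 4.0]
dst_flux = conservative_remap(src_edges, src_flux, dst_edges)

print(integral(src_edges, src_flux), integral(dst_edges, dst_flux))
```

A production coupler works on two-dimensional spherical grids and must also handle masks and local conservation, but the same overlap-weighting idea applies.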
The current version of the CCSM was originally called the Climate System Model (CSM-1). The atmosphere component is NCAR's Community Climate Model version 3 (CCM3). The current production ocean model is the NCAR CSM Ocean Model (NCOM). Within the next six months, the Los Alamos National Laboratory (LANL) Parallel Ocean Program (POP) model will be moved out of development and into production. The sea ice component, the CSM Sea Ice Model (CSIM1) will be replaced by an extension of the Los Alamos CICE model. The land component is currently the Land Surface Model (LSM), developed at NCAR.
The CSM-1 code consists of 180,000 lines of FORTRAN code divided as shown in Table 1.
Table 1. Composition of the CCSM
| Relative Size | Model Component | Version |
|---|---|---|
| 49% | Atmosphere | CCM 3.6.6 |
| | Ocean | NCOM 1.5.0 |
| | Land Surface | LSM 3.6.6 |
| | Sea Ice | CSIM 2.2.9 |
| | Coupler | CPL 4.0.5 |
A new version of the model, called the CCSM-2, is being assembled during fall of 2000 with an algorithm freeze goal of December, 2000. Fully coupled tuning runs will begin January, 2001, with the CCSM-2 released to the scientific community at the CCSM workshop in June, 2001. CCSM-2 will run on distributed memory machines such as the IBM SP and Compaq ES clusters, as well as shared memory SGI Origin platforms.
CCSM code development at NCAR is supported by seven software engineers and associate scientists. The current CSM-1 version is targeted at Cray and SGI Origin 2000 platforms.
3 Software Goals
The CCSM Plan outlines a number of scientific objectives for the CCSM. These scientific objectives imply the following software goals: performance portability, reusability, extensibility, interoperability with component models other than those in the standard distribution, robustness, and ease of use. Below we give further explanations of these desirable properties.
Performance Portability
Past CCSM software development has targeted vector platforms and shared
memory platforms with relatively modest attention devoted to parallel performance
issues. As the availability of vector platforms to U.S. researchers
has waned, a different approach is in order. During the period covered
by this plan, the main hardware options available to NCAR and the U.S.
climate community are likely to be microprocessor-based, distributed-memory
computers and hybrid systems incorporating message passing between shared-memory
multiprocessor nodes. Such platforms have complex memory hierarchies,
and memory bandwidth and latency issues dominate performance. For
the CCSM to meet the goals of the 2000-2005 CCSM Plan, first priority
must be placed on making the CCSM run efficiently on these platforms.
Through overseas collaborations CCSM code is in fact likely to be run on both vector and RISC-based machines. Ideally, the code would have the flexibility to run efficiently on both. In practice, this can be difficult to achieve since the choice of optimal data structures, loop ordering, and other significant design decisions may differ depending on whether code is intended for vector or RISC systems.
Since CCSM software engineers do not have access to vector machines as development platforms, it is problematic to consider efficient performance on these platforms as a requirement in the near future. However, it is a priority to write flexible code, and whenever possible, to use data structures that are likely to achieve acceptable performance on either architecture. For excellent performance on a given architecture, this can add considerable complexity, and for some portions of the model, developing two code versions, one scalar and one vector, may be the most practical solution. The availability of development resources may in part dictate the extent to which efficient performance on both types of architectures can be realized.
Reusability
The CCSM component models and coupler currently share little code for
utilities, such as error handling, performance timing, and input/output
(I/O). Reusing code is a goal because it saves development and maintenance
time, and it can improve code quality through collective testing and optimization.
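As an illustration of the kind of utility that could be shared across components, the following is a minimal sketch of a common performance-timing helper. The class name `TimerRegistry`, its methods, and the section labels are hypothetical and do not correspond to any existing CCSM code; the sketch only shows how one tested utility could serve every component model.

```python
import time
from contextlib import contextmanager


class TimerRegistry:
    """Hypothetical shared timing utility: accumulates time per named section."""

    def __init__(self):
        self.totals = {}  # section name -> accumulated seconds
        self.calls = {}   # section name -> number of timed calls

    @contextmanager
    def section(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed
            self.calls[name] = self.calls.get(name, 0) + 1

    def report(self):
        """Return one summary line per section, sorted by name."""
        return [
            f"{name}: {self.calls[name]} calls, {self.totals[name]:.6f} s"
            for name in sorted(self.totals)
        ]


timers = TimerRegistry()
for _ in range(3):
    with timers.section("atm_physics"):   # label is illustrative only
        sum(k * k for k in range(1000))
with timers.section("ocn_step"):          # label is illustrative only
    sum(range(1000))

for line in timers.report():
    print(line)
```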
Extensibility
Future applications of the CCSM, such as the "Flying Leap Experiment,"
will involve incorporation of biogeochemistry and new component models,
and future versions of the CCSM will certainly include new process parameterizations.
These applications and enhancements will proceed much more rapidly if the
CCSM software is extensible. Extensibility is largely accomplished
by developing modular code with well-defined interfaces.
Interoperability
Projects like the Atmospheric Model Intercomparison Project (AMIP)
and the Coupled Model Intercomparison Project (CMIP) create widespread
interest in the effect of using different dynamical cores and different
component models. The CCSM should provide the means to use different
dynamical cores or component models with relative ease.
Robustness
We use the term "robustness" to indicate two things: 1) high quality
and dependability and 2) support for error handling. The CCSM's role
as a community model makes both of these vitally important. Many
members of the climate community use the CCSM as a framework in which to
test process parameterizations, while others use the CCSM with relatively
little modification. In both cases, the CCSM user has every right
to expect that if a model run fails in a portion of the core CCSM code,
the failure condition and location will be (when possible) identified by
the CCSM.
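The expectation that a failure be reported with its condition and location can be sketched as follows. The exception class `ModelError`, the helper `check_finite`, and the component and routine names are all hypothetical; the sketch only illustrates attaching location information to an error at the point where it is detected.

```python
import math


class ModelError(Exception):
    """Hypothetical error type recording where a failure was detected."""

    def __init__(self, component, routine, message):
        super().__init__(f"[{component}:{routine}] {message}")
        self.component = component
        self.routine = routine


def check_finite(field_name, values, component, routine):
    """Raise ModelError naming the first non-finite value in a field."""
    for index, value in enumerate(values):
        if not math.isfinite(value):
            raise ModelError(
                component,
                routine,
                f"non-finite value in '{field_name}' at index {index}",
            )


def run_step():
    # Illustrative data containing one bad value.
    surface_temperature = [288.1, 287.6, float("nan"), 290.2]
    check_finite("surface_temperature", surface_temperature,
                 component="atm", routine="radiation_driver")


try:
    run_step()
    report = "ok"
except ModelError as err:
    report = str(err)

print(report)
```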
Ease of Use
In the interest of accelerating the research process, the CCSM should
be easy to configure and compile on supported platforms, and it should
be easy to initialize and run. Developers and users who wish to modify
or add code to the CCSM need to do so with relative ease.
4 Software Strategy
In outlining this plan we rely on common sense, past experience with large software projects, and primarily commercial and government sources. We consider greater alignment with these last two sectors to be a key element in revamping CCSM software for two reasons. First, many of the best-established, proven methods for managing complex software projects originated in industry and the military (e.g., the classic Code Complete (1993), the NASA Software Engineering Laboratory Series). An investment in adopting such standard development strategies, such as peer reviews, defect tracking, and regular system builds, is probably of more immediate benefit to the CCSM project than research into cutting-edge computer science, and in any case is likely to be a prerequisite for the successful integration of advanced research into the model.
Second, the technological boom of the 1990's has drawn much of the talent and innovation in computer science and software engineering into the commercial sector. The activities of industry have become increasingly relevant to efforts like the CCSM in a number of ways. Getting efficient performance on microprocessor-based systems is a commercially valuable skill. Teams who formerly worked on distributed scientific computing have moved wholesale to related problems in distributed web technologies. Linux clusters are viable high-performance platforms. Analogues of commercial tools such as CORBA are being emulated in a scientific context. The need for a flow of ideas and technologies up and down the Branscomb pyramid, which has at its base the multitude of desktop computers and at its tip the relatively few supercomputers, drives current initiatives, such as the Common Component Architecture project.
We note that the correspondence between military and commercial strategies is deliberate. The Report of the Defense Science Board Task Force on Acquiring Defense Software Commercially (1994) identified the following as the primary causes of trouble in DoD software projects:
Assuming that this would be a sensible strategy for the CCSM project, how could one go about implementing it without disrupting scientific productivity?
A significant opportunity for achieving greater alignment with the commercial sector is the formation of a focused software engineering team to develop infrastructure usable by the CCSM. Separating non-scientific code from research code allows a software engineering team to work on a relatively stable problem that can be solved in the manner of their choosing (as long as it meets scientific requirements). Therefore, such a team can be a testbed for new practices, tools, design strategies, and implementation languages. A team of this sort has been proposed for the ESMF effort. Since the infrastructure software is intended to serve a wide variety of projects, the software team does not need to be part of the CCSM group itself.
4.1 Software Management Plan
Currently the CCSM project has no software engineering manager. Coordination and management of software is accomplished partly through the efforts of the project's lead scientists and partly through the efforts and interaction of individual software developers. This approach has sustained the project so far, but it has very serious drawbacks. It requires scientists to manage software processes that are outside their interests and areas of training, in addition to their research workload. The competing demands on scientists' time sometimes mean that software management is understandably overlooked. The result is that software procedures are not well established or documented, that information related to software issues propagates erratically throughout the project, and that coordination of the multiple, geographically distributed development teams working on the CCSM is difficult. The newly formed CCSM Software Engineering Working Group (SEWG) is an important step towards increasing the visibility of software issues and helping to coordinate multiple development efforts, but it does not perform day-to-day project management. A day-to-day software manager or coordinator is needed.
Studies reflect what common sense suggests. In the book Patterns of Software Failure and Success (1996), Jones analyzed thousands of software projects in multiple domains (systems software, military software, commercial software, others) and came to the following conclusion:
"It is both interesting and significant that the first six out of sixteen technology factors associated with software disasters are specific failures in the domain of project management, and three of the other technology deficiencies can be indirectly assigned to poor management practices."
Even if the CCSM's non-scientific infrastructure code is developed by a focused, separately managed software team, the need for clearly identified software engineering management within the CCSM project remains.
Adequate software engineering management extends beyond simply hiring a software engineering manager. Complex software projects such as the CCSM may also require management support positions, such as a "gatekeeper" or "librarian" who oversees configuration management of the project. Duties may include monitoring check-ins, ensuring that the appropriate testing and validation occurs, setting up automated tests, and coordinating new releases. Support for such a position is requested in the ACPI Avant Garde proposal.
Software management milestones:
4.2 Software Restructuring
4.2.1 Atmosphere
The CSM-1 atmospheric model is currently undergoing major code rework as part of the ACPI Avant Garde initiative (see Section 5.1). Code restructuring will be performed incrementally in three phases. The first phase is a design study, remapping of data structures, and restructuring of the code for the calculation of physical parameterizations. In the second phase, the dynamical core interface will be formalized and implemented. This work will make it possible for parallel implementations of different dynamical cores to proceed independently. The third phase will be the parallel development of three high-performance dynamical cores and integration with the atmospheric model. The dynamical cores will be Eulerian-Spectral, Semi-Lagrangian-Spectral, and Lin-Rood.
Other work on the atmospheric component includes short-term scientific improvements over the CSM-1 atmosphere, such as explicit treatment of the sulfate aerosols and cloud liquid water, as well as time-split physics and significant upgrades to the short and long-wave radiation codes. The vertical resolution will be increased from 18 to 26 levels and the capability for reduced horizontal grids has been added. The history output will be in netCDF format.
Column physics: flexible data decomposition
The flexible parallel decomposition of the physics parameterizations
is key to performance portability of the atmospheric model. Since
each vertical column of the atmosphere can be computed independently, there
is an opportunity to exploit natural parallelism. The more computationally
intensive the physical parameterizations included in the model, the more
this parallelism is likely to offset the communication costs that are associated
with data distribution.
The data structure currently used in physics calculations is an entire latitude slice with the end index variable to accommodate a reduced grid. This choice is excellent for performance on vector machines but yields poor performance on machines with modest caches. A rewrite will change the parameterization package, so that the data structure is an arbitrary collection of columns. In addition to making the code cache-friendlier, it increases its scalability. Decomposition specification will be in terms of both data distribution and shared memory multitasking, so that scientists do not need to multitask individual parameterizations. Since the decomposition allows column sets that are not spatially contiguous, static load balancing for radiation calculations will be supported; this is accomplished by grouping columns that lie opposite each other on great circles.
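The idea of chunks built from arbitrary, non-contiguous columns can be sketched as follows. This is an illustration only, not the planned implementation: the function names, the regular longitude-latitude indexing, and the crude daylight test are all assumptions. Each column is paired with the column diametrically opposite it, so every chunk holds equal numbers of sunlit and dark columns and the shortwave radiation cost is roughly balanced.

```python
def antipode(column, nlon, nlat):
    """Return the column diametrically opposite `column` = (lon_index, lat_index)."""
    lon_index, lat_index = column
    return ((lon_index + nlon // 2) % nlon, nlat - 1 - lat_index)


def build_chunks(nlon, nlat, columns_per_chunk):
    """Group every grid column into chunks made of antipodal pairs.

    Hypothetical helper; assumes an even number of longitudes and an
    even chunk size so that pairs are never split across chunks.
    """
    if nlon % 2 or columns_per_chunk % 2:
        raise ValueError("nlon and columns_per_chunk must both be even")
    seen = set()
    ordered = []
    for lat_index in range(nlat):
        for lon_index in range(nlon):
            column = (lon_index, lat_index)
            if column in seen:
                continue
            opposite = antipode(column, nlon, nlat)
            ordered.extend([column, opposite])
            seen.update([column, opposite])
    return [ordered[k:k + columns_per_chunk]
            for k in range(0, len(ordered), columns_per_chunk)]


def is_daylit(column, nlon):
    """Crude assumption for illustration: half of the longitudes are sunlit."""
    return column[0] < nlon // 2


chunks = build_chunks(nlon=8, nlat=4, columns_per_chunk=4)
daylit_per_chunk = [sum(is_daylit(c, 8) for c in chunk) for chunk in chunks]
print(daylit_per_chunk)
```

Because a chunk is just a list of column indices, the same structure could describe contiguous blocks for cache efficiency or scattered sets for load balance, with the parameterization code unchanged.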
Swappable dynamics
This work will increase the modularity of the atmospheric model by
clearly encapsulating the dynamical cores. Incorporating this strategy
in the initial design is important for three reasons. First, by clearly
defining the interface in the atmospheric model, code optimizations and
parallel decomposition strategies for the dynamics can be developed independently
and optimally of other components (e.g., physical parameterizations).
Second, parallel constructs will be more isolated so that the model can
be readily ported and adapted to new platforms. Third, research into
new dynamical cores can be accelerated, since multiple groups can develop
codes simultaneously. The set of dynamical cores developed (Eulerian-Spectral,
Semi-Lagrangian-Spectral, and Lin-Rood) will utilize hybrid OpenMP/MPI
parallelism. Both the column physics rework and the swappable dynamics
capability will be implemented in the 18-month time frame of the ACPI proposal.
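The notion of a formal dynamical core interface can be sketched in a few lines. The sketch below is a toy under stated assumptions: the interface `DynamicalCore`, the registry `CORES`, and both one-dimensional advection schemes are hypothetical stand-ins and bear no relation to the Eulerian-Spectral, Semi-Lagrangian-Spectral, or Lin-Rood cores. It shows only that the driver depends on the interface and not on any particular core.

```python
from abc import ABC, abstractmethod


class DynamicalCore(ABC):
    """Hypothetical interface that every dynamical core would implement."""

    @abstractmethod
    def step(self, state, dt):
        """Advance the prognostic state by one time step and return it."""


class ExactShiftCore(DynamicalCore):
    """Toy core: periodic advection by exactly one cell per step."""

    def step(self, state, dt):
        # dt is unused in this toy scheme.
        return state[-1:] + state[:-1]


class UpwindCore(DynamicalCore):
    """Toy core: first-order upwind advection with a fixed Courant number."""

    def __init__(self, courant=0.5):
        self.courant = courant

    def step(self, state, dt):
        # dt is unused; state[k - 1] wraps around, giving periodic boundaries.
        c = self.courant
        return [(1.0 - c) * state[k] + c * state[k - 1]
                for k in range(len(state))]


# Registry that lets a run-time option select the core by name.
CORES = {"shift": ExactShiftCore, "upwind": UpwindCore}


def run_model(core_name, state, nsteps, dt=1.0):
    """Driver that only knows the DynamicalCore interface."""
    core = CORES[core_name]()
    for _ in range(nsteps):
        state = core.step(state, dt)
    return state


initial = [0.0, 1.0, 0.0, 0.0]
shifted = run_model("shift", initial, 2)
smoothed = run_model("upwind", initial, 2)
print(shifted, smoothed)
```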
In the longer term, we plan to modify the interface of component models so that they can use the ESMF coupling mechanism. I/O and other utilities will be replaced as appropriate by those supplied by ESMF. We anticipate that the increased code modularity provided by the ACPI team will make it easier to adapt to the future framework, both at the levels of basic utilities and component coupling.
4.2.2 Ocean
The CCSM-2 ocean component uses the POP 1.4 code on a displaced Greenland pole grid. All the physics packages from the previous ocean component have been coded into POP, including an optimized version of the Gent-McWilliams (GM) parameterization of mesoscale eddy effects and the Large et al. k-profile parameterization vertical mixing scheme. An improved treatment of the equation of state has been added, as well as variable surface layer thickness options, a marginal-seas balancing scheme, and a new time manager. Additional options have been added to the GM eddy parameterization. The eddy advection and isopycnal diffusion parameters can take different values, and these parameters can vary spatially according to the specification of Visbeck et al. Development of partial bottom cells for complex topography is entering the testing phase, and a bottom boundary transport scheme will also be tested in the near future. The model supports two resolutions: a coarse 3.6° x 2° x 25 level grid for paleoclimate applications and a higher resolution 1° x 1/2° x 40 level grid for current/future climate studies. This work is being shared with the POP developers at the Los Alamos National Laboratory and should be completed in the next year.
4.2.3 Land
The CCSM2-land model will contain the essential biogeophysics of the Common Land Model (CLM) with some modifications such as use of the LSM vegetation albedos, the burial of vegetation vertically rather than horizontally and the linking of photosynthesis and stomatal conductance. Biogeochemistry is not currently implemented. The near term goals are to incorporate treatments of biogeochemistry, dust emission, vegetation dynamics and carbon cycling. Vegetation dynamics implies that the land cover is no longer prescribed but rather predicted. This will involve significant changes to the data structures of the code.
4.2.4 Sea Ice
The CCSM2 Sea Ice Model development is basically complete with respect to the goals set by the CCSM Polar Climate Working Group. Several improvements have been made to the Sea Ice Model (CSIM4) in the past year. Cleaning up the code was a high priority and has been completed. This included removing and combining redundant subroutines and modifications to use F90 modules. It was decided that Hunke's Elastic-Viscous-Plastic (EVP) rheology and Bitz's sea-ice thickness distribution and thermodynamics would be included as part of the default physics. Lipscomb's linear remapping scheme has been added to better resolve the thin end of the ice thickness distribution (ITD). A prescribed ice model was added for atmospheric tuning. An ocean mixed layer model is currently being added to CSIM4. Further testing needs to be done to find remaining bugs and resolve coupling issues.
Future plans include the addition of metric terms to the EVP dynamics and improvements to the subgrid scale processes such as ice/ocean interaction, melt ponds and surface albedo. A new horizontal remapping scheme is being evaluated at LANL as a possible replacement for MPDATA; Lipscomb is currently testing the Dukowicz-Baumgardner scheme in CSIM4. Model sensitivity to atmospheric forcing, model physics and parameters needs to be examined. Code documentation will be updated and a User's Guide will be written.
4.2.5 Coupler
Coupler development will proceed on two tracks. The first, which has a shorter term focus, will deliver the CCSM coupler version 5. This involves a DOE/NCAR team optimizing the existing CCSM coupler (version 4) as needed to support the performance requirements for a 2001 release of a CCSM2.0 coupled system. The desired overall throughput is 5 simulated years per wall-clock day as a minimum, with 10 years/day preferred. On the second, longer term track, the DOE/NCAR team will design and implement a next generation coupler, CCSM coupler version 6. A version 6 coupler that functions sufficiently to replace the version 5 coupler will be available by the end of 2001. The DOE/NCAR team will collaborate with an Earth System Modeling Framework (ESMF) initiative as appropriate.
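To make the throughput targets concrete, the short sketch below converts simulated years per wall-clock day into a wall-clock budget per simulated day. The function names and the 365-day model year are assumptions made for illustration, and the 100-year run length is an arbitrary example.

```python
SECONDS_PER_DAY = 86400.0


def wallclock_seconds_per_model_day(years_per_day, days_per_model_year=365):
    """Wall-clock seconds available to simulate one model day."""
    return SECONDS_PER_DAY / (years_per_day * days_per_model_year)


def wallclock_days_for_run(simulated_years, years_per_day):
    """Wall-clock days needed to complete a run of the given length."""
    return simulated_years / years_per_day


minimum_budget = wallclock_seconds_per_model_day(5)     # about 47.3 s
preferred_budget = wallclock_seconds_per_model_day(10)  # about 23.7 s
century_at_minimum = wallclock_days_for_run(100, 5)     # 20 days

print(round(minimum_budget, 1), round(preferred_budget, 1), century_at_minimum)
```

At the minimum target, the entire coupled system therefore has roughly 47 seconds of wall-clock time for each simulated day, which bounds the time the coupler can spend on regridding and communication.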
The version 6 CCSM coupler is intended to have the following properties:
4.2.6 Utilities
Communication
We plan to coordinate the development of communication libraries with
multiple development teams. Both the ACPI team and the ESMF collaboration
have proposed similar schemes. As with the coupler, the ACPI activity
can be used to develop early prototypes of the code desired for the future
ESMF framework.
I/O
The ACPI effort will investigate various ways to introduce parallel
I/O into the CCSM in the short term. A joint CCSM/Unidata proposal
(see Section 5.4), if funded, would employ the CCSM as a testbed for a new,
parallel version of netCDF layered on HDF-5.
Higher level I/O functions will be provided by the ESMF framework. A current project is investigating the C-based coding strategy proposed for the ESMF utilities (see Section 5.3), and a set of basic utilities has been implemented and partly tested in the CCM and LSM codes.
Calendar
A calendar that supports a leap year option was implemented and tested
in the CCM and LSM codes. Its incorporation into the CCSM source
code is pending testing on additional platforms.
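A calendar with an optional leap year can be sketched as follows. This is not the utility implemented in the CCM and LSM codes; the function names are hypothetical, and the sketch assumes that the leap-year option follows the Gregorian rule while the default is a fixed 365-day year.

```python
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_leap_year(year, use_leap_years):
    """Gregorian leap-year rule, applied only when the option is enabled."""
    if not use_leap_years:
        return False
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_year(year, use_leap_years):
    return 366 if is_leap_year(year, use_leap_years) else 365


def advance_one_day(year, month, day, use_leap_years):
    """Return the calendar date following (year, month, day)."""
    month_length = DAYS_IN_MONTH[month - 1]
    if month == 2 and is_leap_year(year, use_leap_years):
        month_length = 29
    day += 1
    if day > month_length:
        day = 1
        month += 1
    if month > 12:
        month = 1
        year += 1
    return year, month, day


print(advance_one_day(2000, 2, 28, use_leap_years=True))
print(advance_one_day(2000, 2, 28, use_leap_years=False))
```

Keeping the option as an explicit argument lets long control runs use a fixed-length year while runs that are compared against observations follow real dates.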
Software restructuring milestones
4.3 Software Practices
The CCSM has recently begun to increase the formality of its software development process. In the work undertaken by the DOE/NCAR team for the ACPI project, coding has been preceded by requirements definition and review, and several design documents are underway. There is a proposed set of coding standards for CCSM Fortran 90 development. However, the CCSM group lacks systematic, documented practices for many aspects of software development. For example, the project does not have a documented software design process, a unified strategy for code repositories for distributed development groups, a standard set of code testing and build procedures, or an overall defect-tracking system. The CCSM software process lacks many of the nine most effective practices identified by a large community of software experts as a result of the 1994 Software Best Practices Initiative, such as risk analysis, metric tracking for project planning, and project-wide visibility of a software plan (see Appendix C).
We can roughly categorize the CCSM software process by examining the Capability Maturity Model (CMM(tm)) for Software developed by the Software Engineering Institute at Carnegie Mellon University. The CMM judges the maturity of the software processes of an organization and identifies the key practices that are required to increase the maturity of these processes. Defined maturity levels are described in Appendix D. The lack of defined management and of many formal procedures indicates that the CCSM process is still in the "initial" (ad hoc/chaotic) stage, though the recent introduction of peer reviews and coding standards suggests that it is acquiring characteristics of the second, "repeatable" level.
Gains in productivity and software quality are associated with adopting more systematic processes, and the CCSM will continue to do so. However, it is essential that practices be carefully adapted for the working habits of scientists and the distinctive, research-driven NCAR environment. There are limits to how much code formality it is practical to adopt. For general, non-scientific utilities developed by software engineers, it makes sense to introduce rigor into the specification, design, and testing process. This helps to ensure that software engineers deliver robust software that meets requirements. Efficiency can be encouraged by following practices, such as producing design documentation that can be readily converted to user documentation and allowing portions of this documentation to be automatically generated from source code.
Coding scientific parameterizations is a different matter. It is probably unreasonable to demand requirements definition and reviews for this type of code; it changes rapidly and scientists should not be encumbered by the software engineering process. However, it is reasonable that scientific code be checked by software engineers for appropriate error handling, code format, etc., and, if appropriate, optimized for performance before being integrated into the CCSM production code.
In the short term, we plan to focus the effort to systematize software development by using the CCSM Software Engineering Working Group (CCSM SEWG) to foster discussion and agreement on accepted practices. We plan to document procedural conventions in a CCSM Software Developer's Guide, and work is already underway. We have posted sample contents for comment on the SEWG website (http://www.cesm.ucar.edu/working_groups/Software), and we are in the process of collecting references and information on current practices. Sources for software practices include the classic Code Complete (1993) and the Manager's Handbook for Software Development and other documents from the NASA Software Engineering Laboratory.
Once a CCSM software engineering manager position is established, procedures can be reassessed. The manager can help to encourage adherence to guidelines.
Staff support for new practices and techniques will be encouraged by training. Staff development opportunities should be increased.
Software practices milestones:
4.4 User Support
4.4.1 Technical support
Since the CCSM is a community model, it is essential that the project
identify a designated primary contact for software engineering technical
support and a trouble-ticket mechanism to track reported problems.
Once this contact is established, the CCSM documentation and website must
clearly indicate how to report technical problems.
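A trouble-ticket mechanism can be very small. The sketch below, in Python, shows one possible in-memory structure for recording reported problems and tracking their status. All class, field, and status names are hypothetical; the plan does not specify a particular tracking system.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Ticket:
    """One reported problem (all field names are hypothetical)."""
    ticket_id: int
    reporter: str
    summary: str
    status: str = "open"
    history: list = field(default_factory=list)


class TicketLog:
    """In-memory log that assigns ticket ids and tracks status changes."""

    def __init__(self):
        self._ids = count(1)      # ids are assigned sequentially from 1
        self._tickets = {}

    def report(self, reporter, summary):
        """Record a new problem and return its ticket id."""
        ticket = Ticket(next(self._ids), reporter, summary)
        self._tickets[ticket.ticket_id] = ticket
        return ticket.ticket_id

    def update(self, ticket_id, status, note):
        """Change a ticket's status, keeping (old, new, note) in its history."""
        ticket = self._tickets[ticket_id]
        ticket.history.append((ticket.status, status, note))
        ticket.status = status

    def get(self, ticket_id):
        """Return the ticket with the given id."""
        return self._tickets[ticket_id]

    def open_tickets(self):
        """Return all tickets that have not been closed."""
        return [t for t in self._tickets.values() if t.status != "closed"]
```

A production mechanism would also need persistent storage and notification of the designated support contact; those are omitted here.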
4.4.2 User's Guide
There is currently minimal documentation on how to run the overall
CCSM. We propose the creation of a User's Guide that explains the
installation procedure and summarizes options for configuring the model.
The team preparing the User's Guide should be a mix of junior and senior
software engineers and support personnel, so that the time senior software
engineers spend on the effort is minimized.
4.4.3 Graphical User Interface
As options for configuring the CCSM increase along with scientific
capabilities, the initialization procedure for the model will grow more
complex. A web-based Graphical User Interface (GUI) can help users
understand and select options for running the model and track model progress
while it is running. It could eventually be used to interactively
steer model runs. Any GUI developed must be highly extensible, so
that it does not need to be substantially modified when changes are made
to CCSM code.
A pilot project to develop such an interface for the CCSM would be of enormous value to its community of users. It is probably not appropriate to focus development resources on such a pilot project until some of the major current code restructuring activities are completed. We note that basic GUI technology for technical applications is well-established in such areas as real-time systems and that expertise already exists at NCAR in such groups as the Atmospheric Technology Division (ATD).
User support milestones:
4.5 Timeline
The following is a tentative timetable for the milestones listed in
this section. Other major events related to CCSM software development
are indicated in boldface.
| Date | Milestone |
| Fall/Winter 2000 | |
| Spring 2001 | |
| Summer 2001 | |
| Fall/Winter 2001 | |
| Summer 2003 | |
5 CCSM-Related Software Projects and Initiatives
A summary of initiatives and collaborating institutions is provided in Appendix A.
5.1 DOE ACPI "Avant Garde" Project
The Accelerated Climate Prediction Initiative (ACPI) Avant Garde project is a joint Department of Energy/National Science Foundation project. The goals of the project are: 1) where readily possible, improve performance of the existing CCSM and 2) work with the CCSM development team to design and build a more performance-portable, modular CCSM. This 18-month project commenced June 1, 2000.
The ACPI work focuses on two components of the CCSM that offer ample opportunities for performance optimization: the flux coupler and the atmospheric component. In the atmospheric component, work will focus on improving on-node performance, developing a more modular structure that permits the exchange of alternative dynamics and physics components, and developing high-performance communication libraries. Work on the coupler will address issues of on-node performance, scalability, and configurability. The DOE/NCAR team also will work on improving the ocean component and support for high-performance, parallel I/O.
In addition to code restructuring, the ACPI project emphasizes a structured approach to code development. The five basic elements of the ACPI strategy are:
5.2 NASA HPCC Earth System Modeling Framework
The third phase of the NASA High Performance Computing and Communications (HPCC) program focuses on high-performance frameworks. A Cooperative Agreement Notice (CAN) has not yet been released, but a draft version has been available since December, 1999. The CAN is expected to call for the creation of an Earth System Modeling Framework (ESMF) by the Earth science community. NCAR has been designated the lead institution, and the CCSM will be a primary testbed for the new framework. Several preliminary meetings, the most recent on August 7-9, 2000 at NCAR, have established the collaborating institutions, a set of application requirements, and a preliminary architecture.
The ESMF will offer at minimum the following functionality: gridded data decomposition and load balancing, communication including synchronization, I/O, and grid transforms. These capabilities can be divided into two main functional areas.
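To make the first capability named above concrete, the following Python sketch shows one simple form of gridded data decomposition: splitting a one-dimensional range of grid indices into contiguous blocks whose sizes differ by at most one point. This is an illustrative assumption, not the ESMF design; the function name `block_decompose` is hypothetical, and a real framework would also handle multi-dimensional grids and uneven computational cost per point.

```python
def block_decompose(n_points, n_procs):
    """Split n_points grid indices into n_procs contiguous half-open ranges.

    Returns a list of (start, stop) pairs, one per process rank. The first
    n_points % n_procs ranks receive one extra point, so block sizes differ
    by at most one.
    """
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    if n_points < 0:
        raise ValueError("n_points must be non-negative")
    base, extra = divmod(n_points, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # Ten grid points shared among three processes.
    for rank, (start, stop) in enumerate(block_decompose(10, 3)):
        print(rank, start, stop)
```

Equal-sized blocks balance the load only when every grid point costs the same to compute; where that does not hold, a load-balancing scheme would need to weight the split by per-point cost.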
5.3 I/O Library
Currently, all CCSM components, including the flux coupler, implement different I/O handlers. Much of the code is redundant and could be consolidated into a single library. A collaboration between NCAR SCD/CSS and the CCSM project began in winter, 1999 with the objective of developing an I/O library that all CCSM components could share. The longer-term objective of the library is the creation of a very general purpose I/O library usable by multiple models.
Requirements were completed in March, 2000 and a preliminary design specification was released in July, 2000. Development is proceeding from low-level utilities up to higher-level functions, such as field abstractions and time averaging. The library is being implemented in object-based C and uses the infrastructure from the PETSc library developed at Argonne National Laboratory. The PETSc infrastructure offers signal and error handling, C to f90 pointer conversion, C/f90 character string conversion, and other useful features. A set of basic classes that represent data sets, files, storage, and case description has been built using these tools and integrated into test versions of the CCM and LSM codes.
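As an illustration of what a higher-level function such as time averaging might look like, the following Python sketch accumulates samples of a field weighted by the length of each time step and returns the time-weighted mean. It is a sketch under stated assumptions, not the library's interface: the library itself is written in C, and the class name `TimeAveragedField` and its methods are hypothetical.

```python
class TimeAveragedField:
    """Accumulate samples of a field and return their time-weighted mean."""

    def __init__(self, name, n_points):
        self.name = name
        self._sum = [0.0] * n_points   # running sum of value * dt per point
        self._elapsed = 0.0            # total time accumulated so far

    def accumulate(self, values, dt):
        """Add one sample that is valid for a time step of length dt."""
        if len(values) != len(self._sum):
            raise ValueError("field size mismatch")
        if dt <= 0:
            raise ValueError("dt must be positive")
        for i, value in enumerate(values):
            self._sum[i] += value * dt
        self._elapsed += dt

    def average(self):
        """Return the time-weighted mean at each point."""
        if self._elapsed == 0.0:
            raise ValueError("no samples accumulated")
        return [s / self._elapsed for s in self._sum]

    def reset(self):
        """Clear the accumulator, for example after writing an average."""
        self._sum = [0.0] * len(self._sum)
        self._elapsed = 0.0
```

Keeping a running sum rather than the individual samples means memory use does not grow with the length of the averaging period.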
A calendar manager was written as an early prototype of this coding technique, and it has also been integrated into the test versions of the CCM and LSM codes. Integration into the CCSM source is pending validation on additional platforms (the code to date has only been validated on the IBM).
This I/O library effort will be a prototype for the I/O and utilities in the Earth System Modeling Framework. The continued development of the I/O library will be strongly influenced by the preliminary design of the ESMF.
5.4 Parallel netCDF
A joint Unidata, NCSA, NCAR/SCD, NCAR/MMM, and NCAR/CGD proposal entitled Merging the NetCDF and HDF-5 Libraries to Achieve Gains in Performance and Functionality was submitted to the NSF ITR program in February, 2000. The goal of the project is to develop a high-performance version of netCDF that is based on the HDF-5 library. The new version would have a parallel write capability and features for packing and extending a data set in multiple dimensions. The CCSM is a proposed testbed, and, if the project is funded, staff will be provided to help integrate parallel netCDF into the CCSM source code.
6 Summary
In this plan we have identified several shifts in perspective necessary to achieve the software goals implied by the CCSM Plan.
These are:
Appendix A: Summary of CCSM-Related Projects and Collaborations
| Project/Initiative | Description | Collaborators | Status |
| ACPI Avant Garde Project | 18-month project focused on CCSM coupler and atmosphere model with some development of parallel I/O, ocean model optimization and communication | | Began June 1, 2000 |
| Earth System Modeling Framework (ESMF)/NASA HPCC Cooperative Agreement Notice (CAN) | 3-year project will focus on a flexible, general coupling strategy and the development of a utility tool kit for modeling | | Cooperative Agreement Notice release is expected mid-September, 2000. Initial draft of ESMF design document is scheduled for late September, 2000 |
| I/O Library | Ongoing project that is currently prototyping implementation of an I/O handler similar to that proposed for the ESMF | NCAR/SCD, NCAR/CGD | Work will merge with ESMF project |
Appendix B: Summary of Proposed CCSM Software Engineering Documents
| Document | Description | Status |
| CCSM Software Engineering Plan | Outlines a technical strategy for achieving the goals of the CCSM Science Plan and describes the coordination of multiple development activities | Draft available |
| CCSM Developer's Guide | Describes the conventions associated with specifying, designing, implementing, testing, and maintaining code for the CCSM | In progress; sample contents proposed and information being collected |
| CCSM User's Guide | Describes how to install and run the CCSM and summarizes usage options | Newly proposed |
| Requirements and Design Documents | Specifies in detail the requirements, interfaces, and functions of individual CCSM components | Several requirements documents completed and reviewed |
Appendix C: The Software Best Practices Initiative
"The Software Best Practices Initiative (1994) represents the collective efforts of nearly 200 development and maintenance expert practitioners from the commercial and government world, industry leaders, software visionaries and methodologists ... Seven panels studied successful software programs in the public and private sectors to determine those practices characteristic to all programs and significant leverage items for success." (Air Force Guidelines for Successful Acquisition and Management of Software Intensive Systems, June, 1996)
The 9 "best practices" are:
Formal risk management
Risk management involves continuously updating and monitoring risk
plans to account for new, potential and manifest risks.
Agreement on interfaces
To address the chronic problem of vague, inaccurate, and untestable
specifications, a baseline user interface should be agreed upon across
affected areas before beginning implementation activities.
Metric-based scheduling and tracking
Statistical quality control of costs and schedules should be maintained.
Defect tracking
Defects should be tracked formally during each project phase.
Project-wide visibility of project plan and progress versus plan
The core indicators of project health or dysfunction should be made
available to all project participants.
Configuration management
Considered essential to any software development project.
Inspections, reviews, and walkthroughs
Peer reviews should be conducted at all design levels (particularly
detailed designs), on code prior to unit test, and on test plans.
Quality gates
Completion events should be in the form of "gates" that assess the
quality of the product produced or the adequacy and completeness of the
finished process.
People-aware management
Management must be accountable for staffing qualified people, as well
as for fostering an environment conducive to low staff turnover.
(This has since been updated to 16 best practices, but these are the basics.)
Appendix D: The Capability Maturity Model (CMM) for Software
(See http://www.sei.cmu.edu/cmm/)
Level 1 - Initial. The software process is characterized as ad hoc, and occasionally even chaotic. Few processes are defined and success depends on individual effort and heroics. Key challenges: project management, project planning, configuration management, software quality assurance.