The following sections describe in greater detail improved processes and tools that, if used in a coordinated way, will improve CCSM software quality and control. In particular, implementing a new CVS access policy, creating change control boards, increasing testing significantly, and maintaining project status are all a part of the plan.
To a large degree, these policies are general and should be applied to all CCSM components as well as to the CCSM model as a whole. It is recognized that individual components have different needs and concerns, and that slight variations in policy may be required for each component. In addition, there may be distinctions required between scientific development and software engineering development. However, there shouldn't be any distinctions between internal and external developers.
There is recognition that CCSM is fundamentally a research project and that individuality is an important part of the way people work. There also must be recognition that CCSM is a project that has a mandate to release a stable, high-quality software product to the community. This proposal attempts to balance the needs of the individual developer with the needs of the overall community. The intent is that developers will not feel significant changes in the way they work, while the quality of the released product will be higher. Many of the new processes implement additional controls between what is considered development code versus what is considered production code.
Ultimately, the goal is to be able to do ``better science''. One part of that goal can be achieved by improving the software quality of CCSM and making people in the project more productive. The recommendations in this document strive toward those goals.
To date, the CCSM project has relied almost entirely on informal software configuration management processes to coordinate CCSM code modification. Our current tool, CVS, deliberately puts very few restrictions on the developer's ability to change code, allowing wide-ranging modifications to be easily applied to virtually any element of the code. This capability for rapid development is of great utility during the development and testing stage of the software process. However, this freedom comes at a high cost. Without formal reviews or impact assessment of changes, hidden bugs and unintended component interactions can be introduced into the code with equal rapidity and ease.
As the code matures or release dates near, the stability of the system becomes an increasingly high priority. Experience from the software industry has shown that when code quality and stability, rather than rapid code change, are desired, it is wise to incorporate a more formal review process. Under formal review, modifications to production-level products (code or documents) are made only with prior approval. Upon completion, all modifications undergo evaluation for correctness and impact on the rest of the system. Ad-hoc code modification is no longer permitted in release code.
In conjunction with a better defined policy for repository access (see below), separate change review boards (CRBs) should be defined for both CCSM and CCM. The other components can incorporate this process as they see fit. The CRB is responsible for approving all configuration changes to baselined code (code on the main trunk). This ensures that all proposed changes receive a technical analysis and review and that they are documented for tracking and auditing purposes. The board also has final responsibility for release management. Many elements of the change review board already exist in CCSM. The goal must be to officially establish the board to gain formal control over the change approval process as it affects configuration control. The basic task of the CRB is to declare baselines and then review changes and approve, disapprove, or defer their implementation. The above is an extremely important task. The CRB must have control of the project. Nothing should be changed without their approval or knowledge. For this reason, board members must be chosen carefully.
Requests to the CRBs would be via a web form. Requests would take several forms. Developers would add and update requests to document current efforts and status. They would then make a separate request to merge their development work into the production software. Only requests to merge work into production versions would be reviewed by the CRBs. The general status requests are used to improve visibility of development efforts.
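The request flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, field names, and the `crb_agenda` and `record_decision` helpers are hypothetical and are not part of any existing CCSM tool. The sketch separates status requests from merge requests and puts only undecided merge requests in front of the board, which can approve, disapprove, or defer them.

```python
from dataclasses import dataclass
from enum import Enum


class RequestKind(Enum):
    STATUS = "status"  # documents ongoing development work and its status
    MERGE = "merge"    # asks to merge development work into production code


class Decision(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DISAPPROVED = "disapproved"
    DEFERRED = "deferred"


@dataclass
class ChangeRequest:
    # All field names are hypothetical; a real web form may collect more.
    request_id: int
    developer: str
    component: str
    summary: str
    kind: RequestKind
    decision: Decision = Decision.PENDING


# Requests in these states still need the board's attention.
OPEN_DECISIONS = {Decision.PENDING, Decision.DEFERRED}


def crb_agenda(requests):
    """Return the merge requests that are neither approved nor disapproved."""
    return [r for r in requests
            if r.kind is RequestKind.MERGE and r.decision in OPEN_DECISIONS]


def record_decision(request, decision):
    """Record a board decision; status requests are never reviewed."""
    if request.kind is not RequestKind.MERGE:
        raise ValueError("only merge requests are reviewed by the CRB")
    request.decision = decision
    return request
```

In this sketch a deferred request stays on the agenda until it is approved or disapproved; a real board might prefer to track deferrals separately.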
In many ways, the individual working groups act as change review boards for their components and the CCSM scientists meeting serves as a surrogate for a CCSM change review board. However, the process should be better formalized within both CCSM and CCM. For CCM, the change review board should include a small number of scientists and software engineers from the climate modeling section and/or atmospheric model working group. All CCM change requests and the status of changes would go through this change board. All development work would take place on CCM branches and the change board would meet regularly (about once a month) to approve branch merges onto the main trunk. A gate keeper would merge these changes, carry out testing, and then tag the version. The merge onto the main trunk would also be an opportune time for a code review. This would address a number of concerns indicated in Appendix A, including important issues like understanding who is working on what, an increased ability to document the model status, prioritize efforts, coordinate the repository, test appropriately and maintain quality. In CCSM, the CRB should include at least Jeff Kiehl and Tony Craig plus some additional representation from the science and software engineering areas. The CCSM CRB would meet to plan, prioritize, and schedule changes to CCSM. These can be both for short term and long term changes. The CRBs act as a formal review board. Input from a wide range of perspectives is required to make decisions. The mandate is to review change requests based on input from many.
A fast-track review process will be put in place to correct model bugs. This will only be used to quickly modify a recent production version and only for fixes that do not require rigorous reviews or testing.
There are a number of issues surrounding the CVS repository and access. CCSM has a large and growing developer base and there are concerns over privacy, testing, and control. Sharing code via tar files is impractical because of the code divergence and merge costs. Granting open access to the repository is dangerous. There are limited resources available for exploring or purchasing new tools.
CCSM should continue to use CVS as a tool for source code control. CVS is free and CCSM developers are finally becoming experienced and comfortable with it. Evaluating, choosing and spinning up a new source control tool would require a commitment of at least one person full time for several months for tasks like evaluation, testing, and training. The risks and cost are significant in terms of potential resources required, potential financial cost, and/or potential loss of productivity during the transition period. In the background, CCSM should look into the availability of other tools in conjunction with outside collaborators. There should be an ongoing community effort to evaluate new (free and commercial) software tools, so if a new tool becomes available and recommended, CCSM could migrate to it.
The primary problem with CVS is its inability to read and write protect parts of the repository. In the CCSM repository, access can be limited to particular components, but access within a component cannot be formally controlled. Access control within a component can be solved by implementing a formal CVS policy in CCSM that grants access with certain constraints. Users would be granted permission to read and/or write only on specific branches and/or over specific time frames, usage would be monitored via the CVS logging capability, and user access would be terminated upon violation of the terms of the policy. The scope of access would be determined on an individual basis by the appropriate CCSM component working group. The policy would also outline rules on things like naming conventions and testing requirements. In addition, individual components should consider implementing a policy under which development is allowed only on branches and a gate keeper manages the main trunk under direction of a change review board. This allows for continued rapid development while maintaining control of the production code.
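Because the policy above is enforced by monitoring rather than by CVS itself, one way to picture it is as an after-the-fact audit of logged activity against a table of grants. The following Python sketch assumes a hypothetical `Grant` record (user, branch, read/write flags, start and end dates) and assumes that log entries have already been reduced to `(user, branch, action, day)` tuples; neither the record layout nor the log format comes from CVS or from this proposal.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Grant:
    # Hypothetical record of one permission issued by a working group.
    user: str
    branch: str
    can_read: bool
    can_write: bool
    start: date  # first day the grant is valid
    end: date    # last day the grant is valid (inclusive)


def is_allowed(grants, user, branch, action, on_day):
    """Return True if some grant permits `action` ('read' or 'write')."""
    if action not in ("read", "write"):
        raise ValueError("action must be 'read' or 'write'")
    for g in grants:
        if g.user != user or g.branch != branch:
            continue
        if not (g.start <= on_day <= g.end):
            continue
        if (action == "read" and g.can_read) or (action == "write" and g.can_write):
            return True
    return False


def find_violations(grants, log_entries):
    """Return the logged (user, branch, action, day) entries no grant covers."""
    return [entry for entry in log_entries if not is_allowed(grants, *entry)]
```

For example, a developer granted write access to one branch for the first half of a year would be flagged for any logged write to the main trunk, or for a write to that branch after the grant expired. What happens next (a warning, or termination of access) remains a policy decision.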
Another option for controlling CVS access would be to have separate, distinct CVS repositories for individuals who are concerned about privacy. CVS import and export would then be used to bring source code into and out of the CCSM repository. This option needs to be explored further, and if feasible, could be used to protect the most sensitive pieces of source code.
An individual should be tasked with developing, running, and maintaining testing software for CCSM on a full-time basis. The CCSM test engineer would be responsible for weekly tests of released software on all validated machines, daily build and smoke tests of software under development, and regression testing of source code changes. In addition, the CCSM test engineer would provide test scripts for developers and assist in developing acceptance and unit tests. It is likely the CCSM test engineer would be a new hire in the group.
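As an illustration of the regression testing mentioned above, the sketch below compares the output files of a candidate run against a stored baseline run. It assumes, purely for illustration, that a regression test passes only when every baseline file is reproduced byte for byte; the function names and directory layout are hypothetical, and a real CCSM test may need a numerical tolerance or a different definition of success.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_to_baseline(baseline_dir, candidate_dir):
    """Compare each baseline file with the same-named candidate file.

    Returns a dict mapping the file's relative path to
    'match', 'differ', or 'missing'.
    """
    baseline_dir = Path(baseline_dir)
    candidate_dir = Path(candidate_dir)
    report = {}
    for base_file in sorted(p for p in baseline_dir.rglob("*") if p.is_file()):
        rel = base_file.relative_to(baseline_dir)
        cand_file = candidate_dir / rel
        if not cand_file.is_file():
            report[rel.as_posix()] = "missing"
        elif file_digest(base_file) == file_digest(cand_file):
            report[rel.as_posix()] = "match"
        else:
            report[rel.as_posix()] = "differ"
    return report


def regression_passed(report):
    """A run passes only if every baseline file was reproduced exactly."""
    return all(status == "match" for status in report.values())
```

A daily build script could call `compare_to_baseline` after a short smoke run and publish the resulting report, so that developers see which output files changed.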
Substantially more personnel resources and time must be spent on testing. At the current time, changes go through minimal or no testing before climate simulations are started. This is very dangerous. Significant increases in quality and decreased risk will accompany increased testing. For comparison, typical software shops have about one test engineer for every two developers. CCSM falls far short of that ideal. Given the number of scientists, software engineers, and collaborators that carry out development, setting aside one full time person to be a test engineer and asking that all developers spend more time testing is a minimal requirement for success. Increased testing will increase the turn-around time of source code changes, but it should also increase quality and overall productivity.
Status accounting is the ability to understand what changes have been implemented, what changes are proposed, what defects have been discovered, who is working on each, and what the status of each is. A software tool to document the CCSM status can also be used for bug reports and to keep track of testing status. There are several free, publicly available tools for status accounting and tracking, including GNATS and Jitterbug. Someone should be given the responsibility and time to explore and implement one of these tools for use in CCSM, and this should be made a relatively high priority. The accounting tool would be an integral part of the change review boards and vice versa.
Documents are important for good communication as well as project tracking. Documents include things like task lists, meeting notes and web pages. There are several documents that need to be developed or improved. Some, like the CVS policy statement, planning documents, and a status accounting document, have been discussed above. The CCSM developers guide also needs to be kept up to date and made more complete. This document serves as the guide for all CCSM development! Most of what is in the document you're reading now, if it's adopted, belongs in the CCSM developers guide. An individual should be put in charge of coordinating documents, including the developers guide, to ensure they remain up to date. It is also recommended that a review of the CCSM public and local web pages be undertaken in an attempt to clarify and improve document visibility.
Because of the size and importance of CCSM, CCSM development needs to become better focused on stability and quality than it is now. Better planning is required not only for CCSM science and software engineering, but for infrastructure support as well. CCSM needs to improve short term, long term, and strategic planning, and a process should be put in place to address this task. Part of the planning process on all time scales should include improved cost/risk/benefit analysis to set priorities to make best use of people's time.
Currently, short term plans span about a week of time and within that time frame the plan changes continuously. A plan is often put in place on Wednesday which includes changes to several components and a requirement that the system be running production by Friday. In addition, the plan may change on Thursday and Friday but the requirement to run on Friday does not change. This leads to minimal testing, little quality control on source code, and no time to document the changes that are contained in a particular version.
Model changes should no longer be incorporated into production versions unless they have been thoroughly validated and are shown to improve model simulation, and all change requests should go through a change review board. The risk associated with incorporating incompletely validated code is very high. The project risks releasing low quality code and carrying out suites of simulations with physics and software that are not fully understood. Most other climate modeling efforts take a much more conservative approach to incorporating new code than CCSM.
In terms of longer term CCSM planning, what typically happens now is that a model release date is stated months in advance with little thought given to what the model will contain or how the development will advance given the schedule. Under CCSM2 development, a December, 2000 release date was announced 6 months in advance and then there was a mad scramble to try to meet that date. In the end, the model will probably be released a year late (18 months rather than 6 months). During this 18 month development and release process, there was little planning. Each week was a new frenzied attempt to get the model frozen with no broad understanding of the requirements.
Some of the planning issues will be solved if change review boards are created. However, each component and the CCSM as a whole should regularly outline some long term goals; what individual models are going to accomplish in time; when code is going to be frozen; how much time is going to be set aside for testing, porting, validation, tuning, and documentation; and when new code is going to be released. The goals need to be realistic and the planning documents should include schedules, milestone targets, and priorities. In addition, longer term planning to prioritize issues like model performance, base code rewrites, infrastructure support, and vectorization optimization is required.
CCSM software engineering strategic planning does occur now to some degree. There is a ``CCSM Software Engineering Plan 2000-2005'' that was prepared largely by the software engineering working group that addresses many strategic planning issues. This document needs to be reviewed and revised regularly. In addition, the role that a computer scientist could play in the overall strategic planning should be investigated.
In order to improve our ability to plan changes and estimate schedules, regular reviews of progress should occur. Does CCSM meet short term goals on schedule? Are long term goals met? How poor are the scheduling estimates, and are there particular issues that come up regularly that cause delays? Reviews should become a standard part of CCSM software engineering process. In December, 2001 the past 18 months of CCSM2 development should be reviewed to understand what went well and what went poorly to improve future expectations, process, and scheduling.
Many of these recommendations require a stronger management structure. There needs to be agreement within the community that these policies will be used and enforced. The CCSM scientific coordinator and the SSC need to be on board, and they need to make sure processes and policy are understood. There also needs to be an oversight process where individuals can be disciplined if they violate policy. The goal is to better control quality and improve productivity, while still allowing individuals to carry out independent, creative scientific and software development.
There are a number of other issues that are worth bringing up as a part of this discussion.
A code review policy should be implemented. All component code changes should be reviewed before merging onto the main trunk and/or before release to CCSM. Another final review is probably required of CCSM code that is to be formally released to the public.
Model output control and access need to be better defined. A policy should be put in place with regard to data storage, naming conventions, data locality, data compression, and access. A separate data management group should be formed to deal with this issue immediately, and it should include members from the CCSM community, especially the applications groups Climate Change and Paleoclimate. Along the same lines, standard analysis plots need to be better defined and available, and questions regarding analysis and plotting tools need to be addressed.