NCAR's Experience Porting and Running CESM2 on a Medium-sized Linux Cluster

NCAR typically runs the fully coupled CESM with the finite-volume dynamical core on large supercomputers, using 4096 cores on yellowstone and 2160 cores on cheyenne. However, we also port, run, and regularly test CESM on a more moderately sized Linux cluster.

NCAR's Climate and Global Dynamics (CGD) division maintains a medium-size Linux cluster called hobart to support research and development.

This page details our experiences on hobart that might help other institutions port and run CESM2 on their Linux clusters.

* NOTE * This page is for informational purposes only. Please use the DiscussCESM forums to post questions about porting and running CESM on your particular Linux cluster.

Linux Cluster Hardware Specifications

Single login node with the following specifications:
Hostname : hobart
Operating System : CentOS Linux release 7.2.1511 (Core) x86_64
Kernel : 3.10.0-327.el7.x86_64
Processor(s) : 16 X Intel(R) Xeon(R) CPU W5580 @ 3.20GHz
CPU MHz : 3192.072
Total Memory : 74.05 GB
Total Swap : 1.04 GB

32 compute nodes with the following specifications for each node:
Operating System : CentOS Linux release 7.2.1511 (Core) x86_64
Kernel : 3.10.0-327.el7.x86_64
Processor(s) : 48 X Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
CPU MHz : 2300.000
Total Memory : 98.59 GB
Total Swap : 1.04 GB

Available shared disk space for run and build directories :
5.0 TB

Interconnect network fabric :
QLogic InfiniBand, QDR with PSM

Linux Cluster Software Specifications

CESM2 release code

Python 2.7 or later (but not Python 3)

Queueing system : PBS

Fortran 2003-compliant compiler :
Test Case #1 used Intel 15.0.2.164
Test Case #2 used GNU 5.4.0

MPI library
Test Case #1 used MVAPICH2 v2.2.1 compiled with Intel and QLogic libraries
Test Case #2 used Open MPI v2.0.2 compiled with GCC

NetCDF4 library
Test Case #1 used NetCDF v4.3.2 compiled with Intel
and Parallel-NetCDF v1.7.0 built with Intel and MVAPICH2 v2.2.1
Test Case #2 used NetCDF v4.4.1.1 compiled with GCC 5.4.0

Test Case #1 Description

Fully-coupled 1850 simulation at 1 degree resolution using 32 nodes (1536 cores) and 16 nodes (768 cores)
create_newcase --case b.e20.B1850.f09_g17.01.intel --res f09_g17 --compset B1850 --compiler intel --mpilib mvapich2

Test Case #2 Description

Fully-coupled 1850 simulation at 1 degree resolution using 16 nodes (768 cores)
create_newcase --case b.e20.B1850.f09_g17.01.gnu --res f09_g17 --compset B1850 --compiler gnu --mpilib openmpi
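After create_newcase, the standard CIME workflow to configure, build, and submit either case is the same. A minimal sketch follows, using the Test Case #2 case name; the shortened run length is illustrative, not what NCAR used for the timing runs:

```shell
# Enter the case directory created by create_newcase
cd b.e20.B1850.f09_g17.01.gnu

# Generate the case scripts and namelists
./case.setup

# (Illustrative) shorten the run for an initial shakedown test
./xmlchange STOP_OPTION=ndays,STOP_N=5

# Build the model executable
./case.build

# Submit the job to the PBS queue
./case.submit
```

These commands require a working CESM2/CIME installation on the cluster, so they will not run outside that environment.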

Fully-coupled Run Performance

The CESM2.0 timing table includes the timing data and the load-balanced model decomposition across MPI tasks for the test cases listed above. Your performance may vary depending on your cluster configuration.
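The task decomposition can be inspected and adjusted from the case directory before case.setup. A hedged sketch of moving Test Case #1 from the 32-node (1536-core) layout to the 16-node (768-core) layout is below; setting every component to the same NTASKS is only illustrative, since the balanced layouts in the timing table assign different counts per component:

```shell
# Inspect the current task count for each component
./xmlquery NTASKS

# Illustrative: give all components 768 tasks (16 nodes x 48 cores)
./xmlchange NTASKS=768

# Regenerate the case scripts after changing the PE layout
./case.setup --reset
```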

Porting and Testing

This NCAR medium-sized Linux cluster is included in routine regression testing as part of CESM model development. Please refer to the CIME User's Guide for details regarding porting and testing on a new machine.
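Porting to a new machine centers on adding an entry to CIME's config_machines.xml (plus matching compiler and batch settings), as described in the CIME User's Guide. Below is a minimal, hypothetical sketch modeled on the hobart specifications above; the machine name, paths, and support contact are placeholders you must adapt, and a real entry needs additional elements (e.g., module settings) not shown here:

```xml
<machine MACH="mycluster">
  <DESC>Medium-sized Linux cluster, 48 cores/node, PBS, QLogic InfiniBand</DESC>
  <OS>LINUX</OS>
  <COMPILERS>intel,gnu</COMPILERS>
  <MPILIBS>mvapich2,openmpi</MPILIBS>
  <CIME_OUTPUT_ROOT>/scratch/$USER</CIME_OUTPUT_ROOT>
  <DIN_LOC_ROOT>/project/cesm/inputdata</DIN_LOC_ROOT>
  <GMAKE_J>8</GMAKE_J>
  <BATCH_SYSTEM>pbs</BATCH_SYSTEM>
  <SUPPORTED_BY>your-site-help-desk</SUPPORTED_BY>
  <MAX_TASKS_PER_NODE>48</MAX_TASKS_PER_NODE>
  <MAX_MPITASKS_PER_NODE>48</MAX_MPITASKS_PER_NODE>
</machine>
```

Once the machine entry is in place, the port can be exercised with CIME's regression test tools (create_test) as outlined in the CIME User's Guide.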