Using mksurfdata to create surface datasets from grid datasets

mksurfdata creates surface datasets from grid datasets and raw data files at half-degree resolution. The resulting files describe the surface characteristics needed by CLM: the fraction of each grid cell covered by the different land-unit types, the fraction covered by the different vegetation types, and properties such as soil color and soil texture. To run mksurfdata you can either use the mksurfdata.pl script, which creates namelists for you using the build-namelist XML database, or run it by hand using a namelist that you provide (possibly modeled after one of the examples provided in the models/lnd/clm/tools/mksurfdata directory). The next section describes how to use the mksurfdata.pl script, and the section after it gives more details on running mksurfdata by hand and on its namelist input variables.

Running mksurfdata.pl

The script mksurfdata.pl can be used to run the mksurfdata program for several resolutions, simulation years, and simulation-year ranges. It creates the needed namelists for you, moves the output files to your inputdata directory, and writes a list of the files created (for developers this list is also a script to import the files into the svn inputdata repository). It also uses the build-namelist XML database to determine the correct input files to use. In the case of urban single-point datasets (where surface datasets are actually input into mksurfdata) it does the additional processing required so that the output dataset can be used once again by mksurfdata. Because it figures out the namelist and input files for you, we recommend that you use this script to create standard surface datasets. If you need to create surface datasets for customized cases, you will be better off running mksurfdata on its own. For help on mksurfdata.pl use the "-help" option as below:


> cd models/lnd/clm/tools/mksurfdata
> mksurfdata.pl -help

The output of the above command is:

SYNOPSIS
     mksurfdata.pl [options]
OPTIONS
     -dinlc [or -l]                Enter the directory location for inputdata 
                                   (default /fs/cgd/csm/inputdata)
     -debug [or -d]                Don't actually run -- just print out what 
                                   would happen if ran.
     -years [or -y]                Simulation year(s) to run over (by default 1850,2000) 
                                   (can also be a simulation year range: i.e. 1850-2000)
     -help  [or -h]                Display this help.
     -res   [or -r] "resolution"   Resolution(s) to use for files (by default all ).
     -rcp   [or -c] "rep-con-path" Representative concentration pathway(s) to use for 
                                   future scenarios 
                                   (by default -999.9, where -999.9 means historical ).

NOTE: years, res, and rcp can be comma delimited lists.


To run the script with optimized mksurfdata for a 4x5 degree grid for 1850 conditions, on bluefire you would do the following:

Example 2-6. Example of running mksurfdata.pl to create a 4x5 resolution fsurdat for an 1850 simulation year


> cd models/lnd/clm/tools/mksurfdata
> gmake
> mksurfdata.pl -y 1850 -r 4x5

Running mksurfdata by Hand

In the above section we show how to run mksurfdata through the mksurfdata.pl script using input datasets that are in the build-namelist XML database. When you are running with input datasets that are NOT available in the XML database, you either need to add them as outlined in Chapter 3, or you need to run mksurfdata by hand, as we outline here.

Preparing your mksurfdata namelist

When running mksurfdata by hand you will need to prepare your own input namelist. The sample namelists provided are set up for running on the NCAR machine bluefire, so you will need to change the filepaths to run on a different machine. The sample namelists include:

mksurfdata.namelist -- standard sample namelist.
mksurfdata.pftdyn -- sample namelist to build transient PFT land-use and land cover change over 1850 to 2005.
mksurfdata.regional -- sample namelist to build for a regional grid dataset (5x5_amazon)
mksurfdata.singlept -- sample namelist to build for a single point grid dataset (1x1_brazil)
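
As a minimal sketch of changing the filepaths, the shell function below rewrites the inputdata root in a copy of a sample namelist. The function name retarget_namelist is hypothetical, and the sketch assumes that the sample namelists share a single common root (for example the /fs/cgd/csm/inputdata default shown in the mksurfdata.pl help output above). Check the sample files before relying on it.

```shell
# Rewrite the inputdata root in a copy of a namelist (hypothetical helper).
# Usage: retarget_namelist OLD_ROOT NEW_ROOT INFILE OUTFILE
# Assumption: OLD_ROOT and NEW_ROOT are plain paths that contain
# neither '|' nor '&', which are special to the sed command below.
retarget_namelist() {
    old_root=$1
    new_root=$2
    infile=$3
    outfile=$4
    # Use '|' as the sed delimiter because the paths contain '/'.
    sed "s|$old_root|$new_root|g" "$infile" > "$outfile"
}

# Example call (writes a local copy and leaves the sample untouched):
# retarget_namelist /fs/cgd/csm/inputdata "$CSMDATA" \
#     mksurfdata.namelist mksurfdata.namelist.local
```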

Note that one of the inputs, mksrf_fdynuse, names a text file that itself contains filepaths to other files, and the filepaths in that file have to be changed as well. Because the file is read with a formatted read, the line lengths must remain the same: the year must stay in the same position on each line, even with the new filenames.
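
One way to guard against breaking the formatted read is to compare line lengths between the original mksrf_fdynuse text file and your edited copy. The function below is a sketch of such a check; the name check_line_lengths is hypothetical, and it verifies only the line lengths, not the exact column layout that mksurfdata expects.

```shell
# Compare line lengths of an edited file against the original (hypothetical helper).
# Prints one message per mismatching line; returns non-zero if anything differs.
# Usage: check_line_lengths ORIGINAL EDITED
check_line_lengths() {
    awk '
        # First file: remember the length of every line.
        NR == FNR { len[FNR] = length($0); n++; next }
        # Second file: compare each line against the stored length.
        {
            m++
            if (length($0) != len[FNR]) {
                printf "line %d: length %d, expected %d\n", FNR, length($0), len[FNR]
                bad = 1
            }
        }
        END {
            if (m != n) { print "line counts differ"; bad = 1 }
            exit bad
        }
    ' "$1" "$2"
}
```

If the check reports a mismatch, pad or trim the blanks between the filepath and the year in the edited file until the lengths agree.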

We list the namelist items below. Most of them are filepaths to the input half-degree resolution datasets, which are scaled to the resolution of your grid dataset. You must first specify the input grid dataset for the output resolution:

  1. mksrf_fgrid Grid dataset

Then you must specify settings for the input high resolution datafiles:

  1. mksrf_ffrac land fraction and land mask dataset

  2. mksrf_fglacier Glacier dataset

  3. mksrf_flai Leaf Area Index dataset

  4. mksrf_flanwat Land water dataset

  5. mksrf_forganic Organic soil carbon dataset

  6. mksrf_fmax Max fractional saturated area dataset

  7. mksrf_fsoicol Soil color dataset

  8. mksrf_fsoitex Soil texture dataset

  9. mksrf_furban Urban dataset

  10. mksrf_fvegtyp PFT vegetation type dataset

  11. mksrf_fvocef Volatile Organic Compound Emission Factor dataset

And optionally you can specify settings for:

  1. all_urban If entire area is urban (typically used for single-point urban datasets, that you want to be exclusively urban)

  2. mksrf_fdynuse "dynamic land use" for transient land-use/land-cover changes. This is an ASCII text file that lists the filepaths to files for each year and then the year it represents (note: you MUST change the filepaths inside the file when running on a machine NOT at NCAR). Normally we always use this file, even for creating datasets of a fixed year.

  3. mksrf_firrig Irrigation dataset (experimental mode, normally NOT used)

  4. mksrf_ftopo Topography dataset (this is used to limit the extent of urban regions and is used for glacier multiple elevation classes -- normally always used)

  5. mksrf_gridnm Name of output grid resolution (if not set the files will be named according to the number of longitudes by latitudes)

  6. mksrf_gridtype Type of grid (default is 'global')

  7. outnc_large_files If output should be in NetCDF large file format

  8. outnc_double If output should be in double precision (normally we turn this on)
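
Before running mksurfdata by hand it can help to confirm that your namelist sets every required item listed above (mksrf_fgrid plus the eleven input datafiles). The function below is a sketch of such a check; the name check_required_vars is hypothetical, and it assumes each variable is assigned at the start of its own line, as "name = value".

```shell
# Report required mksurfdata namelist variables that a namelist file
# does not set (hypothetical helper). Returns non-zero if any is missing.
# Usage: check_required_vars NAMELIST
check_required_vars() {
    missing=0
    for var in mksrf_fgrid mksrf_ffrac mksrf_fglacier mksrf_flai \
               mksrf_flanwat mksrf_forganic mksrf_fmax mksrf_fsoicol \
               mksrf_fsoitex mksrf_furban mksrf_fvegtyp mksrf_fvocef; do
        # Match "name =" at the start of a line, allowing leading blanks.
        # -i because Fortran namelist variable names are case-insensitive.
        if ! grep -Eiq "^[[:space:]]*$var[[:space:]]*=" "$1"; then
            echo "missing: $var"
            missing=1
        fi
    done
    return $missing
}
```

The optional items (such as mksrf_fdynuse and mksrf_ftopo) are deliberately not checked here, although the standard practices described later in this section recommend setting them.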

After creating your namelist, when running on a non-NCAR machine you will need to get the files from the inputdata repository. To retrieve the files needed for mksurfdata, run the check_input_data script on your namelist as follows; the script also allows you to export data to your local disk.

Example 2-7. Getting the raw datasets for mksurfdata to your local machine using the check_input_data script


> cd models/lnd/clm/tools/mksurfdata
# First remove any quotes and copy into a filename that can be read by the
# check_input_data script
> sed "s/'//g" namelist > clm.input_data_list
# Run the script with -export and give the location of your inputdata with $CSMDATA
> ../../../../../scripts/ccsm_utils/Tools/check_input_data -datalistdir . \
-inputdata $CSMDATA -check -export

One option to mksurfdata, mksrf_firrig, is purely experimental and should NOT be used. It separates crop land-units into irrigated and non-irrigated columns, but the changes needed to use it are NOT yet implemented in CLM4.0.

Standard Practices when using mksurfdata

In this section we give recommendations for using mksurfdata, based on how the standard surface datasets were created.

If you look at the standard surface datasets that we have created and provided for use, there are three practices that we have consistently followed in each (you can also see these in the sample namelists and in the mksurfdata.pl script). The first is that we always output data in double precision (hence outnc_double is set to .true.). The second is that we always use the procedure for creating transient datasets (using mksrf_fdynuse), even when creating datasets for a fixed simulation year. This ensures that the fixed-year datasets are consistent with the transient datasets. When this is done a "surfdata.pftdyn" dataset is created, but it will NOT be used in CLM. If you look at the sample namelist mksurfdata.namelist you will note that it sets mksrf_fdynuse to the file pftdyn_hist_simyr2000.txt, where the single file entered is the same PFT file used in the rest of the namelist (as mksrf_fvegtyp). The third is that we always set mksrf_ftopo, even though multiple glacier elevation classes are NOT a part of CLM4. The topography dataset limits urban areas based on topographic height, and hence is important to use all the time. In future versions of CLM the glacier multiple elevation classes will be used as well.

There are two other important practices for creating urban single-point datasets. The first is that you will often want to set all_urban to .true. so that the dataset has 100% of the gridcell output as urban, rather than some mix of urban, vegetation types, and other landunits. The second is that most of our specialized urban datasets have custom values for the urban parameters, hence we do NOT want to use the global urban dataset to get urban parameters -- we use a previous version of the surface dataset for the urban parameters. However, in order to do this, we need to append onto the previous surface dataset the grid and land mask/land fraction information from the grid and fraction datasets. This is done in mksurfdata.pl using the NCO program ncks. An example of doing this for the Mexico City, Mexico urban surface dataset is as follows:


> ncks -A $CSMDATA/lnd/clm2/griddata/griddata_1x1pt_mexicocityMEX_c090715.nc \
$CSMDATA/lnd/clm2/surfdata/surfdata_1x1_mexicocityMEX_simyr2000_c100407.nc
> ncks -A $CSMDATA/lnd/clm2/griddata/fracdata_1x1pt_mexicocityMEX_navy_c090715.nc \
$CSMDATA/lnd/clm2/surfdata/surfdata_1x1_mexicocityMEX_simyr2000_c100407.nc

Note that if you look at the current single-point urban surface datasets you will see that the above has already been done.

The final issue is how to build mksurfdata. When NOT optimized, mksurfdata is very slow, and can take many hours to days to run even for medium resolutions such as one or two degree. So usually you will want to run it optimized. You may also want to use shared-memory parallelism using OpenMP with the SMP option. The problem with running optimized is that, for most compilers, answers will differ between optimized and non-optimized builds. So if you want answers to be the same as a previous surface dataset, you will need to run on the same platform with the same optimization level. Likewise, running with or without OpenMP may also change answers (for most compilers it will NOT, however it does for the IBM compiler). However, answers should be the same regardless of the number of threads used when OpenMP is enabled. Note that the output surface datasets have attributes that describe whether the file was written out optimized or not, with threading or not, and the number of threads used, to make it easier to match datasets created previously. For more information on the different compiler options for the CLM4 tools see the section called "Common environment variables and options used in building the FORTRAN tools".