Using mksurfdata_map to create surface datasets from grid datasets

mksurfdata_map creates surface datasets from grid datasets and raw data files at half-degree resolution. The resulting files describe the surface characteristics needed by CLM: the fraction of each grid cell covered by the different land-unit types, the fraction covered by the different vegetation types, and properties such as soil color and soil texture. To run mksurfdata_map you can either use the mksurfdata.pl script, which creates namelists for you using the build-namelist XML database, or run it by hand using a namelist that you provide (possibly modeled after an example provided in the models/lnd/clm/tools/clm4_5/mksurfdata_map directory). Note that there is one version of mksurfdata_map for clm4_0 and one for clm4_5, each under the relevant directory. The namelist for both versions is sufficiently complex that we recommend using the mksurfdata.pl tool to build it. mksurfdata_map also requires mapping files between your output grid and the grid of each raw dataset, which it uses to regrid the relevant datasets (see the Section called Creating mapping files that mksurfdata_map will use and Figure 2-1 for a visual representation of the process). When you run mkmapdata.sh make sure you specify whether you are running CLM4.0 or CLM4.5! For standard resolutions these mapping files have already been created, but if you want to run for your own single-point or regional grid, you will need to create them. The next section describes how to use the mksurfdata.pl script, and the following section gives more detail on running mksurfdata_map by hand and on its namelist input variables.

Running mksurfdata.pl

The script mksurfdata.pl can be used to run the mksurfdata_map program for several configurations, resolutions, simulation years and simulation year ranges. It creates the needed namelists for you and can move the files over to your inputdata directory location. It also creates a list of the files created; for developers this file doubles as a script to import the files into the svn inputdata repository. The script uses the build-namelist XML database to determine the correct input files to use, and for transient cases it creates the appropriate mksrf_fdynuse file with the list of files for each year needed for the case. In the case of urban single-point datasets (where surface datasets are actually input into mksurfdata_map) it does the additional processing required so that the output dataset can be used once again by mksurfdata_map. Because it figures out the namelist and input files for you, we recommend that you use this script to create standard surface datasets. The list of files needed is very long and not necessarily easy to figure out. If you need to create surface datasets for customized cases, you might need to run mksurfdata_map on its own, but you can still use mksurfdata.pl with the "-debug" option to give you a namelist to start from. For help on mksurfdata.pl you can use the "-help" option as below:


> cd models/lnd/clm/tools/clm4_5/mksurfdata_map
> ./mksurfdata.pl -help
The output of the above command is:

SYNOPSIS 

     For supported resolutions:
     mksurfdata.pl -res <res>  [OPTIONS]
        -res [or -r] is the supported resolution(s) to use for files (by default all ).

      
     For unsupported, user-specified resolutions:	
     mksurfdata.pl -res usrspec -usr_gname <user_gname> -usr_gdate <user_gdate>   \ 
   [OPTIONS]
        -usr_gname "user_gname"    User resolution name to find grid file with 
                                   (only used if -res is set to 'usrspec')
        -usr_gdate "user_gdate"    User map date to find mapping files with
                                   (only used if -res is set to 'usrspec')
                                   NOTE: all mapping files are assumed to be in mkmapdata
                                    - and the user needs to have invoked mkmapdata in 
                                      that directory first
        -usr_mapdir "mapdirectory" Directory where the user-supplied mapping files are
                                   Default: ../../shared/mkmapdata

OPTIONS
     -allownofile                  Allow the script to run even if one of the input files
                                   does NOT exist.
     -crop                         Add in crop datasets
     -dinlc [or -l]                Enter the directory location for inputdata 
                                   (default /glade/p/cesm/cseg/inputdata)
     -debug [or -d]                Do not actually run -- just print out what 
                                   would happen if ran.
     -dynpft "filename"            Dynamic PFT/harvesting file to use 
                                   (rather than create it on the fly) 
                                   (must be consistent with first year)
     -glc_nec "number"             Number of glacier elevation classes to use (by default 0)
     -merge_gis                    If you want to use the glacier dataset that merges in
                                   the Greenland Ice Sheet data that CISM uses (typically
                                   used only if consistency with CISM is important)
     -hirespft                     If you want to use the high-resolution pft dataset rather 
                                   than the default lower resolution dataset
                                   (low resolution is at half-degree, high resolution at 3minute)
                                   (hires only available for present-day [2000])
     -exedir "directory"           Directory where mksurfdata_map program is
                                   (by default assume it is in the current directory)
     -inlandwet                    If you want to allow inland wetlands
     -mv                           If you want to move the files after creation to the 
                                   correct location in inputdata
                                   (by default -nomv is assumed so files are NOT moved)
     -years [or -y]                Simulation year(s) to run over (by default 1850,2000) 
                                   (can also be a simulation year range: i.e. 1850-2000)
     -help  [or -h]                Display this help.
     
     -rcp   [or -c] "rep-con-path" Representative concentration pathway(s) to use for 
                                   future scenarios 
                                   (by default -999.9, where -999.9 means historical ).
     -usrname "clm_usrdat_name"    CLM user data name to find grid file with.

      NOTE: years, res, and rcp can be comma delimited lists.


OPTIONS to override the mapping of the input gridded data with hardcoded input

     -pft_frc "list of fractions"  Comma delimited list of percentages for veg types
     -pft_idx "list of veg index"  Comma delimited veg index for each fraction
     -soil_cly "% of clay"         % of soil that is clay
     -soil_col "soil color"        Soil color (1 [light] to 20 [dark])
     -soil_fmx "soil fmax"         Soil maximum saturated fraction (0-1)
     -soil_snd "% of sand"         % of soil that is sand


To run the script with an optimized mksurfdata_map for a 4x5 degree grid and 1850 conditions on yellowstone, you would do the following:

Example 2-3. Example of running mksurfdata.pl to create a 4x5 resolution fsurdat for an 1850 simulation year


> cd models/lnd/clm/tools/clm4_5/mksurfdata_map/src
> gmake
> cd ..
> ./mksurfdata.pl -y 1850 -r 4x5

Running mksurfdata_map by Hand

In the above section we show how to run mksurfdata_map through the mksurfdata.pl script using input datasets that are in the build-namelist XML database. When you are running with input datasets that are NOT available in the XML database, you either need to add them as outlined in Chapter 3, or you need to run mksurfdata_map by hand, as we outline here. The easiest way to start is to use the "-debug" option to mksurfdata.pl for a case as close as possible to yours, and then customize the resulting namelist file for the datasets that you change.
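
As a minimal sketch of that starting point, the helper below only assembles and prints a dry-run command line. It uses the -res, -years and -debug options listed in the help text above; the function name debug_cmd is hypothetical, and the command is printed rather than executed so that the sketch runs without the CLM tools.

```shell
#!/bin/sh
# Hypothetical helper: assemble a dry-run mksurfdata.pl command line.
# Only options listed in the -help output are used (-res, -years, -debug).
# The command is printed, not executed.
debug_cmd() {
    res="$1"      # a supported resolution, e.g. 4x5
    years="$2"    # a simulation year or comma-delimited list, e.g. 1850
    printf './mksurfdata.pl -res %s -years %s -debug\n' "$res" "$years"
}

# Print the command for the case closest to the one you want to customize.
debug_cmd 4x5 1850
```

Run the printed command from the mksurfdata_map directory, then edit the namelist it reports for the datasets that differ in your case.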

Preparing your mksurfdata_map namelist

When running mksurfdata_map by hand you will need to prepare your own input namelist. There is a sample namelist set up for running on the previous NCAR machine, bluefire, so you will need to change the filepaths in order to use it. The sample files are:

mksurfdata_map.namelist -- standard sample namelist.
pftdyn_hist_simyr1850-2005.txt -- the mksrf_fdynuse text file with filenames for 1850-2005.

Note that one of the inputs, mksrf_fdynuse, is the name of a file that itself contains the filepaths to other files. The filepaths in this file will have to be changed as well. Because the file is read with a formatted read, you also need to make sure that the line lengths remain the same: the placement of the year on each line must not change, even with the new filenames. One advantage of the mksurfdata.pl script is that it will create the mksrf_fdynuse file for you.
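
The following is a rough sketch of a check you could run after editing the filepaths by hand. It assumes that keeping every line the same length as the first line is what keeps the year in the same columns; the exact column layout expected by mksurfdata_map is not checked. The function name same_line_lengths is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: succeed only if every line of the given file has the
# same length as its first line. Offending lines are reported on stdout.
same_line_lengths() {
    awk '
        NR == 1 { want = length($0) }
        length($0) != want {
            printf "line %d: length %d, expected %d\n", NR, length($0), want
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Usage, with the sample file named above:
#   same_line_lengths pftdyn_hist_simyr1850-2005.txt
```

Trailing blanks count toward the length, so pad shorter filepaths with spaces rather than shifting the year.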

We list the namelist items below. Most of the namelist items are filepaths to the input half-degree resolution datasets that will be regridded to the resolution of your grid dataset. You must first specify the input grid dataset for the output resolution:

  1. mksrf_fgrid mapping file that defines the output grid to run on

Then you must specify settings for the input high-resolution raw data files:

  1. mksrf_ffrac land fraction and land mask dataset

  2. mksrf_fglacier Glacier dataset

  3. mksrf_flai Leaf Area Index dataset

  4. mksrf_flanwat Land water dataset

  5. mksrf_forganic Organic soil carbon dataset

  6. mksrf_fmax Max fractional saturated area dataset

  7. mksrf_fsoicol Soil color dataset

  8. mksrf_fsoitex Soil texture dataset

  9. mksrf_furbtopo Topography dataset used to limit the extent of urban regions

  10. mksrf_flndtopo Land topography dataset used for glacier multiple elevation classes

  11. mksrf_furban Urban dataset

  12. mksrf_fvegtyp PFT vegetation type dataset

  13. mksrf_fvocef Volatile Organic Compound Emission Factor dataset

  14. mksrf_fgdp GDP (Gross Domestic Product) dataset (new in CLM4.5)

  15. mksrf_fpeat Peatland dataset (new in CLM4.5)

  16. mksrf_fabm Agricultural fire peak month dataset (new in CLM4.5)

  17. mksrf_ftopostats Topography statistics dataset (new in CLM4.5)

  18. mksrf_fvic VIC parameters dataset (new in CLM4.5)

  19. mksrf_fch4 Inversion-derived CH4 parameters dataset (new in CLM4.5)

Then you must specify the mapping file for each of these datasets. The same mapping file can be used by multiple raw datasets (from the list above) if they are on the same grid and land-mask. Each mapping file needs to correspond to the grid and land-mask of the raw dataset it is used for.

  1. map_fglacier mapping file for mksrf_fglacier

  2. map_flai mapping file for mksrf_flai

  3. map_flakwat mapping file for mksrf_flakwat

  4. map_forganic mapping file for mksrf_forganic

  5. map_fmax mapping file for mksrf_fmax

  6. map_fsoicol mapping file for mksrf_fsoicol

  7. map_fsoitex mapping file for mksrf_fsoitex

  8. map_furbtopo mapping file for mksrf_furbtopo

  9. map_flndtopo mapping file for mksrf_flndtopo

  10. map_fharvest mapping file for mksrf_fharvest

  11. map_fwetlnd mapping file for mksrf_fwetlnd

  12. map_furban mapping file for mksrf_furban

  13. map_fpft mapping file for mksrf_fpft

  14. map_fvocef mapping file for mksrf_fvocef

  15. map_fgdp mapping file for mksrf_fgdp

  16. map_fabm mapping file for mksrf_fabm

  17. map_ftopostats mapping file for mksrf_ftopostats

  18. map_fvic mapping file for mksrf_fvic

  19. map_fch4 mapping file for mksrf_fch4

Note: If you add new raw datasets to mksurfdata_map, you will need to add the corresponding mapping file for each dataset as well. If the file is on the same grid and land-mask as another dataset, it can share the same mapping file. If it is on a different grid and/or land-mask -- YOU WILL NEED TO CREATE MAPPING DATASETS FOR IT. mkmapdata.sh will also need to be changed to create the new mapping files (see the Section called Creating mapping files that mksurfdata_map will use). See Figure 2-6 for a visual representation of the relationship of the various input and output files for mksurfdata_map.

Figure 2-6. Details of running mksurfdata_map

Each of the raw datasets (the mksrf_* files) needs a mapping file to map from the output grid you are running on to the grid and land-mask for that dataset. Some raw datasets share the same grid and land mask -- hence they can share the same mapping file. One of the mapping files is used to specify the grid for mksrf_fgrid.

You also specify the ASCII text file that lists the land-use files:

  1. mksrf_fdynuse "dynamic land use" for transient land-use/land-cover changes. This is an ASCII text file that lists the filepath of the file for each year, followed by the year it represents (note: you MUST change the filepaths inside the file when running on a machine NOT at NCAR). We always use this file, even when creating datasets for a fixed year. Also note that when using the "pft_" settings this file will be an XML-like file with settings for PFTs rather than filepaths (see the Section called Single Point options to mksurfdata_map below).

And optionally you can specify settings for:

  1. all_urban If entire area is urban (typically used for single-point urban datasets, that you want to be exclusively urban)

  2. no_inlandwet If TRUE, set wetland to 0% over land (re-normalizing other landcover types as needed); wetland will only be used for ocean points. (Only applies to CLM4.5 version of mksurfdata_map, for which the default is TRUE.)

  3. mksrf_firrig (CLM4.0 ONLY) Irrigation dataset, if you want to activate the irrigation model for CLM4.0 over generic cropland (experimental mode, normally NOT used). If this dataset is set, you also NEED to set the mapping file for the irrigation dataset with map_firrig.

  4. mksrf_gridnm Name of output grid resolution (if not set the files will be named according to the number of longitudes by latitudes)

  5. mksrf_gridtype Type of grid (default is 'global')

  6. nglcec number of glacier multiple elevation classes. Can be 0, 1, 3, 5, or 10. When using the resulting dataset with CLM you can then run with glc_nec of either 0 or this value. (experimental; normally use the default of 0. When running with the land-ice model, in practice only 10 has been used)

  7. numpft number of Plant Functional Types (PFT) in the input vegetation mksrf_fvegtyp dataset. Change this to 20 if you want to create a dataset with prognostic crop activated. The vegetation dataset also needs to have the prognostic crop types on it. (experimental; normally not changed from the default of 16)

  8. outnc_large_files If output should be in NetCDF large file format

  9. outnc_double If output should be in double precision (normally we turn this on)

  10. pft_frc array of fractions to override PFT data with for all gridpoints (experimental mode, normally NOT used).

  11. pft_idx array of PFT indices to override PFT data with for all gridpoints (experimental mode, normally NOT used).

  12. soil_clay percent clay soil to override all gridpoints with (experimental mode, normally NOT used).

  13. soil_color Soil color to override all gridpoints with (experimental mode, normally NOT used).

  14. soil_fmax Soil maximum fraction to override all gridpoints with (experimental mode, normally NOT used).

  15. soil_sand percent sandy soil to override all gridpoints with (experimental mode, normally NOT used).
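
The allowed values of nglcec listed above can be checked before a namelist is written. This is only an illustrative sketch; the function name valid_nglcec is hypothetical and is not part of the CLM tools.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the argument is an allowed nglcec value
# (0, 1, 3, 5 or 10, as listed above); otherwise report on stderr and fail.
valid_nglcec() {
    case "$1" in
        0|1|3|5|10) return 0 ;;
        *) echo "nglcec=$1 is not one of 0, 1, 3, 5, 10" >&2; return 1 ;;
    esac
}

# Usage:
#   valid_nglcec 10 && echo "ok to use nglcec = 10"
```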

After creating your namelist, when running on a non-NCAR machine you will need to get the files from the inputdata repository. To retrieve the files needed for mksurfdata_map, you can run the check_input_data script on your namelist as follows; the script can also export the data to your local disk.

Example 2-4. Getting the raw datasets for mksurfdata_map to your local machine using the check_input_data script


> cd models/lnd/clm/tools/clm4_5/mksurfdata_map
# First remove any quotes and copy into a filename that can be read by the
# check_input_data script
> sed "s/'//g" namelist > clm.input_data_list
# Run the script with -export and give the location of your inputdata with $CSMDATA
> ../../../../../../scripts/ccsm_utils/Tools/check_input_data -datalistdir . \
-inputdata $CSMDATA -check -export
# You must then do the same with the fdynuse file referred to in the namelist
# in this case we add a file = to the beginning of each line
> awk '{print "file = "$1}' pftdyn_hist_simyr2000-2000.txt > clm.input_data_list
# Run the script with -export and give the location of your inputdata with $CSMDATA
> ../../../../../../scripts/ccsm_utils/Tools/check_input_data -datalistdir . \
-inputdata $CSMDATA -check -export
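
After the export, you may want to confirm that every file named in the list is now present locally. The sketch below assumes the "name = value" layout of clm.input_data_list produced above, treats only values that begin with "/" as filepaths, and assumes the paths contain no blanks. The function name missing_inputs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "name = value" lines and print every value that
# looks like an absolute path but is not an existing file.
# The exit status is non-zero if anything is missing.
missing_inputs() {
    rc=0
    # Keep only values that start with "/"; other settings are skipped.
    for f in $(sed -n 's|^[^=]*=[[:space:]]*\(/[^[:space:],]*\).*$|\1|p' "$1"); do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            rc=1
        fi
    done
    return $rc
}

# Usage:
#   missing_inputs clm.input_data_list
```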

Single Point options to mksurfdata_map

The options pft_frc, pft_idx, soil_clay, soil_color, soil_fmax, and soil_sand exist to override the values that come in on the datasets with user-specified values. They override the PFT and soil values for all grid points with the values that you set. This is useful for running single-point tower sites where the soil type and vegetation are known. Note that when you use pft_frc, all other landunits will be zeroed out, and the sum of your pft_frc array MUST equal 100.0. Also note that when using the "pft_" options, the mksrf_fdynuse file will be an XML-like file with PFT settings instead of a file of filepaths. Unlike the file of filepaths, you will have to create this file by hand; mksurfdata.pl will NOT be able to create it for you (other than the first year, which will be set to the values entered on the command line). Note that when PTCLM is run, it CAN create these files for you from a simpler format (see the Section called Dynamic Land-Use Change Files for use by PTCLM in Chapter 6). Instead of a filepath you have a list of XML elements that give information on the PFTs and harvesting, for example:


<pft_f>100</pft_f><pft_i>1</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>
So the <pft_f> tags give the PFT fractions and the <pft_i> tags give the index for each fraction. Harvest is an array of five elements, and grazing is a single value. As in the usual file, each list of XML elements goes with a year, and there is a limit on the number of characters that can be used.
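
As an illustration, the sketch below checks that a set of fractions sums to 100 and then prints the XML elements for one entry in the form shown above. The function name pft_entry is hypothetical. It assumes that multiple fractions and indices are written as comma-delimited lists inside the tags (as they are on the mksurfdata.pl command line), sets harvest and grazing to zero, and leaves the year and any fixed-width padding to you.

```shell
#!/bin/sh
# Hypothetical helper: print the XML-like elements for one entry of the
# "pft_" form of the mksrf_fdynuse file, after checking that the fractions
# sum to 100. Harvest and grazing are set to zero, as in the example above.
pft_entry() {
    frc="$1"   # comma-delimited percentages, e.g. 40,60
    idx="$2"   # comma-delimited PFT indices, e.g. 1,7
    echo "$frc" | awk -F, '{ s = 0; for (i = 1; i <= NF; i++) s += $i
                             d = s - 100; if (d < 0) d = -d
                             exit (d > 1e-6) }' || {
        echo "pft_frc must sum to 100.0: $frc" >&2
        return 1
    }
    printf '<pft_f>%s</pft_f><pft_i>%s</pft_i><harv>0,0,0,0,0</harv><graz>0</graz>\n' "$frc" "$idx"
}

# Reproduces the single-PFT example shown above.
pft_entry 100 1
```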

Standard Practices when using mksurfdata_map

In this section we give recommendations on how to use mksurfdata_map so that your results are similar to the files that we created with it.

If you look at the standard surface datasets that we have created and provided for use, there are three practices that we have followed consistently (you can also see these in the sample namelists and in the mksurfdata.pl script). The first is that we always output data in double precision (hence outnc_double is set to .true.). The second is that we always use the procedure for creating transient datasets (using mksrf_fdynuse), even when creating datasets for a fixed simulation year. This ensures that the fixed-year datasets are consistent with the transient datasets. When this is done a "surfdata.pftdyn" dataset is also created, but it is NOT used in CLM. If you look at the sample namelist mksurfdata_map.namelist you will see that it sets mksrf_fdynuse to the file pftdyn_hist_simyr2000.txt, where the single file entered is the same PFT file used in the rest of the namelist (as mksrf_fvegtyp). The third practice is that we always set mksrf_ftopo, even if glacier elevation classes are NOT active. This dataset is used to limit urban areas based on topographic height, so it is important to set it every time. The glacier multiple elevation classes will be used as well if you are running a compset with the active glacier model.

There are two other important practices for creating urban single-point datasets. First, you will often want to set all_urban to .true. so that the dataset will have 100% of the gridcell output as urban, rather than some mix of urban, vegetation types, and other landunits. Second, note that the current urban single-point datasets have urban data that is specific to the site, so this data will need to be overwritten when you create new datasets. The mksurfdata.pl script does this for you by using ncks to overwrite the data after the file is created.

The final issue is how to build mksurfdata_map. When NOT optimized, mksurfdata_map is somewhat slower, though still usable, so you will normally want to run it optimized, which is the default. The drawback of running optimized is that, for most compilers, answers differ between optimized and non-optimized builds. So if you want answers to be the same as a previous surface dataset, you will need to run on the same platform and at the same optimization level. Note that the output surface datasets have attributes that describe whether the file was written out optimized or not, to make it easier to match datasets created previously. For more information on the different compiler options for the CLM4.5 tools see the Section called Common environment variables and options used in building the FORTRAN tools.
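
One way to inspect those attributes is to filter the header printed by the NetCDF ncdump utility. The sketch below assumes ncdump is installed and relies on its header layout, in which global attribute lines begin with ":" after any leading blanks. The function name global_attrs and the file name in the usage comment are placeholders, and the name of the attribute that records optimization is not given here, so look through the printed list for it.

```shell
#!/bin/sh
# Hypothetical helper: read `ncdump -h` output on stdin and print only the
# global attribute lines (those whose first non-blank character is ":").
# Variable attributes (written as "varname:attr = ...") are skipped.
global_attrs() {
    sed -n 's/^[[:space:]]*\(:.*\)$/\1/p'
}

# Usage (the file name is a placeholder for your surface dataset):
#   ncdump -h surfdata_file.nc | global_attrs
```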