CESM Research Tools: CLM4 in CESM1.0.4 User's Guide Documentation
Chapter 5. How to run Single-Point/Regional cases
The file Quickstart.userdatasets in the models/lnd/clm/doc directory gives guidelines on how to create and run with your own single-point or regional datasets. That guide is reprinted below.
Quick-Start to using your own datasets in clm4
===============================================
Assumptions: You are already familiar with the use of the cpl7 scripts
for creating cases to run with "standalone" clm. See the
Quickstart.GUIDE and the README files and documentation in
the scripts directory for more information on this process.
We also assume that the env variable $CSMDATA points to the
location of the standard datasets for your machine
(/fis/cgd/cseg/csm/inputdata on bluefire). We also assume that the
following variables are used to point to the appropriate
values that you want to use for your case. Mask is included
as part of your resolution for your case, and SIM_YEAR and
SIM_YEAR_RANGE will be set appropriately for the particular use
case that you choose for your compset (e.g. 1850_control,
20thC_transient, etc.).
SIM_YEAR -------- Simulation year (e.g. 1850 or 2000)
SIM_YEAR_RANGE -- Simulation year range (e.g. constant or 1850-2000)
MASK ------------ Land mask (e.g. navy, USGS, or gx1v6)
Process:
0.) Why do this?
An alternative to the steps below is to create your case and hand-edit the
relevant namelists as appropriate with your own datasets. One reason for
the process below is so that we can do automated testing on dataset inclusion.
It also provides the following functionality to the user:
a.) New cases with the same datasets only require a small change to
env_conf.xml and env_run.xml (steps 5,6, and 8)
b.) You can clone new cases based on a working case, without having to
hand-edit all of the namelists for the new case in the same way.
c.) The process checks for the existence of files when cases are
configured, so the scripts can confirm that datasets exist rather than
you finding out at run-time after the job has been submitted to batch.
d.) The process checks for valid namelists, and makes it less likely
for you to put an error or typo in the namelists.
e.) The *.input_data_list files will be accurate for your case, so
you can use the check_input_data script to do queries on the files.
f.) Your dataset names will be closer to standard names, and easier
for inclusion in standard clm (with the exception of creation dates).
g.) The regional extraction script (see 3.b below) will automatically create
files with names following this convention.
1.) Create your own dataset area -- link it to standard dataset location
Create a directory to put your own datasets (such as /ptmp/$USER/my_inputdata).
Use the script link_dirtree to link the standard datasets into this location.
If you already have complete control over the datasets in $CSMDATA -- you
can skip this step.
setenv MYCSMDATA /ptmp/$USER/my_inputdata
scripts/link_dirtree $CSMDATA $MYCSMDATA
If you do this you can find the files you've added with...
find $MYCSMDATA -type f -print
and you can find the files that are linked to the standard location with...
find $MYCSMDATA -type l -print
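The two find commands above can also be sketched in Python, which may be convenient if you want to post-process the lists. This is only an illustration: the function name split_linked_and_local is hypothetical and is not part of the CESM scripts.

```python
import os

def split_linked_and_local(root):
    """Return (local_files, linked_files) found under root.

    local_files  corresponds to `find root -type f` (regular files you added)
    linked_files corresponds to `find root -type l` (symlinks to the standard location)
    """
    local_files, linked_files = [], []
    # os.walk does not follow symlinks by default, matching find's behaviour.
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks to directories show up in dirnames; symlinks to files
        # (and broken symlinks) show up in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                linked_files.append(path)
            elif os.path.isfile(path):
                local_files.append(path)
    return sorted(local_files), sorted(linked_files)
```

Calling split_linked_and_local on the directory held in $MYCSMDATA would return your own files first and the linked standard files second.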
2.) Establish a "user dataset identifier name" string
You need a unique identifier for your datasets for a given resolution,
mask, area, simulation-year, and simulation year-range. The identifier
can be any string you want -- but we have the following suggestions:
Suggestions for global grids:
setenv MYDATAID ${degLat}x${degLon}
Suggestions for regional grids: either give the number of points in the grid
setenv MYDATAID nxmpt_citySTATE
setenv MYDATAID nxmpt_cityCOUNTRY
setenv MYDATAID nxmpt_regionCOUNTRY
setenv MYDATAID nxmpt_region
or give the total size of the gridcells
setenv MYDATAID nxmdeg_citySTATE
setenv MYDATAID nxmdeg_cityCOUNTRY
for example:
setenv MYDATAID 10x15                  -- global 10x15 grid
setenv MYDATAID 1x1pt_boulderCO        -- single-point for Boulder CO
setenv MYDATAID 5x5pt_boulderCO        -- 5x5 region around Boulder CO
setenv MYDATAID 1x1deg_boulderCO       -- 1x1 degree region around Boulder CO
setenv MYDATAID 13x12pt_f19_alaskaUSA1 -- 13x12 gridcells from f19
                                          (1.9x2.5) global resolution over Alaska
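As a rough illustration of the naming suggestions above, the following Python sketch composes an identifier from its parts. The helper make_dataid and its parameter names are hypothetical; CLM accepts any string, so nothing here is required by the scripts.

```python
def make_dataid(n, m, location, unit="pt", resolution=None):
    """Compose an identifier such as 5x5pt_boulderCO or 13x12pt_f19_alaskaUSA1.

    n, m       -- grid dimensions (points) or box size (degrees)
    location   -- free-form place name, e.g. "boulderCO"
    unit       -- "pt" for number of points, "deg" for size in degrees
    resolution -- optional global resolution the region was cut from, e.g. "f19"
    """
    if unit not in ("pt", "deg"):
        raise ValueError("unit must be 'pt' (grid points) or 'deg' (degrees)")
    parts = [f"{n}x{m}{unit}"]
    if resolution:
        parts.append(resolution)
    parts.append(location)
    return "_".join(parts)
```

For instance, make_dataid(13, 12, "alaskaUSA1", resolution="f19") reproduces the last identifier in the examples above.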
3.) Add your own datasets in the standard locations in that area
3.a) Create datasets using the standard tools valid for any specific points
Use the tools in models/lnd/clm/tools to create new datasets, such as
mkgriddata, mksurfdata, mkdatadomain, and the regridding tools in
ncl_scripts
(see models/lnd/clm/bld/namelist_files/namelist_defaults_usr_files.xml
for the exact syntax for all files).
surfdata: copy files into:
$MYCSMDATA/lnd/clm2/surfdata/surfdata_${MYDATAID}_simyr${SIM_YEAR}.nc
fatmgrid: copy files into:
$MYCSMDATA/lnd/clm2/griddata/griddata_${MYDATAID}.nc
fatmlndfrc: copy files into:
$MYCSMDATA/lnd/clm2/griddata/fracdata_${MYDATAID}_${MASK}.nc
domainfile: copy files into:
$MYCSMDATA/atm/datm7/domain.clm/domain.lnd.${MYDATAID}_${MASK}.nc
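The file locations listed in step 3.a can be assembled programmatically, which makes it easy to check that everything is in place before configuring a case. The sketch below is an assumption-laden illustration: the functions user_dataset_paths and missing_datasets are hypothetical helpers, and only the four path patterns shown above are covered.

```python
import os

def user_dataset_paths(mycsmdata, dataid, mask, sim_year):
    """Map each item from step 3.a to the file name expected for it."""
    lnd = os.path.join(mycsmdata, "lnd", "clm2")
    return {
        "surfdata": os.path.join(
            lnd, "surfdata", f"surfdata_{dataid}_simyr{sim_year}.nc"),
        "fatmgrid": os.path.join(
            lnd, "griddata", f"griddata_{dataid}.nc"),
        "fatmlndfrc": os.path.join(
            lnd, "griddata", f"fracdata_{dataid}_{mask}.nc"),
        "domainfile": os.path.join(
            mycsmdata, "atm", "datm7", "domain.clm",
            f"domain.lnd.{dataid}_{mask}.nc"),
    }

def missing_datasets(paths):
    """Return the names of items whose files are not present on disk."""
    return sorted(name for name, path in paths.items()
                  if not os.path.exists(path))
```

Running missing_datasets(user_dataset_paths(...)) with your own values for MYCSMDATA, MYDATAID, MASK, and SIM_YEAR lists any of the four files that still need to be copied into place.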
3.b) Use the regional extraction script to get regional datasets from the global ones
Use the getregional_datasets.pl script to extract regional datasets of interest.
Note that the script works on all files other than the "finidat" file, as it is a 1D vector file.
For example, run the extraction for data from 52-73 North latitude, 190-220 longitude,
which creates a 13x12 gridcell region from the f19 (1.9x2.5) global resolution over Alaska.
cd models/lnd/clm/tools/ncl_scripts
./getregional_datasets.pl -sw 52,190 -ne 73,220 -id $MYDATAID \
-mycsmdata $MYCSMDATA
Repeat this process if you need files for multiple sim_year and sim_year_range values.
4.) Set up your case
Follow the standard steps for executing "scripts/create_newcase" and customize
your case as appropriate.
e.g.
./create_newcase -case my_userdataset_test -res pt1_pt1 -compset I1850 \
-mach bluefire
The above example implies that: MASK=gx1v6, SIM_YEAR=1850, and SIM_YEAR_RANGE=constant.
5.) Edit the env_run.xml in the case to point to your new dataset area
Edit DIN_LOC_ROOT_CSMDATA in env_run.xml to point to $MYCSMDATA
./xmlchange -file env_run.xml -id DIN_LOC_ROOT_CSMDATA -val $MYCSMDATA
6.) Edit the env_conf.xml in the case to point to your user dataset identifier
name.
Edit CLM_USRDAT_NAME (and CLM_PT1_NAME) to point to $MYDATAID
./xmlchange -file env_conf.xml -id CLM_USRDAT_NAME -val $MYDATAID
./xmlchange -file env_conf.xml -id CLM_PT1_NAME -val $MYDATAID
7.) Configure the case as normal
./configure -case
8.) Run your case as normal
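Steps 4 through 7 above can be collected into a dry-run command list, for example to log or review the commands before running them by hand. This Python sketch only builds and formats the command lines shown in this guide and does not execute anything; the function names are hypothetical. Note that create_newcase is run from the scripts directory, while the xmlchange and configure commands are run from inside the new case directory.

```python
import shlex

def case_setup_commands(case, mycsmdata, dataid,
                        res="pt1_pt1", compset="I1850", mach="bluefire"):
    """Return the commands for steps 4 through 7 as argument lists."""
    return [
        # Step 4: run from the scripts directory.
        ["./create_newcase", "-case", case, "-res", res,
         "-compset", compset, "-mach", mach],
        # Steps 5 through 7: run from inside the case directory.
        ["./xmlchange", "-file", "env_run.xml",
         "-id", "DIN_LOC_ROOT_CSMDATA", "-val", mycsmdata],
        ["./xmlchange", "-file", "env_conf.xml",
         "-id", "CLM_USRDAT_NAME", "-val", dataid],
        ["./xmlchange", "-file", "env_conf.xml",
         "-id", "CLM_PT1_NAME", "-val", dataid],
        ["./configure", "-case"],
    ]

def format_commands(commands):
    """Render the argument lists as shell-quoted lines, one per command."""
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Printing format_commands(case_setup_commands("my_userdataset_test", "/ptmp/me/my_inputdata", "1x1pt_boulderCO")) gives one line per command in the order they should be run.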
Use the regional extraction script to get regional datasets from the global ones
Use the getregional_datasets.pl script to extract regional datasets of interest.
Note that the script works on all files other than the "finidat" file, as it is a 1D vector file.
The script extracts a block of gridpoints from all the input global datasets
and creates the full suite of input datasets needed to run over that block. The resulting
datasets are named according to the "id" you give, and that id can then be used
as the value of CLM_USRDAT_NAME to create a case that uses them. See
the section on CLM Script Configuration Items (in Chapter 1) for
more information on setting CLM_USRDAT_NAME. The files extracted, listed by
the names used in the namelists, are:
fatmgrid, fatmlndfrc, fsurdat, fpftdyn, flndtopo, stream_fldfilename_ndep,
and the DATM files domainfile and faerdep.
For more information on these files see the Table on required files.
The alternatives to using this script are to use PTS_MODE,
discussed earlier; to use PTCLM, discussed in the next chapter; or to create the files
individually using the different file creation tools (given in the
Tools Chapter). Creating
all the files individually takes quite a bit of effort and time. PTS_MODE
has some limitations, as discussed earlier, and because it uses global files it is also
a bit slower when running simulations than using files that contain just the set
of points you want to run over. Another advantage of this script is that once you have created the
files, you can customize them: if you have data for this specific
location, you can use it to replace what is already in the files.
The script requires both "Perl" and "NCL". See the NCL Script section in the Tools Chapter on getting and using NCL and NCL scripts. The main script to use is a Perl script, which in turn calls the NCL script that actually creates the output files. The NCL script gets its settings from environment variables set by the Perl script. To get help with the script, use "-help" as follows:
> cd models/lnd/clm/tools/ncl_scripts
> ./getregional_datasets.pl -help
SYNOPSIS
     getregional_datasets.pl [options]
          Extracts out files for a single box region from the global
          grid for the region of interest. Choose a box determined by
          the NorthEast and SouthWest corners.
OPTIONS
     -debug [or -d]                 Just debug by printing out what the script would do.
                                    This can be useful to find the size of the output area.
     -help [or -h]                  Print usage to STDOUT.
     -mask "landmask"               Type of land-mask (i.e. navy, gx3v7, gx1v6 etc.) (default gx1v6)
     -mycsmdata "dir"               Root directory of where to put your csmdata.
                                    (default /home/erik/inputdata or value of CSMDATA env variable)
     -mydataid "name" [or -id]      Your name for the region that will be extracted. (REQUIRED)
                                    Recommended name: grid-size_global-resolution_location (?x?pt_f??_????)
                                    (i.e. 12x13pt_f19_alaskaUSA for 12x13 grid cells from the f19
                                    global resolution over Alaska)
     -NE_corner "lat,lon" [or -ne]  North East corner latitude and longitude (REQUIRED)
     -nomv                          Do NOT move datasets to final location, just leave them in
                                    current directory
     -res "resolution"              Global horizontal resolution to extract data from (default 1.9x2.5).
     -rcp "pathway"                 Representative concentration pathway for future scenarios
                                    Only used when simulation year range ends in a future
                                    year, such as 2100.
                                    (default -999.9).
     -sim_year "year"               Year to simulate for input datasets (i.e. 1850, 2000) (default 2000)
     -sim_yr_rng "year-range"       Range of years for transient simulations
                                    (i.e. 1850-2000, 1850-2100, or constant) (default constant)
     -SW_corner "lat,lon" [or -sw]  South West corner latitude and longitude (REQUIRED)
     -verbose [or -v]               Make output more verbose.
The required options are -id,
-ne, and -sw, for the output identifier
name to use in the filenames, the latitude and longitude of the Northeast corner, and
the latitude and longitude of the Southwest corner (in degrees). Options that specify
which files will be used are -mask, -res,
-rcp, -sim_year, and -sim_yr_rng,
for the land-mask to use, global resolution name, representative concentration pathway
for future scenarios, simulation year, and simulation year range. The location of the
input and output files is determined by the option -mycsmdata
(it can also be set with the environment variable $CSMDATA). If
you are running on a machine, such as those at NCAR, where you do NOT have write permission
to the CESM inputdata files, you should use the scripts/link_dirtree
script to create soft-links to the original files in a location that you can write
to. This way you can use both the new files you created and the original
files from the same location.
The remaining options to the script include -debug
and -verbose. -debug shows what
would happen if the script were run, without creating the actual files.
-verbose adds extra log output while creating the files so you
can more easily see what the script is doing.
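Before calling the script it can help to sanity-check the two corner arguments, since both are given as "lat,lon" strings. The Python sketch below is a hypothetical pre-check, not the validation that getregional_datasets.pl itself performs. It assumes longitudes are given in the 0 to 360 convention used in the examples in this section, and it does not handle boxes that cross the 0 meridian.

```python
def parse_corner(text):
    """Parse a "lat,lon" string such as "52,190" into (lat, lon) floats."""
    lat_text, lon_text = text.split(",")
    lat, lon = float(lat_text), float(lon_text)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} is outside [-90, 90]")
    # Assumption: longitudes run from 0 to 360 degrees east.
    if not 0.0 <= lon <= 360.0:
        raise ValueError(f"longitude {lon} is outside [0, 360]")
    return lat, lon

def check_box(sw, ne):
    """Check that the -sw corner lies south and west of the -ne corner."""
    sw_lat, sw_lon = parse_corner(sw)
    ne_lat, ne_lon = parse_corner(ne)
    if sw_lat >= ne_lat:
        raise ValueError("SW corner latitude must be below NE corner latitude")
    if sw_lon >= ne_lon:
        raise ValueError("SW corner longitude must be below NE corner longitude")
    return (sw_lat, sw_lon), (ne_lat, ne_lon)
```

A call such as check_box("52,190", "73,220") passes, while swapping the two arguments raises a ValueError.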
For example, run the extraction for data from 52-73 North latitude, 190-220 longitude, which creates a 13x12 gridcell region from the f19 (1.9x2.5) global resolution over Alaska.
Example 5-4. Example of running getregional_datasets.pl to get datasets for a specific region over Alaska
> cd scripts
# First make sure you have an inputdata location that you can write to
# You only need to do this step once, so you won't need to do this in the future
> setenv MYCSMDATA $HOME/inputdata # Set env var for the directory for input data
> ./link_dirtree $CSMDATA $MYCSMDATA
> cd ../models/lnd/clm/tools/ncl_scripts
> ./getregional_datasets.pl -sw 52,190 -ne 73,220 -id 13x12pt_f19_alaskaUSA -mycsmdata $MYCSMDATA
Warning: See the Section called Warning about Running with a Single-Processor on a Batch Machine for a warning about running single-point jobs on batch machines.
Note: See the Section called Managing Your Own Data-files in Chapter 3 for notes about managing your data when using link_dirtree.
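To see roughly how the Alaska box translates into a grid size, the following Python sketch counts grid-cell centers that fall inside the box. It rests on two assumptions that are not stated in this guide: that the f19 (1.9x2.5) grid has 96 latitudes including both poles and 144 longitudes starting at 0 degrees east, and that a cell is selected when its center lies inside the box. The script's own selection rule may differ, so the -debug option remains the reliable way to find the size of the output area.

```python
def f19_points_in_box(sw, ne, nlat=96, nlon=144):
    """Count grid-cell centers inside the box; returns (n_lat, n_lon).

    sw, ne -- (lat, lon) tuples for the SouthWest and NorthEast corners
    nlat, nlon -- assumed global grid dimensions for f19 (1.9x2.5)
    """
    sw_lat, sw_lon = sw
    ne_lat, ne_lon = ne
    dlat = 180.0 / (nlat - 1)  # about 1.9 degrees, poles included (assumed)
    dlon = 360.0 / nlon        # 2.5 degrees, starting at 0E (assumed)
    lats = [-90.0 + i * dlat for i in range(nlat)]
    lons = [j * dlon for j in range(nlon)]
    n_lat = sum(1 for lat in lats if sw_lat <= lat <= ne_lat)
    n_lon = sum(1 for lon in lons if sw_lon <= lon <= ne_lon)
    return n_lat, n_lon
```

Under these assumptions, the box from (52, 190) to (73, 220) contains 12 latitudes and 13 longitudes, or 156 grid cells in total.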
To run a simulation with the datasets created above, create a single-point
case and set CLM_USRDAT_NAME to the identifier used above. Note that in the example below
we set the number of processors to use to one (-pecount 1). For a single point, you
should only use a single processor, but for a regional grid, such as the example below,
you could use up to the number of grid points (12x13=156 processors).
Example 5-5. Example of using CLM_USRDAT_NAME to run a simulation using user datasets for a
specific region over Alaska
> cd scripts
# Create the case and set it to only use one processor
> ./create_newcase -case my_userdataset_test -res pt1_pt1 -compset I1850 \
-mach bluefire
> cd my_userdataset_test/
> ./xmlchange -file env_run.xml -id DIN_LOC_ROOT_CSMDATA -val $MYCSMDATA
> ./xmlchange -file env_conf.xml -id