Postprocessing ExoPlaSim Outputs

The Basics: Formats, Variables, and Math

As of ExoPlaSim 3.0.0, postprocessing can be done using the exoplasim.pyburn module. This module exposes an API for setting the variables to be included in postprocessed output, the horizontal mode in which to present them, and any additional math that should be performed, including coordinate transformations, time-averaging, and standard deviations. pyburn also supports a large range of output formats: netCDF, HDF5, NumPy’s compressed .npz archives (the default), and plain-text comma-separated value (CSV) files. CSV files can be compressed individually with gzip, bundled into a tarball, or tarballed and compressed (in the last case with gzip, lzma, or bzip2 compression). Producing netCDF files requires the netCDF4 Python library (install it with pip install netCDF4, or at ExoPlaSim’s install time with pip install exoplasim[netCDF4]). Similarly, producing HDF5 files requires the h5py Python library, which can be installed with pip install h5py or, at ExoPlaSim’s install time, with pip install exoplasim[HDF5]. Support for both netCDF and HDF5 can be guaranteed at install time by combining them:

pip install exoplasim[netCDF4,HDF5]

Format

The choice of output format can be specified either when the postprocessor is called (if being used manually), or as an argument to a Model object, by simply providing the file extension:

Format	Supported Extensions
NumPy (default)	.npz, .npy
netCDF	.nc
HDF5	.h5, .he5, .hdf5
Compressed CSV	.gz, .tar.gz, .tar.xz, .tar.bz2
Uncompressed CSV	.csv, .txt, .tar

Because the NumPy archive format does not support additional metadata arrays, metadata is stored separately in a file with the _metadata.npz suffix. This file is typically a few tens of KiB.

CSV-type files contain only 2D variable data, so the first N-1 dimensions of an N-dimensional variable are flattened into rows. The original variable shape is included in the file header (prepended with a # character) as the first items in a comma-separated list, followed by the ‘|||’ placeholder as the first non-dimension item. When reading variables from these files, reshape them according to these dimensions. This is true even in tarballs (which contain CSV files). If the file is read with gcmt.load(), this reshaping is done automatically.
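When reading these files without gcmt.load(), the reshaping step might look like the minimal NumPy sketch below. Beyond what is stated above, it assumes that the dimension list sits on a single '#'-prefixed header line and that the body is plain comma-separated numbers; parse_csv_variable and read_csv_variable are hypothetical helpers, not part of ExoPlaSim.

```python
import numpy as np


def parse_csv_variable(lines):
    """Rebuild an N-D array from the lines of a pyburn-style CSV file.

    Hypothetical helper: assumes a '#'-prefixed header whose leading
    comma-separated items are the original dimension sizes, terminated
    by the '|||' placeholder.
    """
    shape = None
    data_lines = []
    for line in lines:
        if line.startswith("#"):
            items = [item.strip() for item in line.lstrip("#").split(",")]
            if shape is None and "|||" in items:
                # Every item before the placeholder is a dimension size
                shape = tuple(int(n) for n in items[: items.index("|||")])
        elif line.strip():
            data_lines.append(line)
    if shape is None:
        raise ValueError("no '|||'-terminated dimension header found")
    # The body is 2D: the first N-1 dimensions were flattened into rows
    flat = np.loadtxt(data_lines, delimiter=",", ndmin=2)
    return flat.reshape(shape)


def read_csv_variable(path):
    """Read one variable file, e.g. MOST.00127/MOST.00127_ts.csv."""
    with open(path) as f:
        return parse_csv_variable(f.readlines())
```

For example, a header of "# 2,3,2,|||,..." followed by six rows of two values would come back as an array of shape (2, 3, 2).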

Note that when using the pyburn.postprocess() function directly, a single output filename must be specified. This is true even for formats that produce many files that are not bound together, such as .gz and .csv, which produce a folder containing one file per variable. The filename you specify should have the pattern <subdirectory>.<extension>. This file will not actually be created; the name is only parsed to determine the desired output format. So, for example, to create a folder of CSV files for the raw output file MOST.00127, one would use MOST.00127.csv. The surface temperature variable, ts, would then be found in MOST.00127/MOST.00127_ts.csv. This same combined-format fictional filename can be passed to gcmt.load(), and the object it returns accesses the data in the folder just as it would a bound archive such as a tarball, netCDF file, or HDF5 file.
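The naming convention above can be sketched as a pair of hypothetical helpers (they are not part of pyburn). The sketch covers only the folder-of-CSVs case from the example; it makes no assumption about how other multi-file formats, such as .gz, name their per-variable files.

```python
import os

# Compound tarball extensions listed in the format table above
TAR_EXTENSIONS = (".tar.gz", ".tar.xz", ".tar.bz2")


def split_output_name(outname):
    """Split '<subdirectory>.<extension>' into (subdirectory, extension)."""
    # Check compound extensions first so "x.tar.gz" is not read as ".gz"
    for ext in TAR_EXTENSIONS:
        if outname.endswith(ext):
            return outname[: -len(ext)], ext
    return os.path.splitext(outname)


def csv_variable_path(outname, varname):
    """Path of one variable's file in the plain .csv (folder) layout."""
    subdir, ext = split_output_name(outname)
    if ext != ".csv":
        raise ValueError("this sketch only covers the plain .csv layout")
    return os.path.join(subdir, f"{os.path.basename(subdir)}_{varname}.csv")


print(csv_variable_path("MOST.00127.csv", "ts"))
```

On a POSIX system the last line prints MOST.00127/MOST.00127_ts.csv, matching the example in the text.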

A T21 model output with 10 vertical levels, 12 output times, all supported variables in grid mode, and no standard deviation computation has the following sizes in each format:

Format	Size
netCDF	12.8 MiB
HDF5	17.2 MiB
NumPy (default)	19.3 MiB
tar.xz	33.6 MiB
tar.bz2	36.8 MiB
gzipped	45.9 MiB
uncompressed	160.2 MiB

Variables

Output variables can be chosen in multiple ways. Either a burn7-style namelist can be provided, containing a list of numeric variable codes (listed below), or a list can be passed directly, containing numeric codes, strings of numeric codes, or string variable keys, as given in the leftmost column of the table below.
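To illustrate that the three list forms identify the same variables, here is a hypothetical normalizer (not pyburn's own code) that maps each form to string keys, using a small excerpt of the code table below.

```python
# Small excerpt of the variable code table (hypothetical subset)
CODE_TO_KEY = {129: "sg", 130: "ta", 139: "ts", 148: "stf", 167: "tas"}


def normalize_variables(variables):
    """Map int codes, numeric-string codes, or string keys to string keys."""
    keys = []
    for var in variables:
        if isinstance(var, int) or (isinstance(var, str) and var.isdigit()):
            # Numeric code, given either as an int or as a string
            keys.append(CODE_TO_KEY[int(var)])
        else:
            # Already a string variable key such as "ts"
            keys.append(var)
    return keys


# All three forms request surface temperature and streamfunction
for form in ([139, 148], ["139", "148"], ["ts", "stf"]):
    print(normalize_variables(form))
```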

Variable lists can be specified once for all outputs of a given type (‘regular’, ‘snapshot’, or ‘highcadence’) with Model.cfgpostprocessor(), for each model year with Model.postprocess(), or manually, outside of the ExoPlaSim Model object, with pyburn.postprocess().

Optionally, as advanced usage, a dictionary can be passed, with one member per variable (using the same identification rules given above), and pyburn.dataset() keyword arguments specified for each variable. For example, to create an output file with two variables, surface temperature and streamfunction, both on a horizontal grid, and the streamfunction zonally-averaged and passed through physics filters:

{"ts":{"mode":"grid","zonal":False},
 "stf":{"mode":"grid","zonal":True,"physfilter":True}}

This can be specified in one of three ways. It can be set for all outputs of a given type (‘regular’, ‘snapshot’, or ‘highcadence’) as a Model property, via Model.cfgpostprocessor():

>>> myModel.cfgpostprocessor(ftype="regular",extension=".nc",
...                          variables={"ts":{"mode":"grid","zonal":False},
...                                     "stf":{"mode":"grid","zonal":True,"physfilter":True}})

Or it can be set each time Model.postprocess() is called:

>>> myModel.postprocess("MOST.00127",
...                     {"ts":{"mode":"grid","zonal":False},
...                      "stf":{"mode":"grid","zonal":True,"physfilter":True}},
...                     log="burnlog.00127",crashifbroken=True)

Or, finally, it can be specified directly to pyburn.postprocess():

>>> import exoplasim.pyburn as pyburn
>>> pyburn.postprocess("MOST.00127","MOST.00127.nc",logfile="burnlog.00127",
...                    variables={"ts":{"mode":"grid","zonal":False},
...                               "stf":{"mode":"grid","zonal":True,"physfilter":True}})

Postprocessing Math

pyburn provides the ability to perform various mathematical operations on the data as part of the postprocessing step.

Multiple horizontal modes are available (specified with the mode keyword): a Gaussian-spaced latitude-longitude grid ("grid"); spherical harmonics ("spectral"); Fourier coefficients for each latitude ("fourier"); a latitude-longitude grid rotated so that the “North” pole is at the substellar point of a synchronously rotating planet and the “equator” is the terminator ("synchronous"); and Fourier coefficients computed along lines of constant longitude (including the mirror component on the opposite hemisphere) in that rotated coordinate system ("syncfourier"). Additionally, for output modes with discrete latitudes, variables can be zonally averaged (zonal=True).
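As an illustration, a per-variable dictionary can mix these modes. The check below is a hypothetical helper, not part of pyburn: it rejects mode names other than the five listed above and refuses zonal=True for "spectral" output, which has no discrete latitudes. Treating a missing mode as "grid" is an assumption.

```python
# The five horizontal modes named in the text
MODES = {"grid", "spectral", "fourier", "synchronous", "syncfourier"}


def check_variable_options(variables):
    """Sanity-check a per-variable options dict (hypothetical helper)."""
    for name, opts in variables.items():
        mode = opts.get("mode", "grid")  # assumed default
        if mode not in MODES:
            raise ValueError(f"{name}: unknown mode {mode!r}")
        if opts.get("zonal", False) and mode == "spectral":
            # Spherical harmonics have no latitudes to average along
            raise ValueError(f"{name}: zonal averaging needs discrete latitudes")
    return variables


variables = check_variable_options({
    "ts": {"mode": "synchronous"},          # substellar-pole lat-lon grid
    "ta": {"mode": "grid", "zonal": True},  # zonal mean on the Gaussian grid
    "stf": {"mode": "spectral"},            # spherical-harmonic coefficients
})
```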

ExoPlaSim performs some time-averaging on the fly (for “regular”-type outputs) to avoid overloading I/O buffers and creating enormous raw output files, but the number of output times is often still more than you want in the postprocessed data. The default configuration, for example, produces 72 output timestamps per year. pyburn can perform time-averaging to reduce this to, e.g., monthly output, via the times and timeaveraging keywords. The former specifies either the number of output times or the specific output times requested (as decimal fractions of a model output’s timeseries), while the latter is a boolean True/False flag. If specific output times are requested, or the number of requested outputs does not divide cleanly into the number of timestamps in the raw output, pyburn can interpolate between timestamps. No extrapolation is performed, so you cannot request a time between, e.g., the last output of the previous year and the first output of the current year. The interpolatetimes keyword chooses between linear interpolation (if True, used when necessary) and “nearest-neighbor” interpolation, which simply selects the timestamp closest to the target time. The minimum number of timestamps in the output file is 1; this corresponds to computing an annual average.
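The averaging and interpolation rules above can be illustrated with a short NumPy sketch. This is not pyburn's implementation: block_average and sample_times are hypothetical helpers, and sample_times takes the raw timestamp positions explicitly rather than assuming how pyburn maps decimal fractions onto them.

```python
import numpy as np


def block_average(series, nout):
    """Average a (time, ...) array into nout equal, contiguous time blocks.

    Only valid when nout divides the number of timestamps evenly, e.g.
    72 raw outputs per year -> 12 means of 6 timestamps each, or
    nout=1 -> a single (annual) mean.
    """
    series = np.asarray(series, dtype=float)
    nt = series.shape[0]
    if nt % nout != 0:
        raise ValueError(f"{nout} does not divide {nt} timestamps evenly")
    blocks = series.reshape(nout, nt // nout, *series.shape[1:])
    return blocks.mean(axis=1)


def sample_times(series, rawtimes, targets, linear=True):
    """Sample a 1-D timeseries at target times, refusing to extrapolate.

    linear=True interpolates between neighbouring raw timestamps;
    linear=False returns the value at the nearest raw timestamp.
    rawtimes must be increasing.
    """
    series = np.asarray(series, dtype=float)
    rawtimes = np.asarray(rawtimes, dtype=float)
    targets = np.atleast_1d(np.asarray(targets, dtype=float))
    if targets.min() < rawtimes[0] or targets.max() > rawtimes[-1]:
        raise ValueError("requested times fall outside the raw timeseries")
    if linear:
        return np.interp(targets, rawtimes, series)
    # Nearest neighbour: index of the closest raw timestamp to each target
    idx = np.abs(rawtimes[None, :] - targets[:, None]).argmin(axis=1)
    return series[idx]
```

Raising on out-of-range targets mirrors the no-extrapolation rule described above.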

Finally, pyburn can compute the standard deviations of ExoPlaSim variables. Enabling this with stdev=True computes the standard deviation in one of two ways, depending on whether time-averaging is being used. If time-averages are being computed, a standard deviation is computed alongside each average, and each standard deviation variable (denoted with the _std suffix in the variable name, e.g. ts_std for the standard deviation of surface temperature) has the same number of timestamps as the time-averages. If time-averages are not being computed, the standard deviation of the entire file’s timeseries is computed, and each standard deviation variable has a single timestamp.
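The two cases can be sketched with NumPy as follows. with_stdev is a hypothetical helper, not pyburn's code; it uses NumPy's default ddof=0 (population standard deviation), which is an assumption, since the text does not say which estimator pyburn uses.

```python
import numpy as np


def with_stdev(name, series, nout=None):
    """Return {name: data, name + "_std": stdev} for a (time, ...) array.

    nout=None mimics "no time-averaging": every timestamp is kept and a
    single whole-series standard deviation (one timestamp) is returned.
    Otherwise the series is split into nout equal blocks, and each block
    gets both a mean and a standard deviation (ddof=0, an assumption).
    """
    series = np.asarray(series, dtype=float)
    if nout is None:
        return {name: series,
                name + "_std": series.std(axis=0, keepdims=True)}
    nt = series.shape[0]
    if nt % nout != 0:
        raise ValueError(f"{nout} does not divide {nt} timestamps evenly")
    blocks = series.reshape(nout, nt // nout, *series.shape[1:])
    return {name: blocks.mean(axis=1), name + "_std": blocks.std(axis=1)}
```

With 72 raw timestamps and nout=12, both ts and ts_std have 12 timestamps; with nout=None, ts keeps all 72 and ts_std has one.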

Reading Postprocessed Files

While postprocessed files are portable and can be read however you like, ExoPlaSim also provides a native, format-agnostic way to access them via the gcmt.load() function. This takes the archive filename as its argument, and returns an object analogous to an open netCDF file object. It has two members of interest to the user: variables and metadata. Both are compatible with all dictionary methods, and individual variables’ data can be extracted by using the variable name as the dictionary key. For example:

>>> import exoplasim.gcmt as gcmt
>>> myData = gcmt.load("MOST.00127.tar.gz")
>>> surfacetemperature = myData.variables['ts']
>>> surftemp_metadata  = myData.metadata['ts']

Note that for CSV-type formats, like the tarball given above, the file is left compressed (except during the initial read), and the whole dataset is not loaded into memory. Dimension arrays, such as latitude, longitude, etc., are loaded, as is all metadata. By default, however, only one data array is held in memory at a time. This can be expanded with the csvbuffersize keyword, which sets the number of variables allowed in the memory buffer at once. This buffer uses a first-in, first-out approach, so if a new variable is requested and the buffer is full, the loaded variable which was used the least recently will be purged from memory.

Postprocessor Variable Codes

Note that in addition to the variable codes listed below, if pyburn is used with stdev=True, there will also be variables that correspond to those listed below, with the _std suffix. If time-averaging was performed during postprocessing, the standard deviation will be the standard deviation for each averaged time period, and there will be the same number of timestamps for the _std variables as for their nominal data counterparts. If time-averaging was not used, then each standard deviation variable will have only one timestamp, corresponding to the standard deviation throughout the entire timeseries present in the file.

Variable	Code	Description	Units	Notes
nu	50	orbital true anomaly	deg
lambda	51	solar ecliptic longitude	deg
zdec	52	solar declination angle	deg
rdist	53	planet-star distance modulus	nondimensional
mld	110	mixed layer depth	m
sg	129	surface geopotential	m^2 s^-2
ta	130	air temperature	K
ua	131	eastward wind	m s^-1
va	132	northward wind	m s^-1
hus	133	specific humidity	kg/kg
ps	134	surface air pressure	hPa
wap	135	vertical air velocity	Pa s^-1
wa	137	upward wind	m s^-1
zeta	138	atm relative vorticity	s^-1
ts	139	surface temperature	K
mrso	140	lwe of soil moisture content	m
snd	141	surface snow thickness	m
prl	142	lwe of large scale precipitation	m s^-1
prc	143	convective precipitation rate	m s^-1
prsn	144	lwe of snowfall amount	m s^-1
bld	145	dissipation in boundary layer	W m^-2
hfss	146	surface sensible heat flux	W m^-2
hfls	147	surface latent heat flux	W m^-2
stf	148	streamfunction	m^2 s^-2
psi	149	velocity potential	m^2 s^-2
psl	151	air pressure at sea level	hPa
pl	152	log surface pressure	nondimensional
d	155	divergence of wind	s^-1
zg	156	geopotential height	m
hur	157	relative humidity	nondimensional
tps	158	tendency of surface air pressure	Pa s^-1
u3	159	u*	m^3 s^-3
mrro	160	surface runoff	m s^-1
clw	161	liquid water content	nondimensional
cl	162	cloud area fraction in layer	nondimensional
tcc	163	total cloud cover	nondimensional
clt	164	cloud area fraction	nondimensional
uas	165	eastward wind 10m	m s^-1
vas	166	northward wind 10m	m s^-1
tas	167	air temperature 2m	K
td2m	168	dew point temperature 2m	K
tsa	169	surface temperature accumulated	K
tsod	170	deep soil temperature	K
dsw	171	deep soil wetness	nondimensional
lsm	172	land binary mask	nondimensional
z0	173	surface roughness length	m
alb	174	surface albedo	nondimensional
as	175	surface albedo	nondimensional
rss	176	surface net shortwave flux	W m^-2
rls	177	surface net longwave flux	W m^-2
rst	178	toa net shortwave flux	W m^-2
rlut	179	toa net longwave flux	W m^-2
tauu	180	surface eastward stress	Pa
tauv	181	surface northward stress	Pa
evap	182	lwe of water evaporation	m s^-1
tso	183	climate deep soil temperature	K
wsoi	184	climate deep soil wetness	nondimensional
vegc	199	vegetation cover	nondimensional
rsut	203	toa outgoing shortwave flux	W m^-2
ssru	204	surface solar radiation upward	W m^-2
stru	205	surface thermal radiation upward	W m^-2
tso2	207	soil temperature level 2	K
tso3	208	soil temperature level 3	K
tso4	209	soil temperature level 4	K
sic	210	sea ice cover	nondimensional
sit	211	sea ice thickness	m
vegf	212	forest cover	nondimensional
snm	218	snow melt	m s^-1
sndc	221	snow depth change	m s^-1
prw	230	atmosphere water vapor content	kg m^-2
glac	232	glacier cover	nondimensional
tsn	238	snow temperature	K
spd	259	wind speed	m s^-1
pr	260	total precipitation	m s^-1
ntr	261	net top radiation	W m^-2
nbr	262	net bottom radiation	W m^-2
hfns	263	surface downward heat flux	W m^-2
wfn	264	net water flux	m s^-1
lwth	266	local weathering	W earth
grnz	267	ground geopotential	m^2 s^-2
icez	301	glacier geopotential	m^2 s^-2
netz	302	net geopotential	m^2 s^-2
dpdx	273	d(ps)/dx	Pa m^-1
dpdy	274	d(ps)/dy	Pa m^-1
hlpr	277	half level pressure	Pa
flpr	278	full level pressure	Pa
thetah	279	half level potential temperature	K
theta	280	full level potential temperature	K
czen	318	cosine solar zenith angle	nondimensional
wthpr	319	weatherable precipitation	mm day^-1
mint	320	minimum temperature	K
maxt	321	maximum temperature	K
cape	322	convective available potential energy	J kg^-1	Storm Clim.
lnb	323	level of neutral buoyancy	hPa	Storm Clim.
sdef	324	troposphere entropy deficit	nondimensional	Storm Clim.
absz	325	sigma-0.85 abs vorticity	s^-1	Storm Clim.
umax	326	maximum potential intensity	m s^-1	Storm Clim.
vent	327	ventilation index	nondimensional	Storm Clim.
vrumax	328	ventilation-reduced maximum wind	m s^-1	Storm Clim.
gpi	329	genesis potential index	nondimensional	Storm Clim.
dfu	404	shortwave up	W m^-2	Snapshot Only
dfd	405	shortwave down	W m^-2	Snapshot Only
dftu	406	longwave up	W m^-2	Snapshot Only
dftd	407	longwave down	W m^-2	Snapshot Only
dtdt	408	radiative heating rate	K s^-1	Snapshot Only
dfdz	409	flux convergence	W m^-3	Snapshot Only
mmr	410	aerosol mass mixing ratio	kg kg^-1	Aerosols
nrho	411	aerosol number density	particles m^-3	Aerosols

Burn7 Postprocessor Options

The C++ burn7 postprocessor is now deprecated and unsupported. It is only available via the exoplasim-legacy package.