Postprocessing ExoPlaSim Outputs

The Basics: Formats, Variables, and Math

As of ExoPlaSim 3.0.0, postprocessing can be done using the exoplasim.pyburn module. This module exposes an API for setting the variables to be included in postprocessed output, the horizontal mode in which to present them, and any additional math that should be performed, including coordinate transformations, time-averaging, and standard deviations. pyburn also supports a large range of output formats: netCDF, HDF5, NumPy’s compressed .npz archives (the default), and plain-text comma-separated value (CSV) files. The latter can be compressed individually with the gzip format, tarballed, or tarballed and compressed (in the latter case with gzip, lzma, or bzip2 compression types). Producing netCDF files requires that the netCDF4 python library be present (you can install it with pip install netCDF4 or at ExoPlaSim’s install-time with pip install exoplasim[netCDF4]). Similarly, producing HDF5 files requires the presence of the h5py Python library, which can be installed via pip install h5py or, at ExoPlaSim’s install time, with pip install exoplasim[HDF5]. Support for both netCDF and HDF5 can be guaranteed at install-time by combining them:

pip install exoplasim[netCDF4,HDF5]

Format

The choice of output format can be specified either when the postprocessor is called (if being used manually), or as an argument to a Model object, by simply providing the file extension:

Format

Supported Extensions

NumPy (default)

.npz

.npy

netCDF

.nc

HDF5

.h5

.he5

.hdf5

Compressed CSV

.gz

.tar.gz

.tar.xz

.tar.bz2

Uncompressed CSV

.csv

.txt

.tar

Because the NumPy archive format does not support additional metadata arrays, metadata is stored separately in a file using the _metadata.npz suffix. This file is typically a few tens of kiB.

CSV-type files will only contain 2D variable information, so the first N-1 dimensions will be flattened. The original variable shape is included in the file header (prepended with a # character) as the first items in a comma-separated list, with the first non-dimension item given as the ‘|||’ placeholder. On reading variables from these files, they should be reshaped according to these dimensions. This is true even in tarballs (which contain CSV files). If read in by gcmt.load(), this reshaping will be done automatically.

Note that when using the pyburn.postprocess() function directly, a single file must be specified as the output file. This is true even for formats that produce a large number of files that don’t get bound up together, such as .gz and .csv, which produce a folder containing one file per variable. The file you specify should have the pattern <subdirectory>.<extension>. This file will not actually be created, but it will be parsed to determine the desired output format. So, for example, to create an archive consisting of a folder full of CSV files for the raw output file MOST.00127, one would use MOST.00127.csv. The surface temperature variable, ts, would then be found in MOST.00127/MOST.00127_ts.csv. This same combined-format fictional filestring can be passed to gcmt.load(). The object returned by that function will access the data in the archive just as if it were a bound archive, such as a tarball, netCDF file, or HDF5 file.

A T21 model output with 10 vertical levels, 12 output times, all supported variables in grid mode,and no standard deviation computation will have the following sizes for each format:

Format

Size

netCDF

12.8 MiB

HDF5

17.2 MiB

NumPy (default)

19.3 MiB

tar.xz

33.6 MiB

tar.bz2

36.8 MiB

gzipped

45.9 MiB

uncompressed

160.2 MiB

Variables

Output variables can be chosen in multiple ways. Either a burn7-style namelist can be provided, containing a list of numeric variables codes (listed below), or a list can be passed directly, containing a list of numeri codes, a list of strings of numeric codes, or a list of string variable keys, as indicated in the leftmost-column of the table below.

Variable lists can be specified once for all outputs of a given type (‘regular’, ‘snapshot’, or ‘highcadence’), with Model.cfgpostprocessor(), or for each model year with Model.postprocess(), or manually outside of the ExoPlaSim Model object, with pyburn.postprocess.

Optionally, as advanced usage, a dictionary can be passed, with one member per variable (using the same identification rules given above), and pyburn.dataset() keyword arguments specified for each variable. For example, to create an output file with two variables, surface temperature and streamfunction, both on a horizontal grid, and the streamfunction zonally-averaged and passed through physics filters:

{"ts":{"mode":"grid","zonal":False},
 "stf":{"mode":"grid","zonal":True,"physfilter":True}}

This can be specified in one of 3 ways. Either it can be set for all outputs of a given type (‘regular’, ‘snapshot’, or ‘highcadence’) as a Model property:

>>> myModel.cfgpostprocessor(ftype="regular",extension=".nc",
>>>                          variables={"ts":{"mode":"grid","zonal":False},
>>>                                     "stf":{"mode":"grid","zonal":True,"physfilter":True}})

Or it can be set each time Model.postprocess() is called:

>>> myModel.postprocess("MOST.00127",
>>>                     {"ts":{"mode":"grid","zonal":False},
>>>                      "stf":{"mode":"grid","zonal":True,"physfilter":True}},
>>>                     log="burnlog.00127",crashifbroken=True)

Or, finally, it can be specified directly to pyburn.postprocess():

>>> pyburn.postprocess("MOST.00127","MOST.00127.nc",logfile="burnlog.00127",
>>>                    variables={"ts":{"mode":"grid","zonal":False},
>>>                               "stf":{"mode":"grid","zonal":True,"physfilter":True}})

Postprocessing Math

pyburn provides the ability to perform various mathematical operations on the data as part of the postprocessing step.

Multiple horizontal modes are available (specified with the mode keyword), including a Gaussian-spaced latitude-longitude grid ("grid"), spherical harmonics ("spectral"), Fourier coefficients for each latitude ("fourier"), a latitude-longitude grid rotated such that the “North” pole is at the substellar point of a sychronously-rotating planet, and the “equator” is the terminator ("synchronous"), and Fourier coefficients computed along lines of constant longitude (including the mirror component on the opposite hemisphere) in that rotated coordinate system ("syncfourier"). Additionally, for output modes with discrete latitudes, variables can be zonally-averaged (zonal=True).

ExoPlaSim performs some time-averaging on the fly (for “regular”-type outputs) to avoid overloading I/O buffers and creating enormous raw output files, but the number of output times is still often going to be more than you prefer for the postprocessed output data. The default configuration, for example, produces 72 output timestamps per year. pyburn can perform time-averaging to reduce this to e.g. monthly output, via the times keyword and the timeaveraging keyword. The former specifies either the number of output times or the specific output times requested (as decimal fractions of a model output’s timeseries), while the latter is a boolean True/False flag. If specific output times are requested or the number of requested outputs doesn’t divide cleanly into the number of timestamps in the raw output, pyburn can interpolate between timestamps using linear interpolation. No extrapolation is performed, so you cannot request a time between e.g. the last output of the previous year and the first output of the current year. Whether or not linear interpolation is used or “nearest-neighbor” interpolation (which simply selects the timestamp closest to the target time) can be set with the interpolatetimes keyword–if True–linear interpolation will be used when necessary. The minimum number of timestamps in the output file is 1; this corresponds to computing an annual average.

Finally, pyburn brings the ability to compute the standard deviations of ExoPlaSim variables. Enabling this with stdev=True will compute the standard deviation in one of two ways, depending on whether time-averaging is being used. If time averages are being computed, then a standard deviation will be computed alongside each average, and the each standard deviation variable (denoted with the _std suffix in the variable name, e.g. ts_std for the standard deviation of surface temperature) will have the same number of timestamps as the time-averages. If time-averages are not being computed, then the standard deviation of the entire file’s timeseries will be computed, and there will be one timestamp per standard deviation variable.

Reading Postprocessed Files

While postprocessed files are portable and can be read however you like, ExoPlaSim also provides a native, format-agnostic way to access them via the gcmt.load() function. This takes the archive filename as its argument, and returns an object analogous to an open netCDF file object. It has two members of interest to the user: variables and metadata. Both are compatible with all dictionary methods, and individual variables’ data can be extracted by using the variable name as the dictionary key. For example:

>>> import exoplasim.gcmt as gcmt
>>> myData = gcmt.load("MOST.0127.tar.gz")
>>> surfacetemperature = myData.variables['ts']
>>> surftemp_metadata  = myData.metadata['ts']

Note that for CSV-type formats, like the tarball given above, the file is left compressed (except during the initial read), and the whole dataset is not loaded into memory. Dimension arrays, such as latitude, longitude, etc, are loaded, as is all metadata. By default, however, only one data array will be loaded into memory. This can be expanded with the csvbuffersize keyword, which takes the number of variables to permit to hold in the memory buffer. This buffer uses a first-in, first-out approach, so if a new variable is requested and the buffer is full, the loaded variable which was used the least recently will be purged from memory.

Postprocessor Variable Codes

Note that in addition to the variable codes listed below, if pyburn is used with stdev=True, there will also be variables that correspond to those listed below, with the _std suffix. If time-averaging was performed during postprocessing, the standard deviation will be the standard deviation for each averaged time period, and there will be the same number of timestamps for the _std variables as for their nominal data counterparts. If time-averaging was not used, then each standard deviation variable will have only one timestamp, corresponding to the standard deviation throughout the entire timeseries present in the file.

Variable

Code

Description

Units

Notes

nu

50

orbital true anomaly

deg

lambda

51

solar ecliptic longitude

deg

zdec

52

solar declination angle

deg

rdist

53

planet-star distance modulus

nondimensional

mld

110

mixed layer depth

m

sg

129

surface geopotential

m2 s-2

ta

130

air temperature

K

ua

131

eastward wind

m s-1

va

132

northward wind

m s-1

hus

133

specific humidity

kg/kg

ps

134

surface air pressure

hPa

wap

135

vertical air velocity

Pa s-1

wa

137

upward wind

m s-1

zeta

138

atm relative vorticity

s-1

ts

139

surface temperature

K

mrso

140

lwe of soil moisture content

m

snd

141

surface snow thickness

m

prl

142

lwe of large scale precipitation

m s-1

prc

143

convective precipitation rate

m s-1

prsn

144

lwe of snowfall amount

m s-1

bld

145

dissipation in boundary layer

W m-2

hfss

146

surface sensible heat flux

W m-2

hfls

147

surface latent heat flux

W m-2

stf

148

streamfunction

m2 s-2

psi

149

velocity potential

m2 s-2

psl

151

air pressure at sea level

hPa

pl

152

log surface pressure

nondimensional

d

155

divergence of wind

s-1

zg

156

geopotential height

m

hur

157

relative humidity

nondimensional

tps

158

tendency of surface air pressure

Pa s-1

u3

159

u*

m3 s-3

mrro

160

surface runoff

m s-1

clw

161

liquid water content

nondimensional

cl

162

cloud area fraction in layer

nondimensional

tcc

163

total cloud cover

nondimensional

clt

164

cloud area fraction

nondimensional

uas

165

eastward wind 10m

m s-1

vas

166

northward wind 10m

m s-1

tas

167

air temperature 2m

K

td2m

168

dew point temperature 2m

K

tsa

169

surface temperature accumulated

K

tsod

170

deep soil temperature

K

dsw

171

deep soil wetness

nondimensional

lsm

172

land binary mask

nondimensional

z0

173

surface roughness length

m

alb

174

surface albedo

nondimensional

as

175

surface albedo

nondimensional

rss

176

surface net shortwave flux

W m-2

rls

177

surface net longwave flux

W m-2

rst

178

toa net shortwave flux

W m-2

rlut

179

toa net longwave flux

W m-2

tauu

180

surface eastward stress

Pa

tauv

181

surface northward stress

Pa

evap

182

lwe of water evaporation

m s-1

tso

183

climate deep soil temperature

K

wsoi

184

climate deep soil wetness

nondimensional

vegc

199

vegetation cover

nondimensional

rsut

203

toa outgoing shortwave flux

W m-2

ssru

204

surface solar radiation upward

W m-2

stru

205

surface thermal radiation upward

W m-2

tso2

207

soil temperature level 2

K

tso3

208

soil temperature level 3

K

tso4

209

soil temperature level 4

K

sic

210

sea ice cover

nondimensional

sit

211

sea ice thickness

m

vegf

212

forest cover

nondimensional

snm

218

snow melt

m s-1

sndc

221

snow depth change

m s-1

prw

230

atmosphere water vapor content

kg m-2

glac

232

glacier cover

nondimensional

tsn

238

snow temperature

K

spd

259

wind speed

m s-1

pr

260

total precipitation

m s-1

ntr

261

net top radiation

W m-2

nbr

262

net bottom radiation

W m-2

hfns

263

surface downward heat flux

W m-2

wfn

264

net water flux

m s-1

lwth

266

local weathering

W earth

grnz

267

ground geopotential

m2 s-2

icez

301

glacier geopotential

m2 s-2

netz

302

net geopotential

m2 s-2

dpdx

273

d(ps)/dx

Pa m-1

dpdy

274

d(ps)/dy

Pa m-1

hlpr

277

half level pressure

Pa

flpr

278

full level pressure

Pa

thetah

279

half level potential temperature

K

theta

280

full level potential temperature

K

czen

318

cosine solar zenith angle

nondimensional

wthpr

319

weatherable precipitation

mm day-1

mint

320

minimum temperature

K

maxt

321

maximum temperature

K

cape

322

convective available potential energy

J kg-1

Storm Clim.

lnb

323

level of neutral buoyancy

hPa

Storm Clim.

sdef

324

troposphere entropy deficit

nondimensional

Storm Clim.

absz

325

sigma-0.85 abs vorticity

s-1

Storm Clim.

umax

326

maximum potential intensity

m s-1

Storm Clim.

vent

327

ventilation index

nondimensional

Storm Clim.

vrumax

328

ventilation-reduced maximum wind

m s-1

Storm Clim.

gpi

329

genesis potential index

nondimensional

Storm Clim.

dfu

404

shortwave up

W m-2

Snapshot Only

dfd

405

shortwave down

W m-2

Snapshot Only

dftu

406

longwave up

W m-2

Snapshot Only

dftd

407

longwave down

W m-2

Snapshot Only

dtdt

408

radiative heating rate

K s-1

Snapshot Only

dfdz

409

flux convergence

W m-3

Snapshot Only

mmr

410

aerosol mass mixing ratio

kg kg-1

Aerosols

nrho

411

aerosol number density

particles m-3

Aerosols

Burn7 Postprocessor Options

The C++ burn7 postprocessor is now deprecated and unsupported. It is only available via the exoplasim-legacy package.