hotrstuff

library(hotrstuff)
library(terra)

variable = "tos" # Variable chosen is sea surface salinity

This vignette shows how you can download, wrangle, and process ESM data to calculate climate metrics. The idea is that the functions are independent of each other and can still perform their purpose without having to follow the entire workflow detailed below, from start to finish.

This code works for both monthly (mon) and daily (day) climate data.

I would recommend creating a data folder, with the following basic structure:

data/raw/: raw, untouched data
data/raw/wget: downloaded wget scripts
data/raw/tos: raw ESM data (e.g., tos - sea surface temperature)
data/proc/: processed ESM data
data/proc/merged: merged files per model - 1 model per model/variable/scenario/frequency combination
data/proc/sliced: sliced files depending on the required time period
data/proc/yearly: yearly data (after converting monthly frequency to yearly frequency)
data/proc/regridded: regridded data, standardizing grids of data
data/proc/ensemble: ensemble mean/median for each variable/scenario/frequency combination

We will follow this folder structure throughout the vignette. It will still work with a different folder structure, just make sure you’re inputting the right paths.

Downloading ESM outputs

To download ESMs, you’d still have to download the wget scripts from your chosen CMIP6 repository. I would suggest using: https://aims2.llnl.gov/search.

If you want to follow along, use the wget scripts that are in the data/raw/wget folder. But, you can use other wget scripts; just make sure to change the arguments.

Then, run the download_ESM function to run all the wget scripts found in that folder.

# base_dir <- "." # Just start relative to this location
# 
# htr_download_ESM(
#   indir = file.path(base_dir, "data", "raw", "wget"), # input directory
#   outdir = file.path(base_dir, "data", "raw", variable) # output directory
# )

Shift years

Most CMIP models have standardized naming systems. However, some OMIP model outputs have differences in reporting the temporal aspect of their simulations. OMIP runs are 6 cycles of simulations using the 61-year JRA55-do forcing data set (1958-2018), resulting in 366-year model outputs. So, we need to shift the years 1652 years forward for some of the models (e.g., starting at 0001) to standardize across models.

# htr_shift_years(indir = file.path(base_dir,"data", "raw", "omip", variable),
#                 outdir = file.path(base_dir, "data", "proc", "shifted", "omip", variable),
#                 adjust_value = 1652)

Merging files

We now merge files by model-variable-scenario-frequency combination.

# htr_merge_files(
#   indir = file.path(base_dir, "data", "raw", variable), # input directory
#   outdir = file.path(base_dir, "data", "proc", "merged", variable), # output directory
#   year_start = 1956, # earliest year across all the scenarios considered (e.g., historical, ssp126, ssp245, ssp585)
#   year_end = 1981 # latest year across all the scenarios considered
# )

So this would result in just 1 .nc file for each model-variable-scenario-frequency combination.

Adjust and reframe time periods

We now want to only keep the years we’re interested in. For example, we want to make sure that for future projections, we only want to look at 2020-2100.

# htr_slice_period(
#   indir = file.path(base_dir, "data", "proc", "merged", variable), # input directory
#   outdir = file.path(base_dir, "data", "proc", "sliced", variable), # output directory
#   freq = "Omon", # ocean, daily
#   scenario = "ssp",
#   year_start = 2020,
#   year_end = 2100,
#   overwrite = FALSE
# )

Fix calendar periods (if needed)

This function fixes the calendar days, for days with leap years. This is to standardize the calendar across the different models.

# htr_fix_calendar(indir = file.path(base_dir, "data", "proc", "sliced", variable)) # will be rewritten

Changing frequency of climate data

I created two functions: i) changes frequency to monthly (monthly_frequency()), and ii) changes frequency to yearly (yearly_frequency()), but because I’ve started with monthly data here, I didn’t run monthly_frequency() here (but it works).

# htr_change_freq(
#   freq = "monthly",
#   indir = file.path(base_dir, "data", "proc", "sliced", variable), # input directory
#   outdir = file.path(base_dir, "data", "proc", "yearly", variable)
# )

Uncomment the code above if you’re using daily data and changing it to a monthly frequency.

# htr_change_freq(
#   freq = "yearly",
#   indir = file.path(base_dir, "data", "proc", "sliced", variable), # input directory
#   outdir = file.path(base_dir, "data", "proc", "yearly", variable)
# )

This should create a new file that has a yearly frequency. For the rest of the analyses, I’ll only continue with the yearly data.

Regridding

The climate models have different grids, so now we need to regrid and standardize the grid. This requires a base, empty raster that will be made automatically in the regrid_esm().

# htr_regrid_esm(
#   indir = file.path(base_dir, "data", "proc", "yearly", variable),
#   outdir = file.path(base_dir, "data", "proc", "regridded", "yearly", variable),
#   cell_res = 0.25,
#   layer = "annual"
# )

Let’s try to plot this. Just plotting for the first time point and for the two models.

# getwd()
# models <- list.files(file.path(base_dir, "data", "proc", "regridded", "yearly", variable), full.names = TRUE)
# model1 <- terra::rast(models[1])
# plot(model1$tos_1)
# 
# model2 <- rast(models[2])
# plot(model2$tos_1)

Create ensemble

For now, we only have 2 models, but if you have more models, you just need to input a list of the model names as a vector in the model_list argument.

# htr_create_ensemble(
#   indir = file.path(base_dir, "data", "proc", "regridded", "yearly", variable), # input directory
#   outdir = file.path(base_dir, "data", "proc", "ensemble", "mean", variable), # output directory
#   model_list = c("FGOALS-f3-L", "CanESM5"), # list of models for ensemble
#   variable = variable, # variable name
#   freq = "Omon", # original frequency of data
#   scenario = "omip2", # scenario
#   mean = TRUE # if false, takes the median
# )

Then, we plot to see how it looks.

# ensemble_model <- list.files(file.path(base_dir, "data", "proc", "ensemble", "mean", variable), full.names = TRUE)
# ensemble <- rast(ensemble_model)
# plot(ensemble$tos_1)