Actions to main #2

Merged
merged 8 commits into from
Apr 2, 2024
48 changes: 24 additions & 24 deletions Dockerfile
@@ -1,30 +1,30 @@
# Load R image
FROM rocker/r-ver:4.3.0
# DeGAUSS container metadata
ENV degauss_name="daymet"
ENV degauss_version="0.1.0"
ENV degauss_description="daymet climate variables"
ENV degauss_argument="short description of optional argument [default: 'insert_default_value_here']"
# add OCI labels based on environment variables too
LABEL "org.degauss.name"="${degauss_name}"
LABEL "org.degauss.version"="${degauss_version}"
LABEL "org.degauss.description"="${degauss_description}"
LABEL "org.degauss.argument"="${degauss_argument}"
# Load R image
FROM rocker/r-ver:4.3.0

# DeGAUSS container metadata
ENV degauss_name="daymet"
ENV degauss_version="0.1.1"
ENV degauss_description="daymet climate variables"
ENV degauss_argument="short description of optional argument [default: 'insert_default_value_here']"

# add OCI labels based on environment variables too
LABEL "org.degauss.name"="${degauss_name}"
LABEL "org.degauss.version"="${degauss_version}"
LABEL "org.degauss.description"="${degauss_description}"
LABEL "org.degauss.argument"="${degauss_argument}"

WORKDIR /app

RUN apt-get update -y
RUN apt-get install libxml2-dev zlib1g-dev libfontconfig1-dev libssl-dev libcurl4-openssl-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libudunits2-dev cmake libnetcdf-dev libgdal-dev libgeos-dev libproj-dev libsqlite0-dev -y
# Install R dependencies
RUN R -e "install.packages(c('daymetr', 'tidyverse', 'terra', 'gtools', 'data.table', 'remotes', 'withr'))"
RUN R --quiet -e "remotes::install_github('degauss-org/dht')"
COPY entrypoint.R .
RUN apt-get install libxml2-dev zlib1g-dev libfontconfig1-dev libssl-dev libcurl4-openssl-dev libharfbuzz-dev libfribidi-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libudunits2-dev cmake libnetcdf-dev libgdal-dev libgeos-dev libproj-dev libsqlite0-dev -y

# Install R dependencies
RUN R -e "install.packages(c('daymetr', 'tidyverse', 'terra', 'gtools', 'data.table', 'remotes', 'withr'))"
RUN R --quiet -e "remotes::install_github('degauss-org/dht')"

COPY entrypoint.R .

WORKDIR /tmp

ENTRYPOINT ["/usr/local/bin/Rscript", "/app/entrypoint.R"]
ENTRYPOINT ["/usr/local/bin/Rscript", "/app/entrypoint.R"]

49 changes: 35 additions & 14 deletions README.md
@@ -3,42 +3,63 @@
[![](https://img.shields.io/github/v/release/degauss-org/daymet?color=469FC2&label=version&sort=semver)](https://github.com/degauss-org/daymet/releases)
[![container build status](https://github.com/degauss-org/daymet/workflows/build-deploy-release/badge.svg)](https://github.com/degauss-org/daymet/actions/workflows/build-deploy-release.yaml)

## Background

Daymet weather variables include daily minimum and maximum temperature, precipitation, vapor pressure, shortwave radiation, snow water equivalent, and day length produced on a 1 km x 1 km gridded surface over continental North America and Hawaii from 1980 and over Puerto Rico from 1950 through the end of the most recent full calendar year.

Daymet data documentation: https://daac.ornl.gov/DAYMET/guides/Daymet_Daily_V4.html

Note: The Daymet calendar is based on a standard calendar year. All Daymet years, including leap years, have 1–365 days. For leap years, the Daymet data include leap day (February 29) and December 31 is discarded from leap years to maintain a 365-day year.
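
To make the 365-day convention concrete, here is a minimal POSIX shell sketch of the day-of-year numbering described in the note. The function name `daymet_yday` is hypothetical and is not part of this container or of Daymet. It only illustrates that February 29 is kept and that December 31 of a leap year has no Daymet day.

```shell
# Hypothetical helper (not part of the daymet container):
# print the Daymet day-of-year (1-365) for a YYYY-MM-DD date,
# or "NA" for December 31 of a leap year, which Daymet discards.
# The body runs in a subshell so its variables do not leak.
daymet_yday() (
  # Split YYYY-MM-DD into year, month, and day
  y=${1%%-*}
  rest=${1#*-}
  m=${rest%%-*}
  d=${rest#*-}
  # Drop a leading zero so "08" and "09" are not read as octal
  m=${m#0}
  d=${d#0}

  # Gregorian leap-year rule decides the length of February
  if [ $((y % 4)) -eq 0 ] && [ $((y % 100)) -ne 0 ] || [ $((y % 400)) -eq 0 ]; then
    feb=29
  else
    feb=28
  fi

  # Add the lengths of all months before month m to the day of month
  yday=$d
  i=1
  for n in 31 "$feb" 31 30 31 30 31 31 30 31 30 31; do
    [ "$i" -ge "$m" ] && break
    yday=$((yday + n))
    i=$((i + 1))
  done

  # Day 366 only occurs on December 31 of a leap year
  if [ "$yday" -gt 365 ]; then
    echo "NA"
  else
    echo "$yday"
  fi
)
```

For example, `daymet_yday 2020-12-30` prints `365`, while `daymet_yday 2020-12-31` prints `NA`.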

## Using

If `loyalty_degauss.csv` is a file in the current working directory with coordinate columns named `lat` and `lon`, then the [DeGAUSS command](https://degauss.org/using_degauss.html#DeGAUSS_Commands):
If `my_addresses.csv` is a file in the current working directory with ID column `id`, start and end date columns `start_date` and `end_date`, and coordinate columns named `lat` and `lon`, then the [DeGAUSS command](https://degauss.org/using_degauss.html#DeGAUSS_Commands):

```sh
docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/daymet:0.1.0 loyalty_degauss.csv
docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/daymet:0.1.1 loyalty_degauss.csv
```

will produce `loyalty_degauss_daymet_0.1.0.csv` with added columns:
will produce `my_addresses_daymet_0.1.1.csv` with added columns:

- **`tmax`**: maximum temperature
- **`tmin`**: minimum temperature
- **`srad`**: solar radiation
- **`srad`**: shortwave radiation
- **`vp`**: vapor pressure
- **`swe`**: snow water equivalent
- **`prcp`**: precipitation
- **`dayl`**: day length

Other columns may be present in the input `my_addresses.csv` file, and these other columns will be linked in and included in the output `my_addresses_daymet_0.1.1.csv` file.
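
As a rough illustration of the expected input layout, the following shell snippet writes a tiny `my_addresses.csv` with the five columns named above. All values are made up, and the ISO `YYYY-MM-DD` date format is an assumption here rather than something this README states.

```shell
# Write a small example input file (made-up values).
# Column names come from the README; the date format is an assumption.
cat > my_addresses.csv <<'EOF'
id,start_date,end_date,lat,lon
1,2020-01-01,2020-01-07,41.88,-87.63
2,2019-06-15,2019-06-20,41.79,-87.60
EOF
```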

### Optional Arguments

- Optional arguments include:
- **`vars`**: daymet variables (any of: <tmax, tmin, srad, vp, swe, prcp, dayl> separated by comma)
- **`min_lon`**: minimum longitude (numeric)
- **`max_lon`**: maximum longitude (numeric)
- **`min_lat`**: minimum_latitude (numeric)
- **`max_lat`**: maximum latitude (numeric)
- **`region`**: daymet region ('na' for North America, 'hi' for Hawaii, 'pr' for Puerto Rico)
- **`vars`**: Comma-separated string of Daymet variables: Any combination of "tmax,tmin,srad,vp,swe,prcp,dayl" (quotes are optional). Default is to download and link all Daymet variables.
- **`min_lon`**: Minimum longitude (in decimal degrees) of bounding box for Daymet data download. Default is to infer bounding box from address coordinates.
- **`max_lon`**: Maximum longitude (in decimal degrees) of bounding box for Daymet data download. Default is to infer bounding box from address coordinates.
- **`min_lat`**: Minimum latitude (in decimal degrees) of bounding box for Daymet data download. Default is to infer bounding box from address coordinates.
- **`max_lat`**: Maximum latitude (in decimal degrees) of bounding box for Daymet data download. Default is to infer bounding box from address coordinates.
- **`region`**: Daymet spatial region: "na" for continental North America, "hi" for Hawaii, or "pr" for Puerto Rico (quotes are optional). Default is continental North America.

An example DeGAUSS command with all optional arguments used would be:

```sh
docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/daymet:0.1.1 my_addresses.csv tmax,vp,prcp -88.263390 -87.525706 41.470117 42.154247 na
```

which will return maximum temperature, vapor pressure, and precipitation for observations within a bounding box of Cook County, IL. It is important to specify bounding box coordinates in the order of: `min_lon`, `max_lon`, `min_lat`, `max_lat`.
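
Because the four bounding box values are positional, a swapped pair is easy to miss. As a hedged sketch, a small shell function can sanity-check the values before they are passed to `docker run`. The helper `check_bbox` is hypothetical and not part of the container. It cannot detect latitude and longitude being exchanged when both still fall inside valid ranges.

```shell
# Hypothetical helper (not part of the daymet container):
# succeed only if the arguments form a plausible bounding box
# given in the order min_lon, max_lon, min_lat, max_lat.
check_bbox() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
    # Adding 0 forces numeric rather than string comparison
    min_lon = a + 0; max_lon = b + 0
    min_lat = c + 0; max_lat = d + 0
    ok = (min_lon < max_lon) && (min_lat < max_lat) &&
         (min_lon >= -180) && (max_lon <= 180) &&
         (min_lat >= -90) && (max_lat <= 90)
    exit(ok ? 0 : 1)
  }'
}

# Possible usage: only start the container when the box looks sane
# check_bbox -88.263390 -87.525706 41.470117 42.154247 &&
#   docker run --rm -v $PWD:/tmp ghcr.io/degauss-org/daymet:0.1.1 my_addresses.csv tmax,vp,prcp -88.263390 -87.525706 41.470117 42.154247 na
```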

## Geomarker Methods

- If needed, put details here about the methods and assumptions used in the geomarker assessment process.
Daymet data on a specified date is linked to coordinate data within the `my_addresses.csv` file by matching on the Daymet 1 km x 1 km raster cell number.

## Geomarker Data

- List how geomarker was created, ideally including any scripts within the repo used to do so or linking to an external repository
- If applicable, list where geomarker data is stored in S3 using a hyperlink like: [`s3://path/to/daymet.rds`](https://geomarker.s3.us-east-2.amazonaws.com/path/to/daymet.rds)
- Environmental data is downloaded from [Daymet](https://daymet.ornl.gov/) as netCDF file(s) using the [daymetr package](https://github.com/bluegreen-labs/daymetr).
- The R code that links the environmental data to the input coordinates is within `entrypoint.R`.

## Warning

If the bounding box for Daymet data download is inferred from address coordinates, then the size of the Daymet data download may be quite large if the address coordinates are very spread out. If a wide spread of coordinates is desired, then it may be best to stratify your input dataset to coordinates within separate geographic regions.
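
One way to stratify an input file is sketched below, assuming a simple CSV with a `lon` column and no quoted commas. The function `split_by_lon_band` and the 5-degree band width are illustrative choices, not something the container provides. Each output file can then be run through the container separately.

```shell
# Hypothetical helper (not part of the daymet container):
# split a CSV into one file per 5-degree longitude band,
# repeating the header row in every output file.
# Usage: split_by_lon_band input.csv output_prefix
split_by_lon_band() {
  awk -F, -v prefix="$2" '
    NR == 1 {
      header = $0
      # Locate the lon column by name
      for (i = 1; i <= NF; i++) if ($i == "lon") col = i
      if (!col) { print "no lon column found" > "/dev/stderr"; exit 1 }
      next
    }
    {
      # Western edge of the band: floor(lon / 5) * 5
      # (int() truncates toward zero, so correct negative values)
      band = int($col / 5) * 5
      if ($col < band) band -= 5
      out = prefix "_" band ".csv"
      # Write the header once per output file
      if (!(out in seen)) { print header > out; seen[out] = 1 }
      print > out
    }
  ' "$1"
}
```

With this sketch, rows with longitudes of -87.6 and -84.5 end up in `<prefix>_-90.csv` and `<prefix>_-85.csv` respectively.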

## DeGAUSS Details

58 changes: 29 additions & 29 deletions entrypoint.R
@@ -21,66 +21,66 @@ doc <- '
Options:
-h --help Show this screen
filename name of csv file
vars tmax, tmin, srad, vp, swe, prcp, dayl, capricorn, or none (see readme for more info)
vars tmax, tmin, srad, vp, swe, prcp, dayl, or capricorn (see readme for more info)
min_lon minimum longitude
max_lon maximum longitude
min_lat minimum latitude
max_lat maximum latitude
region daymet region
region daymet region
'
opt <- docopt::docopt(doc)

if (is.null(opt$vars)) {
opt$vars <- "tmax, tmin, srad, vp, swe, prcp, dayl"
cli::cli_alert_warning("Blank argument for Daymet variable selection. Will return all Daymet variables. Please see {.url https://degauss.org/daymet/} for more information about the Daymet variable argument.")
cli::cli_alert_warning("Blank argument for Daymet variable selection. Will return all Daymet variables. Please see {.url https://degauss.org/daymet/} for more information.")
}

day_var <- str_remove_all(opt$vars, " ")
day_var <- str_split(day_var, ",", simplify = TRUE)

if (! all(day_var %in% c("tmax", "tmin", "srad", "vp", "swe", "prcp", "dayl", "capricorn", "none"))) {
if (! all(day_var %in% c("tmax", "tmin", "srad", "vp", "swe", "prcp", "dayl", "capricorn"))) {
opt$vars <- "tmax, tmin, srad, vp, swe, prcp, dayl"
cli::cli_alert_warning("Invalid argument for Daymet variable selection. Will return all Daymet variables. Please see {.url https://degauss.org/daymet/} for more information about the Daymet variable argument.")
cli::cli_alert_warning("Invalid argument for Daymet variable selection. Will return all Daymet variables. Please see {.url https://degauss.org/daymet/} for more information.")
}

if (is.null(opt$min_lon)) {
opt$min_lon <- 0
cli::cli_alert_warning("Blank argument for minimum longitude. Will use minimum longitude coordinates from address file. Please see {.url https://degauss.org/daymet/} for more information about the Daymet variable argument.")
cli::cli_alert_warning("Blank argument for minimum longitude. Will use minimum longitude coordinates from address file. Please see {.url https://degauss.org/daymet/} for more information.")
}

if (is.null(opt$max_lon)) {
opt$max_lon <- 0
cli::cli_alert_warning("Blank argument for maximum longitude. Will use maximum longitude coordinates from address file. Please see {.url https://degauss.org/daymet/} for more information about the Daymet variable argument.")
cli::cli_alert_warning("Blank argument for maximum longitude. Will use maximum longitude coordinates from address file. Please see {.url https://degauss.org/daymet/} for more information.")
}

if (is.null(opt$min_lat)) {
opt$min_lat <- 0
cli::cli_alert_warning("Blank argument for minimum latitude. Will use minimum latitude coordinates from address file. Please see {.url https://degauss.org/daymet/} for more information about the Daymet variable argument.")
cli::cli_alert_warning("Blank argument for minimum latitude. Will use minimum latitude coordinates from address file. Please see {.url https://degauss.org/daymet/} for more information.")
}

if (is.null(opt$max_lat)) {
opt$max_lat <- 0
cli::cli_alert_warning("Blank argument for maximum latitude. Will use maximum latitude coordinates from address file. Please see {.url https://degauss.org/daymet/} for more information about the Daymet variable argument.")
cli::cli_alert_warning("Blank argument for maximum latitude. Will use maximum latitude coordinates from address file. Please see {.url https://degauss.org/daymet/} for more information.")
}

if (is.null(opt$region)) {
opt$region <- "na"
cli::cli_alert_warning("Blank argument for region. Will use North America as default. Please see {.url https://degauss.org/daymet/} for more information about the Daymet variable argument.")
cli::cli_alert_warning("Blank argument for region. Will use North America as default. Please see {.url https://degauss.org/daymet/} for more information.")
}

if (! opt$region %in% c("na", "hi", "pr")) {
opt$region <- "na"
cli::cli_alert_warning("Invalid argument for Daymet region. Will use North America as default. Please see {.url https://degauss.org/daymet/} for more information about the Daymet variable argument.")
cli::cli_alert_warning("Invalid argument for Daymet region. Will use North America as default. Please see {.url https://degauss.org/daymet/} for more information.")
}

if (opt$vars %in% c("capricorn")) {
opt$vars <- "tmax, tmin"
opt$min_lon <- -88.263390
opt$max_lon <- -87.525706
opt$min_lat <- 41.470117
opt$max_lat <- 42.154247
opt$region <- "na"
cli::cli_alert_warning("Returning tmax and tmin for lat/lon coordinates of Cook County. Please see {.url https://degauss.org/daymet/} for more information about the Daymet variable argument.")
opt$vars <- "tmax, tmin"
opt$min_lon <- -88.263390
opt$max_lon <- -87.525706
opt$min_lat <- 41.470117
opt$max_lat <- 42.154247
opt$region <- "na"
cli::cli_alert_warning("Returning tmax and tmin for lat/lon coordinates of Cook County. Please see {.url https://degauss.org/daymet/} for more information.")
}

# Writing functions
@@ -183,31 +183,31 @@ import_data <- function(.csv_filename = opt$filename, .min_lon = opt$min_lon, .m
print(w)
stop(call. = FALSE)
})
# Filtering out any rows in the input data where the start_date is before 1980 if region is "na" or "hi", or before 1950 if region is "pr"
# Inferring the start and end year of Daymet data to download from start_date and end_date
year_start <- year(min(input_data$start_date))
year_end <- year(max(input_data$end_date))
# Expanding the dates between start_date and end_date into a daily series
input_data <- expand_dates(input_data, by = "day") %>%
select(-start_date, -end_date)
# Filtering out any rows in the input data where the date is before 1980 if region is "na" or "hi", or before 1950 if region is "pr"
if (.region == "na" | .region == "hi") {
input_data <- input_data %>%
filter(!(start_date < as_date("1980-01-01")))
filter(!(date < as_date("1980-01-01")))
} else {
input_data <- input_data %>%
filter(!(start_date < as_date("1950-01-01")))
filter(!(date < as_date("1950-01-01")))
}
# Throwing an error if no observations are remaining
if (nrow(input_data) == 0) {
stop(call. = FALSE, 'Zero observations where the start_date is within or after the first year of available Daymet data.')
}
# Filtering out any rows in the input data where the end_date year is equal to the current date year
# Filtering out any rows in the input data where the date year is equal to the current date year
input_data <- input_data %>%
filter(!(year(end_date) == year(Sys.Date())))
filter(!(year(date) == year(Sys.Date())))
# Throwing an error if no observations are remaining
if (nrow(input_data) == 0) {
stop(call. = FALSE, 'Zero observations where the end_date is within or before the last year of available Daymet data.')
}
# Inferring the start and end year of Daymet data to download from start_date and end_date
year_start <- year(min(input_data$start_date))
year_end <- year(max(input_data$end_date))
# Expanding the dates between start_date and end_date into a daily series
input_data <- expand_dates(input_data, by = "day") %>%
select(-start_date, -end_date)
# Removing any columns in the input data where everything is NA
input_data <- input_data %>%
select_if(~ !all(is.na(.)))