Configurable servers #312

Merged
merged 14 commits into from
Oct 2, 2023
15 changes: 5 additions & 10 deletions CONTRIBUTING.rst
@@ -4,8 +4,7 @@
Contributing
============

-Contributions are welcome, and they are greatly appreciated! Every little bit
-helps, and credit will always be given.
+Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

@@ -26,21 +25,17 @@ If you are reporting a bug, please include:
Fix Bugs
~~~~~~~~

-Look through the GitHub issues for bugs. Anything tagged with "bug" and "help
-wanted" is open to whoever wants to implement it.
+Look through the GitHub issues for bugs. Anything tagged with "bug" and "help wanted" is open to whoever wants to implement it.

Implement Features
~~~~~~~~~~~~~~~~~~

-Look through the GitHub issues for features. Anything tagged with "enhancement"
-and "help wanted" is open to whoever wants to implement it.
+Look through the GitHub issues for features. Anything tagged with "enhancement" and "help wanted" is open to whoever wants to implement it.

Write Documentation
~~~~~~~~~~~~~~~~~~~

-RavenPy could always use more documentation, whether as part of the
-official RavenPy docs, in docstrings, or even on the web in blog posts,
-articles, and such.
+RavenPy could always use more documentation, whether as part of the official RavenPy docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback
~~~~~~~~~~~~~~~
@@ -90,7 +85,7 @@ Ready to contribute? Here's how to set up `ravenpy` for local development.

$ flake8 ravenpy tests
$ black --check ravenpy tests
-$ python setup.py test # or `pytest`
+$ pytest tests
$ tox

To get flake8, black, and tox, just pip install them into your virtualenv.
10 changes: 9 additions & 1 deletion HISTORY.rst
@@ -4,7 +4,15 @@ History

0.12.4 (unreleased)
-------------------
-* In tests, set xclim' missing value option to ``skip``. As of xclim 0.45, missing value checks are applied to ``fit`` indicator, meaning that parameters will be set to None if missing values are found in the fitted time series. Wrap calls to ``fit`` with ``xclim.set_options(check_missing="skip")`` to reproduce the previous behavior of xclim.
+
+Breaking changes
+^^^^^^^^^^^^^^^^
+* In tests, set `xclim`'s missing value option to ``skip``. As of `xclim` v0.45, missing value checks are applied to the ``fit`` indicator, meaning that parameters will be set to `None` if missing values are found in the fitted time series. Wrap calls to ``fit`` with ``xclim.set_options(check_missing="skip")`` to reproduce the previous behavior of xclim.
+* `RavenPy` processes and tests that depend on remote THREDDS/GeoServer now allow for optional server URL and file location targets. These can be set with the following environment variables:
+    * `RAVENPY_THREDDS_URL`: URL to the THREDDS-hosted climate data service. Defaults to `https://pavics.ouranos.ca/twitcher/ows/proxy/thredds`.
+    * `RAVENPY_GEOSERVER_URL`: URL to the GeoServer-hosted vector/raster data. Defaults to `https://pavics.ouranos.ca/geoserver`.
+        * This environment variable was previously called `GEO_URL` and was renamed to narrow its scope to RavenPy.
+* The `_determine_upstream_ids` function under `ravenpy.utilities.geoserver` has been removed as it was a duplicate of `ravenpy.utilities.geo.determine_upstream_ids`. The latter function is now used in its place.
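A minimal sketch of the lookup these entries describe, assuming only the two variable names and defaults listed above, plus the trailing-slash normalization that this PR applies to `THREDDS_URL` in `ravenpy/extractors/forecasts.py`. `resolve_server_urls` is a hypothetical helper for illustration, not part of RavenPy:

```python
import os
from typing import Dict, Mapping, Optional

# Defaults as quoted in the changelog entries above.
DEFAULTS = {
    "RAVENPY_THREDDS_URL": "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds",
    "RAVENPY_GEOSERVER_URL": "https://pavics.ouranos.ca/geoserver",
}


def resolve_server_urls(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: environment variable first, documented default second."""
    env = os.environ if environ is None else environ
    urls = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Mirrors the module-level check in this PR: the THREDDS URL must end with "/".
    if not urls["RAVENPY_THREDDS_URL"].endswith("/"):
        urls["RAVENPY_THREDDS_URL"] += "/"
    return urls
```

Passing a plain mapping instead of always reading `os.environ` keeps the lookup easy to test, e.g. `resolve_server_urls({"RAVENPY_THREDDS_URL": "https://my.domain.org/thredds"})`.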

0.12.3 (2023-08-25)
-------------------
20 changes: 15 additions & 5 deletions docs/installation.rst
@@ -5,8 +5,7 @@ Installation
Anaconda Python Installation
----------------------------

-For many reasons, we recommend using a `Conda environment <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>`_
-to work with the full RavenPy installation. This implementation is able to manage the harder-to-install GIS dependencies, like `GDAL`.
+For many reasons, we recommend using a `Conda environment <https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html>`_ to work with the full RavenPy installation. This implementation is able to manage the harder-to-install GIS dependencies, like `GDAL`.

Begin by creating an environment:

@@ -26,8 +25,7 @@ RavenPy can then be installed directly via its `conda-forge` package by running:

(ravenpy) $ conda install -c conda-forge ravenpy

-This approach installs the `Raven <http://raven.uwaterloo.ca>`_ binary directly to your environment `PATH`,
-as well as installs all the necessary Python and C libraries supporting GIS functionalities.
+This approach installs the `Raven <http://raven.uwaterloo.ca>`_ binary directly to your environment `PATH`, as well as installs all the necessary Python and C libraries supporting GIS functionalities.

Python Installation (pip)
-------------------------
@@ -71,10 +69,22 @@ Once downloaded/compiled, the binary can be pointed to manually (as an absolute

$ export RAVENPY_RAVEN_BINARY_PATH=/path/to/my/custom/raven

+Customizing remote service datasets
+-----------------------------------
+
+A number of functions and tests within `RavenPy` are dependent on remote services (THREDDS, GeoServer) for providing climate datasets, hydrological boundaries, and other data. These services are provided by `Ouranos <https://www.ouranos.ca>`_ through the `PAVICS <https://pavics.ouranos.ca>`_ project and may be subject to change in the future.
+
+If for some reason you wish to use alternate services, you can set the following environment variables to point to your own instances of THREDDS and GeoServer:
+
+.. code-block:: console
+
+$ export RAVENPY_THREDDS_URL=https://my.domain.org/thredds
+$ export RAVENPY_GEOSERVER_URL=https://my.domain.org/geoserver
+
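Since the forecasting modules in this PR read `RAVENPY_THREDDS_URL` once, in a module-level `os.environ.get(...)` call, the variables need to be set before `ravenpy` is imported. As an illustration only, a Python equivalent of the `export` lines above for a child process could look like this (`run_with_servers` is hypothetical, and the `my.domain.org` URLs are the placeholders from the snippet above):

```python
import os
import subprocess
import sys
from typing import List


def run_with_servers(cmd: List[str], thredds: str, geoserver: str) -> subprocess.CompletedProcess:
    """Hypothetical helper: run `cmd` with the RavenPy server variables set."""
    env = dict(os.environ)  # copy, so the parent process environment is untouched
    env["RAVENPY_THREDDS_URL"] = thredds
    env["RAVENPY_GEOSERVER_URL"] = geoserver
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # The child only echoes the variable; a script importing ravenpy would see the same value.
    result = run_with_servers(
        [sys.executable, "-c", "import os; print(os.environ['RAVENPY_THREDDS_URL'])"],
        thredds="https://my.domain.org/thredds",
        geoserver="https://my.domain.org/geoserver",
    )
    print(result.stdout.strip())
```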
Development Installation (from sources)
---------------------------------------

-The sources for RavenPy can be obtained from the GitHub repo:
+The sources for `RavenPy` can be obtained from the GitHub repo:

.. code-block:: console

80 changes: 69 additions & 11 deletions ravenpy/extractors/forecasts.py
@@ -1,8 +1,11 @@
import datetime as dt
import logging
+import os
import re
+import warnings
from pathlib import Path
from typing import Any, List, Tuple, Union
+from urllib.parse import urljoin

import pandas as pd
import xarray as xr
@@ -19,6 +22,13 @@

LOGGER = logging.getLogger("PYWPS")

+# Can be set at runtime with `$ env RAVENPY_THREDDS_URL=https://xx.yy.zz/thredds/ ...`.
+THREDDS_URL = os.environ.get(
+    "RAVENPY_THREDDS_URL", "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"
+)
+if not THREDDS_URL.endswith("/"):
+    THREDDS_URL = f"{THREDDS_URL}/"
+

def get_hindcast_day(region_coll: fiona.Collection, date, climate_model="GEPS"):
    """Generate a forecast dataset that can be used to run raven.
@@ -38,15 +48,41 @@ def get_hindcast_day(region_coll: fiona.Collection, date, climate_model="GEPS"):


def get_CASPAR_dataset(
-    climate_model: str, date: dt.datetime
+    climate_model: str,
+    date: dt.datetime,
+    thredds: str = THREDDS_URL,
+    directory: str = "dodsC/birdhouse/disk2/caspar/daily/",
) -> Tuple[
    xr.Dataset, List[Union[Union[DatetimeIndex, Series, Timestamp, Timestamp], Any]]
]:
-    """Return CASPAR dataset."""
+    """Return CASPAR dataset.
+
+    Parameters
+    ----------
+    climate_model : str
+        Type of climate model, for now only "GEPS" is supported.
+    date : dt.datetime
+        The date of the forecast.
+    thredds : str
+        The thredds server url. Default: "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"
+    directory : str
+        The directory on the thredds server where the data is stored. Default: "dodsC/birdhouse/disk2/caspar/daily/"
+
+    Returns
+    -------
+    xr.Dataset
+        The forecast dataset.
+    """
+    if thredds[-1] != "/":
+        warnings.warn(
+            "The thredds url should end with a slash. Appending it to the url."
+        )
+        thredds = f"{thredds}/"

    if climate_model == "GEPS":
        d = dt.datetime.strftime(date, "%Y%m%d")
-        file_url = f"https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/disk2/caspar/daily/GEPS_{d}.nc"
+        file_location = urljoin(directory, f"GEPS_{d}.nc")
+        file_url = urljoin(thredds, file_location)
        ds = xr.open_dataset(file_url)
        # Here we also extract the times at 6-hour intervals as Raven must have
        # constant timesteps and GEPS goes to 6 hours
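`urljoin` resolves its second argument relative to the first, the way a browser resolves a relative link, so a base without a trailing slash loses its last path segment. That is what the new warning and slash-appending guard against. A standalone check using the default `thredds` and `directory` values above and an arbitrary date:

```python
from urllib.parse import urljoin

THREDDS = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"
DIRECTORY = "dodsC/birdhouse/disk2/caspar/daily/"

# Same two-step join as get_CASPAR_dataset: file within directory, then directory on the server.
file_location = urljoin(DIRECTORY, "GEPS_20230101.nc")
file_url = urljoin(THREDDS, file_location)
print(file_url)
# https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/disk2/caspar/daily/GEPS_20230101.nc

# Without the trailing slash, urljoin replaces the last segment ("thredds").
print(urljoin(THREDDS.rstrip("/"), file_location))
# https://pavics.ouranos.ca/twitcher/ows/proxy/dodsC/birdhouse/disk2/caspar/daily/GEPS_20230101.nc
```

The same rule applies to `directory`, which these functions do not normalize: `urljoin(DIRECTORY.rstrip("/"), "GEPS_20230101.nc")` yields `dodsC/birdhouse/disk2/caspar/GEPS_20230101.nc`, silently dropping `daily/`.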
@@ -66,14 +102,37 @@ def get_CASPAR_dataset(

def get_ECCC_dataset(
    climate_model: str,
+    thredds: str = THREDDS_URL,
+    directory: str = "dodsC/datasets/forecasts/eccc_geps/",
) -> Tuple[
    Dataset, List[Union[Union[DatetimeIndex, Series, Timestamp, Timestamp], Any]]
]:
-    """Return latest GEPS forecast dataset."""
+    """Return latest GEPS forecast dataset.
+
+    Parameters
+    ----------
+    climate_model : str
+        Type of climate model, for now only "GEPS" is supported.
+    thredds : str
+        The thredds server url. Default: "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"
+    directory : str
+        The directory on the thredds server where the data is stored. Default: "dodsC/datasets/forecasts/eccc_geps/"
+
+    Returns
+    -------
+    xr.Dataset
+        The forecast dataset.
+    """
+    if thredds[-1] != "/":
+        warnings.warn(
+            "The thredds url should end with a slash. Appending it to the url."
+        )
+        thredds = f"{thredds}/"
+
    if climate_model == "GEPS":
        # Eventually the file will find a permanent home, until then let's use the test folder.
-        file_url = "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/datasets/forecasts/eccc_geps/GEPS_latest.ncml"
-
+        file_location = urljoin(directory, "GEPS_latest.ncml")
+        file_url = urljoin(thredds, file_location)
        ds = xr.open_dataset(file_url)
        # Here we also extract the times at 6-hour intervals as Raven must have
        # constant timesteps and GEPS goes to 6 hours
@@ -130,9 +189,10 @@ def get_subsetted_forecast(
    times: Union[dt.datetime, xr.DataArray],
    is_caspar: bool,
) -> xr.Dataset:
-    """
+    """Get Subsetted Forecast.
+
    This function takes a dataset, a region and the time sampling array and returns
-    the subsetted values for the given region and times
+    the subsetted values for the given region and times.

    Parameters
    ----------
@@ -143,14 +203,12 @@ def get_subsetted_forecast(
    times : dt.datetime or xr.DataArray
        The array of times required to do the forecast.
    is_caspar : bool
-        True if the data comes from Caspar, false otherwise.
-        Used to define lat/lon on rotated grid.
+        True if the data comes from Caspar, false otherwise. Used to define lat/lon on rotated grid.

    Returns
    -------
    xr.Dataset
        The forecast dataset.
-
    """
    # Extract the bounding box to subset the entire forecast grid to something
    # more manageable
33 changes: 26 additions & 7 deletions ravenpy/utilities/forecasting.py
@@ -6,9 +6,12 @@
"""
import datetime as dt
import logging
+import os
import tempfile
+import warnings
from pathlib import Path
from typing import List, Union
+from urllib.parse import urlparse

import climpred
import xarray as xr
@@ -20,6 +23,13 @@

LOGGER = logging.getLogger("PYWPS")

+# Can be set at runtime with `$ env RAVENPY_THREDDS_URL=https://xx.yy.zz/thredds/ ...`.
+THREDDS_URL = os.environ.get(
+    "RAVENPY_THREDDS_URL", "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"
+)
+if not THREDDS_URL.endswith("/"):
+    THREDDS_URL = f"{THREDDS_URL}/"
+

def climatology_esp(
    config,
@@ -391,9 +401,10 @@ def ensemble_prediction(
hindcast_from_meteo_forecast = ensemble_prediction


-def compute_forecast_flood_risk(forecast: xr.Dataset, flood_level: float):
-    """Returns the empirical exceedance probability for each forecast day based
-    on a flood level threshold.
+def compute_forecast_flood_risk(
+    forecast: xr.Dataset, flood_level: float, thredds: str = THREDDS_URL
+) -> xr.Dataset:
+    """Returns the empirical exceedance probability for each forecast day based on a flood level threshold.

    Parameters
    ----------
@@ -402,12 +413,19 @@ def compute_forecast_flood_risk(forecast: xr.Dataset, flood_level: float):
    flood_level : float
        Flood level threshold. Will be used to determine if forecasts exceed
        this specified flood threshold. Should be in the same units as the forecasted streamflow.
+    thredds : str
+        The thredds server url. Default: "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/"

    Returns
    -------
    xr.Dataset
        Time series of probabilities of flood level exceedance.
    """
+    if thredds[-1] != "/":
+        warnings.warn(
+            "The thredds url should end with a slash. Appending it to the url."
+        )
+        thredds = f"{thredds}/"

    # ---- Calculations ---- #
    # Ensemble: for each day, calculate the percentage of members that are above the threshold
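The docstring's "empirical exceedance probability" is, for each forecast day, the share of ensemble members strictly above `flood_level` (a deterministic forecast yields 0 or 1). A plain-Python sketch with made-up flows, standing in for the xarray operations RavenPy actually uses; `exceedance_probability` is a hypothetical name:

```python
from typing import List, Sequence


def exceedance_probability(members: Sequence[Sequence[float]], flood_level: float) -> List[float]:
    """Fraction of members strictly above `flood_level`, per forecast day.

    `members[i][t]` is the streamflow of member i on day t. A deterministic
    forecast is a single member, so each day is either 0.0 or 1.0.
    """
    n_members = len(members)
    n_days = len(members[0])
    return [
        sum(1 for m in members if m[t] > flood_level) / n_members
        for t in range(n_days)
    ]


ensemble = [
    [10.0, 55.0, 80.0],
    [12.0, 45.0, 90.0],
    [9.0, 60.0, 70.0],
    [11.0, 40.0, 65.0],
]
print(exceedance_probability(ensemble, flood_level=50.0))  # [0.0, 0.5, 1.0]
print(exceedance_probability([[10.0, 55.0]], flood_level=50.0))  # [0.0, 1.0]
```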
@@ -429,12 +447,13 @@ def compute_forecast_flood_risk(forecast: xr.Dataset, flood_level: float):
            forecast.where(forecast > flood_level).notnull() / 1.0
        )  # This is needed to return values instead of floats

+    domain = urlparse(thredds).netloc
+
    out = pct.to_dataset(name="exceedance_probability")
-    out.attrs["source"] = "PAVICS-Hydro flood risk forecasting tool, pavics.ouranos.ca"
+    out.attrs["source"] = f"PAVICS-Hydro flood risk forecasting tool, {domain}"
    out.attrs["history"] = (
-        "File created on "
-        + dt.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
-        + "UTC on the PAVICS-Hydro service available at pavics.ouranos.ca"
+        f"File created on {dt.datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S')} "
+        f"UTC on the PAVICS-Hydro service available at {domain}."
    )
    out.attrs[
        "title"
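With this change the attribution names whichever host `thredds` points to. `urlparse(...).netloc` is the full network location, so a non-default port (or embedded credentials) would also end up in the `source` and `history` attributes, whereas `.hostname` drops them. A quick check with the default URL and a hypothetical self-hosted one:

```python
from urllib.parse import urlparse

urls = [
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/",
    "https://my.domain.org:8443/thredds/",  # hypothetical self-hosted server
]
for url in urls:
    parsed = urlparse(url)
    # netloc keeps the port; hostname does not.
    print(parsed.netloc, parsed.hostname)
# pavics.ouranos.ca pavics.ouranos.ca
# my.domain.org:8443 my.domain.org
```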