Commit

New setup with jupyter-book (#12)
* Initializing pixi

* Initializing a new standard book

* Updating config variables

* Updating pixi

* Adding matplotlib

* Adding tests from ASCIIDoc

* Adding realtimeqc from ASCIIDoc

* cfg: Ignoring _build

* fix: Path to logo

* fix: Headers with MD syntax

* Replacing template bibliography

* Updating with new bibliography style

* cfg: Internal labels and links

* fix: Updating functions to new syntax

* feat: Adding logo image

* Including my sections into TOC

* Prototype of deploy action

* Using pixi to deploy

* clean: Removing demo files from tutorial

* cfg: Updating pixi.lock

* cfg: Numerated chapters/sections

* fix: Updating syntax for comments

* fix: Typo on Victor's email

* clean: Bibliography was already moved

* Updating actions/checkout

* cfg, fix: indentation

* cfg: Updating upload-pages-artifact

* cfg: For now, trigger manually only

* cfg: Renaming workflow to deploy

* clean: Removing template files
castelao authored Nov 3, 2024
1 parent 34db336 commit 88fafdb
Showing 13 changed files with 8,135 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
# GitHub syntax highlighting
pixi.lock linguist-language=YAML linguist-generated=true
57 changes: 57 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,57 @@
name: deploy-book

on:
  workflow_dispatch:

jobs:
  deploy-book:
    runs-on: ubuntu-latest
    permissions:
      pages: write
      id-token: write
    steps:
    - name: Checkout repository
      uses: actions/checkout@v4

    - name: Setup pixi
      uses: prefix-dev/setup-pixi@v0.8.1
      with:
        pixi-version: v0.34.0
        cache: true
        frozen: true

    # Install dependencies
    - name: Set up Python 3.11
      uses: actions/setup-python@v3
      with:
        python-version: 3.11

    - name: Install dependencies
      run: |
        pip install -r requirements.txt
    # (optional) Cache your executed notebooks between runs
    # if you have config:
    # execute:
    #   execute_notebooks: cache
    - name: cache executed notebooks
      uses: actions/cache@v3
      with:
        path: _build/.jupyter_cache
        key: jupyter-book-cache-${{ hashFiles('requirements.txt') }}

    # Build the book
    - name: Build the book
      run: |
        jupyter-book build .
    # Upload the book's HTML as an artifact
    - name: Upload artifact
      uses: actions/upload-pages-artifact@v3
      with:
        path: "_build/html"

    # Deploy the book's HTML to GitHub Pages
    - name: Deploy to GitHub Pages
      id: deployment
      uses: actions/deploy-pages@v2
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
_build

# pixi environments
.pixi
*.egg-info
7,828 changes: 7,828 additions & 0 deletions pixi.lock

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions pixi.toml
@@ -0,0 +1,13 @@
[project]
name = "jupyterbook"
version = "0.1.0"
description = "Realtime quality control of underwater gliders"
authors = ["Gui Castelao <guilherme@castelao.net>"]
channels = ["conda-forge", "conda", "main"]
platforms = ["osx-arm64", "linux-64", "win-64"]

[tasks]

[dependencies]
jupyter-book = ">=1.0.3,<2"
matplotlib = ">=3.9.2,<4"
32 changes: 32 additions & 0 deletions rtqc/_config.yml
@@ -0,0 +1,32 @@
# Book settings
# Learn more at https://jupyterbook.org/customize/config.html

title: Realtime Quality Control for Underwater Gliders
author: OceanGliders RTQC team
logo: images/logo-ocean-gliders.png

# Force re-execution of notebooks on each build.
# See https://jupyterbook.org/content/execute.html
execute:
  execute_notebooks: force

# Define the name of the latex output file for PDF builds
latex:
  latex_documents:
    targetname: book.tex

# Add a bibtex file so that we can create citations
bibtex_bibfiles:
  - references.bib

# Information about where the book exists on the web
repository:
  url: https://github.com/OceanGlidersCommunity/Realtime-QC
  path_to_book: docs  # Optional path to your book, relative to the repository root
  branch: main

# Add GitHub buttons to your book
# See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository
html:
  use_issues_button: true
  use_repository_button: true
11 changes: 11 additions & 0 deletions rtqc/_toc.yml
@@ -0,0 +1,11 @@
# Table of contents
# Learn more at https://jupyterbook.org/customize/toc.html

format: jb-book
root: intro
options:
  numbered: true
chapters:
- file: realtimeqc
- file: qc_tests
- file: references
Binary file added rtqc/images/logo-ocean-gliders.png
11 changes: 11 additions & 0 deletions rtqc/intro.md
@@ -0,0 +1,11 @@
# Welcome to your Jupyter Book

This is a small sample book to give you a feel for how book content is
structured.
It shows off a few of the major file types, as well as some sample content.
It does not go in-depth into any particular topic - check out [the Jupyter Book documentation](https://jupyterbook.org) for more information.

Check out the content pages bundled with this sample book to see more.

```{tableofcontents}
```
45 changes: 45 additions & 0 deletions rtqc/qc_tests.md
@@ -0,0 +1,45 @@
---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
---

# Tests

## Global Range

## Gradient

```{math}
:label: gradient
y_i = \left| x_i - \frac{x_{i+1} + x_{i-1}}{2} \right|
```
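
As a minimal sketch, the gradient value above could be computed with NumPy as follows. The function name `gradient_test_value` and the choice to leave the two end points as NaN (they lack a neighbor on one side) are assumptions made for illustration, not part of the original definition.

```python
import numpy as np


def gradient_test_value(x):
    """Return y_i = |x_i - (x_{i+1} + x_{i-1}) / 2| for a 1-D sequence.

    End points have only one neighbor, so they are left as NaN
    (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.full(x.shape, np.nan)
    # Interior points only: compare each value with the mean of its neighbors.
    y[1:-1] = np.abs(x[1:-1] - (x[2:] + x[:-2]) / 2.0)
    return y


# Illustrative data with one anomalous value in the middle.
x = [10.0, 10.0, 14.0, 10.0, 10.0]
y = gradient_test_value(x)
# y is [nan, 2, 4, 2, nan]
```

Note that the neighbors of the anomalous point also receive non-zero values, since they are compared against a mean that includes it.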

## Spike

The spike check was first proposed in {cite}`GTSPP:1990` and defined as:

```{math}
:label: spike
y_i = \left| x_i - \frac{x_{i+1} + x_{i-1}}{2} \right| - \left| \frac{x_{i-1} - x_{i+1}}{2} \right|,
```

where $i$ is the time index, i.e. the data $x$ are assumed to be sorted by time.
This check is widely used without any modification.
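
A minimal NumPy sketch of the spike value defined above is shown here. The function name `spike_test_value` and the NaN padding at the end points are assumptions for illustration only; the input is assumed to be already sorted by time.

```python
import numpy as np


def spike_test_value(x):
    """Return |x_i - (x_{i+1} + x_{i-1}) / 2| - |(x_{i-1} - x_{i+1}) / 2|.

    The input is assumed to be sorted by time. End points are left as NaN
    (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.full(x.shape, np.nan)
    # Deviation of each interior point from the mean of its neighbors.
    deviation = np.abs(x[1:-1] - (x[2:] + x[:-2]) / 2.0)
    # Half the difference between the neighbors, i.e. the local trend.
    trend = np.abs((x[:-2] - x[2:]) / 2.0)
    y[1:-1] = deviation - trend
    return y


# Illustrative data with one anomalous value in the middle.
x = [10.0, 10.0, 14.0, 10.0, 10.0]
y = spike_test_value(x)
# y is [nan, 0, 4, 0, nan]
```

In this example the second term cancels the response at the neighbors of the anomalous point, so only the spike itself stands out.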

## Digit Rollover

```{math}
:label: digit_rollover
y_i = x_i - x_{i-1}
```
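
The difference above can be sketched with `numpy.diff`. The function name `digit_rollover_test_value`, the NaN at the first point (which has no predecessor), and the 8-bit counter in the example are all assumptions made for illustration.

```python
import numpy as np


def digit_rollover_test_value(x):
    """Return y_i = x_i - x_{i-1} for a 1-D sequence.

    The first point has no predecessor, so it is left as NaN
    (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.full(x.shape, np.nan)
    y[1:] = np.diff(x)
    return y


# Hypothetical 8-bit counter that wraps from 255 back to 0.
x = [253.0, 254.0, 255.0, 0.0, 1.0]
y = digit_rollover_test_value(x)
# y is [nan, 1, 1, -255, 1]
```

The large jump at the wrap-around is what a rollover check would look for; the threshold to apply is not specified in this section.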

## Spike - Median

Proposed by {cite}`johnson2021bgc`

```{math}
:label: spike-median
y_i = \left| x_i - \operatorname{median}(x_{i-2}, x_{i-1}, x_i, x_{i+1}, x_{i+2}) \right|
```
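
A minimal sketch of the five-point median spike value above, using NumPy's `sliding_window_view` (available in NumPy 1.20 and later). The function name `spike_median_test_value` and the NaN padding for the two points at each end are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def spike_median_test_value(x):
    """Return y_i = |x_i - median(x_{i-2}, ..., x_{i+2})| for a 1-D sequence.

    The two points at each end lack a full five-point window, so they
    are left as NaN (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    y = np.full(x.shape, np.nan)
    if x.size >= 5:
        # Each row of `windows` holds the five values centered on one point.
        windows = sliding_window_view(x, 5)
        y[2:-2] = np.abs(x[2:-2] - np.median(windows, axis=1))
    return y


# Illustrative data with one anomalous value in the middle.
x = [1.0, 1.0, 1.0, 9.0, 1.0, 1.0, 1.0]
y = spike_median_test_value(x)
# y is [nan, nan, 0, 8, 0, nan, nan]
```

Because the median ignores a single outlier within the window, the neighbors of the anomalous point get a value of zero here.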
95 changes: 95 additions & 0 deletions rtqc/realtimeqc.md
@@ -0,0 +1,95 @@
# Realtime Quality Control for Underwater Gliders

## Credits

(List of contributors including brief description of the contribution(s))

[List of contributors](section-contributors)

## Statement of Purpose

## Inbox - Content to be processed

Here I'll add texts from our discussions that should be processed and organized
in this new document. Mostly copy-n-paste so we don't lose anything.

Topics that reached a consensus

### What defines real-time QC?

## Main Document

<!-- Underwater gliders only -->
This document focuses on underwater gliders, which use changes in buoyancy as their main propulsion and are thus characterized by relatively slow velocities.
Other types of platforms with different modes of operation, such as wave gliders and propelled autonomous underwater vehicles, might benefit from this document, but there is no intention of covering specific aspects of those platforms.

<!-- Why should we do RTQC? -->
A wide range of applications cannot afford to wait for the delayed-mode product due to time constraints, such as data assimilation for weather and sea state forecasts, and thus require real-time quality control with low latency. Given the diverse environments where gliders operate and the differences among glider models, the operators themselves are best suited to evaluate their own measurements. Even so, different users might have different priorities and different tolerances for what they consider useful data. While a data assimilation system might be unforgiving of bad measurements and prefer less data of higher quality, a monitoring or alerting system cannot afford to wrongfully flag, and therefore miss, a single extreme event. Although there is no single optimal flagging for everyone, communicating with the final users and understanding their expectations can help fine-tune the QC criteria and the desired informative flags. It should be expected that some users will apply their own QC procedure in addition to the QC from the operators, but all it takes is one obviously bad data point to damage the credibility of the whole dataset. Even before the final users, the glider operation itself benefits from continuous real-time QC. Prompt detection of sensor degradation might trigger an early recovery to swap sensors, or the full platform, when possible. Catastrophic failures are sometimes preceded by anomalous behaviour, so a high rate of errors should raise the alertness of the pilots. Some spurious measurements are inevitable in any ocean observing system. Despite the constraints imposed by telemetry and the pressure for low latency, real-time QC has clear benefits and should be part of all glider operations.

<!-- Do not modify the original data -->
The methods described here augment the original data to guide the user in deciding what to trust and hence what to use. The raw data must not be modified; instead, quality flags should be added alongside them, allowing users to apply alternative quality control methods without limitations. For instance, advanced methods developed in the future might require the original data to avoid error propagation. Quality control in this document does not cover calibration; it is limited to classification. Whenever it is considered important to provide the best possible ready-to-use data, the recommendation is to add such corrected data while keeping the original values directly available.

<!-- Do not limit to automatic procedures -->
Real-time QC is commonly associated with automatic checks, which indeed better suit the fast response requirements of a real-time data stream, but it should not be limited to automatic procedures. One of the distinct characteristics of underwater gliders is the close monitoring by pilots and scientists, as frequently as hourly and at most every couple of days.
Complex cases that are missed by the automatic procedures but identified by the operators should be flagged and updated in the data stream. Assuming a low rate of mistakes by the automatic procedure, it is best not to hold the automatic procedure for manual confirmation, but to employ a fast automatic assessment and stream a later correction when necessary.

include::tests.adoc[leveloffset=+1]

## Procedures

#### Argo BGC

9. Spike test (median)

Red Sea and Mediterranean Sea, flag 4 if test value > 1 micromole/kg

Other places, flag 4 if test value > 5 micromole/kg
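
The regional thresholds above could be applied as in the following sketch. The region keys, the function name `spike_median_flag`, and the use of flag 1 for values that do not exceed the threshold are assumptions made for illustration; only the thresholds (1 and 5 micromole/kg) and flag 4 come from the text above.

```python
import numpy as np

# Thresholds in micromole/kg, as listed above. The region keys are
# hypothetical names chosen for this sketch.
REGIONAL_THRESHOLDS = {"red_sea": 1.0, "mediterranean_sea": 1.0}
DEFAULT_THRESHOLD = 5.0


def spike_median_flag(test_value, region=None):
    """Return flag 4 where the median spike test value exceeds the threshold.

    Values that do not exceed the threshold get flag 1 here, which is an
    assumption of this sketch. NaN test values never exceed the threshold,
    so they are not flagged 4.
    """
    threshold = REGIONAL_THRESHOLDS.get(region, DEFAULT_THRESHOLD)
    test_value = np.asarray(test_value, dtype=float)
    return np.where(test_value > threshold, 4, 1)


values = [0.5, 3.0, 8.0]
flags_red_sea = spike_median_flag(values, region="red_sea")
flags_elsewhere = spike_median_flag(values)
# flags_red_sea is [1, 4, 4]; flags_elsewhere is [1, 1, 4]
```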

## History

Virtual synchronous meeting on XXX.
(add participants/contributors on each step.)

### Virtual workshop

Between May 11 - 15 2021

(I think we had one single sync meeting. Find the correct date/time, participants, and notes taken)

### Async discussions from Apr 10 2021 to July 19 2021
Title Best Practices on Realtime QC

### Called: Main document

Contacted people by email.

Asynchronous discussions through a Google Doc initiated in Feb 2021, with contributions extending to Oct 2021.

A Google Doc was chosen at that time to reduce the technical barrier and expand
the participation and contributions.

Participants:

* Guilherme Castelao (Gui), castelao@ucsd.edu, Scripps Institution of Oceanography
* Soeren Thomsen, soeren.thomsen@locean.ipsl.fr, LOCEAN, OceanGliders.org Best Practice coordinator
* Mark Bushnell, mark.bushnell@noaa.gov
* Pierre Testor, pierre.testor@locean.ipsl.fr
* Emma Slater, emmer@bodc.ac.uk
* Justin Buck, juck@bodc.ac.uk
* Mun Woo, mun.woo@uwa.edu.au, IMOS Ocean Gliders
* Thierry Carval, thierry.carval@ifremer.fr, IFREMER
* Corentin Guyot, corentin.guyot@ifremer.fr, IFREMER
* Sylvie Pouliquen, Sylvie.Pouliquen@ifremer.fr, IFREMER and Euro*Argo ERIC
* Victor Turpin, vturpin@oceanops.org, OceanOPS
* Nikolaos Zarokanellos, nzarokanellos@socib.es, SOCIB
* Christoph Waldmann, waldmann@marum.de, MARUM
* John Kerfoot, kerfoot@marine.rutgers.edu
* Claire Gourcuff, claire.gourcuff@euro-argo.eu

(section-contributors)=
## Contributors

List of all contributors and their contributions.

* Schmechtig (@catsch), CNRS
PR-1: Details on Argo's Nitrate test
33 changes: 33 additions & 0 deletions rtqc/references.bib
@@ -0,0 +1,33 @@
---
---
@manual{GTSPP:1990,
title = "GTSPP Real-Time Quality Control Manual",
organization = {UNESCO--IOC},
address ="United Nations Educational, Scientific and Cultural Organization 7, Place de Fontenoy,
75352, {P}aris 07 SP",
year = "1990",
note = "SC-90/WS-74",
series = "Intergovernmental Oceanographic Commission Manuals and Guides; 22",
key = "{UNESCO--IOC}, 1990",
}

@manual{GTSPP:2010,
title = "GTSPP Real-Time Quality Control Manual",
organization = {UNESCO--IOC},
address ="United Nations Educational, Scientific and Cultural Organization 7, Place de Fontenoy,
75352, {P}aris 07 SP",
edition = "First Revised Edition",
year = "2010",
note = "IOC/2010/MG/22Rev.",
key = "{UNESCO--IOC}, 2010",
doi = "10.25607/OBP-1425",
}

@manual{johnson2021bgc,
title={BGC-Argo quality control manual for nitrate concentration},
author={Johnson, Kenneth and Maurer, Tanya and Plant, Joshua and Bittig, Henry and Schallenberg, Christina and Schmechtig, Catherine},
year={2021},
edition={Version 1.0},
doi={10.13155/84370}
}
3 changes: 3 additions & 0 deletions rtqc/requirements.txt
@@ -0,0 +1,3 @@
jupyter-book
matplotlib
numpy
