Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use rove for QC in the ingestor #42

Open
wants to merge 18 commits into
base: trunk
Choose a base branch
from
Open

Use rove for QC in the ingestor #42

wants to merge 18 commits into from

Conversation

intarga
Copy link
Member

@intarga intarga commented Dec 11, 2024

  • Schema changes to support flags
  • Plumbing
  • qc_data implementation
  • identify time_resolution from type_id
  • load and use real pipeline
  • Make sure mock pipeline is consistent with the real one
  • Merge confident.flags into public.data

This PR integrates rove to perform QC in the ingestor with the current limitations:

  • Only fresh pipelines are supported (not periodic or consistency)
  • We only have 1 actual pipeline (hourly air temperature), and even that has 2 checks disabled as we don't have access to model data
  • No caching of prior ingested observations

I considered verifying the QC output in the integration tests, but decided this was best left for when we implement an API that actually uses them (frost backend?) for now I manually queried the DB after running the integration tests to confirm qc_usable was as expected.

I restructured the generic Datum and DataChunk to match their obsinn equivalents. The original idea behind the different structure was that I thought they should be neutral towards different data sources that might structure messages differently, but I realised that the Obsinn structure of clustering by station and timestamp is what makes sense for any source of fresh data, so we should optimise for that.

There is a smaller PR of associated fixes in ROVE that should be merged before this one

@intarga intarga added the enhancement New feature or request label Dec 12, 2024
@intarga intarga force-pushed the rove_integration branch 2 times, most recently from edf95f6 to 389f56a Compare December 13, 2024 17:05
Upon further consideration, I think matching obsinn's format like this
makes sense. Since we are ingesting live data from weather stations,
it's reasonable to expect a station will send observations fromall its
params at once with the same timestamp, so we should optimise for this
use case
@@ -27,8 +27,10 @@ quick-xml = { version = "0.35.0", features = [ "serialize", "overlapped-lists" ]
rand = "0.8.5"
rand_distr = "0.4.3"
regex = "1.11.1"
rove = { git = "https://github.com/metno/rove.git" }
rove = { git = "https://github.com/metno/rove.git", branch = "lard_fixes" }
# rove = { git = "https://github.com/metno/rove.git" }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be removed before merging

in the future we probably want to have directories like qc_pipelines/periodic
and qc_pipelines/consistency, but for now we don't handle that so we just
load qc_pipelines/fresh directly
@intarga intarga marked this pull request as ready for review January 17, 2025 15:16
@intarga intarga linked an issue Jan 17, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rove integration
1 participant