Add Argo WF conformance class #386
add argo req class
added Argo
added argo
added argo
added argo
Really nice to see more alternatives being implemented!
|
||
part:: If a process can be described for the intended use as a <<rc_argo,Argo graph>>, implementations should consider supporting the <<rc_argo,Argo>> encoding for describing the replacement process. | ||
|
||
part:: The media type `application/argo` shall be used to indicate that request body contains a processes description encoded as <<rc_ogcapppkg,Argo>>. |
Is `application/argo` an official media-type? If not, the generic https://www.iana.org/assignments/media-types/application/vnd.oai.workflows+yaml with a `contentSchema` with the Argo Workflow schema URL might be more appropriate.
An alternative would be to push Argo maintainers to publish a media-type like CWL did:
https://www.iana.org/assignments/media-types/application/cwl
https://www.iana.org/assignments/media-types/application/cwl+json
* `type` and `href` if passed by reference | ||
* `value` and `mediaType` if passed by value | ||
|
||
part:: The value of the `type` property shall be `application/argo`, when for `mediaType` it should be `application/argo+json`. |
Why use distinct type values?
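For readers following the diff, the two payload shapes under discussion can be sketched as below. This is only an illustration: the helper names `by_reference` and `by_value` are hypothetical, and `application/argo` / `application/argo+json` are the media types proposed in this PR, not registered ones.

```python
# Media types as proposed in the diff above (assumption: not IANA-registered).
ARGO_TYPE = "application/argo"
ARGO_JSON_TYPE = "application/argo+json"


def by_reference(href: str) -> dict:
    """Payload fragment pointing at an Argo-encoded file hosted elsewhere."""
    return {"type": ARGO_TYPE, "href": href}


def by_value(workflow: dict) -> dict:
    """Payload fragment embedding the Argo definition directly as JSON."""
    return {"mediaType": ARGO_JSON_TYPE, "value": workflow}


if __name__ == "__main__":
    # Hypothetical URL and workflow content, for illustration only.
    print(by_reference("https://example.com/workflow.yaml"))
    print(by_value({"kind": "Workflow"}))
```

Seen side by side, the two fragments differ both in property name (`type` vs `mediaType`) and in the media type value, which is the asymmetry the comment above asks about.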
|
||
part:: The value of the `type` property shall be `application/argo`, when for `mediaType` it should be `application/argo+json`. | ||
|
||
part:: The value of the `href` property shall be a reference to the Argo encoded file. The value of the `value` property shall be the Argo encoded in json format. |
"json" should be uppercase here
|
||
part:: The value of the `href` property shall be a reference to the Argo encoded file. The value of the `value` property shall be the Argo encoded in json format. | ||
|
||
part:: If the Argo contains more than a single workflow identifier, an addition `w` query parameter may be used to target a specific workflow id to be deployed. |
Might be relevant to refer to a common parameter that can be reused across Workflow languages regardless of their specific implementation.
|
||
part:: If the Argo contains more than a single workflow identifier, an addition `w` query parameter may be used to target a specific workflow id to be deployed. | ||
|
||
part:: The server should validate the Argo at the request time. In case, the server cannot find the `w` identifier within the workflow from the Argo provided, a 400 status code is expected with the type "argo-worflow-not-exist". |
This seems to contradict the previous point, which is worded in a way that `w` is optional, while it is required here.
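One possible reading of the two requirements, sketched in Python. Assumptions: the Argo document is already parsed into a mapping of workflow id to definition (the `workflows` argument), `DeployError` and `select_workflow` are hypothetical names, the exception type uses the corrected spelling `workflow-not-found`, and rejecting an ambiguous request without `w` is a choice the diff itself does not specify.

```python
from typing import Optional


class DeployError(Exception):
    """Hypothetical error carrying an HTTP status and an exception type."""

    def __init__(self, status: int, type_: str, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.type = type_
        self.detail = detail


def select_workflow(workflows: dict, w: Optional[str] = None) -> dict:
    """Pick the workflow to deploy from a parsed Argo document.

    `workflows` maps workflow id -> definition (assumed structure).
    `w` is the optional query parameter from the deploy request.
    """
    if w is not None:
        # `w` was supplied: it must match a workflow in the document.
        if w not in workflows:
            raise DeployError(400, "workflow-not-found",
                              f"no workflow with id '{w}' in the Argo document")
        return workflows[w]
    if len(workflows) == 1:
        # Single workflow: `w` is not needed to disambiguate.
        return next(iter(workflows.values()))
    # Assumption: several workflows and no `w` is treated as a client error.
    raise DeployError(400, "workflow-not-found",
                      "several workflows present; the 'w' parameter is needed")
```

Under this reading `w` stays optional for single-workflow documents and only becomes necessary when the document is ambiguous, which may be one way to reconcile the two parts.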
part:: If a process can be represented for the intended use as a <<rc_argo,Argo Application>>, implementations should consider supporting the <<rc_argo,Argo>> encoding for describing the process to be deployed to the API. | ||
|
||
part:: The media type `application/argo` shall be used to indicate that request body contains a processes description encoded as a <<rc_argo,Argo Application>>. | ||
|
||
part:: If the Argo contains more than one workflow, an additional `w` query parameter may be used to reference the workflow id to be deployed. | ||
|
||
part:: The server should validate the Argo at the request time. In case, the server cannot find the `w` identifier within the workflow from the Argo provided, a 400 status code is expected with the type "worflow-not-found". |
Similar comments to other file.
Update recommendations and add Requirements in the corresponding Requirements class
Make CWL depend on OGC Application Package to avoid having to add another conformance class such as
Define the Requirement for the `w` param in DRU directly to make it easier to extend, cf. opengeospatial#386
Move workflow-not-found exception Requirement to DRU Requirements class
SWG telecon from 8th January 2024: We would like to see this tested e.g. in a testbed before adding this to the standard. |
SWG meeting from January 22nd: Move this to Part 3 project. |
Sorry, I could not attend today's meeting due to a conflict.
@fmigneault Part 3 includes a "Deployable workflows" requirement class allowing to deploy a workflow as a process using Part 2, as well as dedicated requirement classes for specific workflow definition languages. Part 2 is about the generic idea that you can POST a process application package, regardless of what it contains. But if the content of that package is a workflow, this is more about Part 3 (working in conjunction with Part 2). We could also apply this to CWL, but due to the long-standing association with previous Part 2 efforts, Part 2 includes a CWL requirement class which is focused on the ability to use CWL for process description, rather than its ability to define workflows. There is still a CWL workflow requirement class in Part 3 about defining a workflow using CWL. |
@jerstlouis |
@fmigneault Sure, but that applies to all workflow definition languages, and I don't think that requires a req. class on Part 2 for that. (I think that was even the case for CWL, but there were strong arguments in favor of including it) |
From what I see on changed files, everything is relevant to Part 2, i.e.: a process description represented by Argo format and how to distinguish it from other workflow encodings to deploy/replace/undeploy it, to later execute it after deployment. I looked quickly at Part 3 Deployable Workflows and my impression is that it attempts to duplicate what Part 2 does, but with fewer details about the deployment itself (which makes sense since Part 3 focuses more on Execution). Because of the execution endpoint being used, it generates some issues about conflicting
IMO, it would make more sense for "Deployable Workflows" to be considered just another "workflow process graph" representation POSTed to The strength of Part 3 is about chaining multiple processes input/output/collections "on the fly" at execution time. If one intends to deploy the workflow rather than execute it directly, going through a Part 3 approach seems to over-complicate the Part 3 definition. Delegating "Deployable Workflows" to Part 2 with a specific "OGC Execution Workflow" would simplify how the two parts collaborate.
The intent is not to duplicate anything, but to reference it normatively i.e., a workflow defined with Part 3, can be deployed using Part 2, for implementations declaring support for this requirement class, with a dependency on Part 2. But you are right that currently Deployable Workflow is more about the "OGC workflow definitions defined as an execution request". But it could be broadened to be about workflow in any process graph definition language (CWL, openEO, Argo...). The question really is just about where does the definition of that payload that get POSTed for definition languages belong. Because they define workflows, I think the consensus was that it belongs to Part 3. But of course the POST operation and the behavior is defined by Part 2. In the end, it doesn't really matter in which document the req. classes are defined, as long as they can work together. |
I think that because they define a workflow (which can be queried as described after deployment), and then be reused with other inputs without changing the process graph, it makes more sense to have them in Part 2. All the CWL, OpenEO and Argo graphs work under the assumption that the workflow steps are defined first, and then chains the submitted inputs. The OGC Part 3 Workflow could be implemented using any of those representations, but its real power comes from bridging data/process sources into an execution pipeline that does not need deployment, at the cost of being provided inline each time in the execution request. This is what makes it distinct from Part 2. If a Part 3 workflow was deployed, it could then be called like any other atomic process, regardless of the workflow engine under it. The workflow definition would be abstracted away. I am having discussions with other working groups, and the issue of handling multiple workflow formats and platform APIs often arises. I think it would be more useful for users if custom workflow encodings were deployed using Part 2 (as currently), while Part 3 limited itself to chaining standardized OGC API components. This way, Part 3 Workflows offer a truly interoperable way to call processes between servers. Otherwise, we somehow need to port OGC-native concepts such as |
The same is also true for the "Nested Processes" workflow defined in Part 3 as an extension of Part 1 execution requests: they all work on existing OGC API - Processes, either pre-existing in the implementation or deployed using Part 2.
That is what the "Deployable Workflow" requirement class of Part 3 is about, leveraging Part 2, if we make it agnostic of the workflow definition language (extended execution request, CWL, OpenEO, Argo...).
Whether things are defined in the Part 2 document or the Part 3 document should have zero impact on users. The functionality is exactly the same.
Part 3 defines several things, which may be contributing to confusion. "Collection Input" and "Collection Output" are really powerful concepts that bridge the data access OGC APIs as mechanisms, and they are particularly relevant to the GeoDataCube API work. However, this "collection" functionality is fully orthogonal to the definition of process graphs in any particular workflow definition language, with the one exception that when using an extended Part 1 execution request, a "collection" property is used to specify a collection input. What I mean here is that even if you used CWL or Argo for your workflow definition, there could be a specific mechanism for how one can accept an OGC API - Coverages collection as an input to the workflow definition (using Coverages as an example, but it could be Features, Tiles, DGGS, Maps, EDR...). Similarly, you could support creating a virtual collection as per Part 3 "Collection Output", and trigger execution of the workflow for an area/time/resolution of interest as a result of an OGC API - Coverages request ("Collection Output").
This cost is mitigated by either deploying the workflow using Part 2 ("Deployable Workflow"), or by setting up a virtual collection ("Collection Output", with the possibility to set up a persistent public-facing collection that can optionally expose its internal workflow).
Currently, I believe the SWG is working under the assumption that anything to do with "workflow" belongs to Part 3. Of course Part 2 can be used to deploy both new processes that can be used within those workflows, and the workflows themselves as new processes (Part 3 - "Deployable Workflows"). The SWG could review whether more stuff should be included in Part 2, but I believe there is a preference to refrain from making too many changes to Part 2 so as to avoid delaying its completion.
Exactly my point, therefore there is no need for CWL, openEO, Argo requirements classes in Part 3. It is redundant to have them there, as they should already be handled by Part 2.
Since they are not POSTed on the same endpoint, do not expect the same payload, and the result is not the same (whether the workflow is simply deployed or is executed immediately), it matters a lot. I agree with all points regarding how powerful Part 3 concepts could be, but at the same time, they lack explicit specification on how OGC concepts can be bridged with CWL, Argo, openEO, etc. There are already many long issue discussions, not just by me, that illustrate how non-trivial this is: those assumptions do not just magically work together, because each workflow technology has its own structure. Like you mentioned, Part 3 includes a lot of things. Adoption of these capabilities only gets harder if we include Part 2 concepts in there as well. Since Part 3 already assumes that the processes it calls are Part 1 or Part 2 references, it makes more sense to reuse this abstraction.
I think this is only a side effect of Part 3 being called "Workflow" when it defines way more than that. Workflow concepts were present since at least OGC Best Practice for Earth Observation Application Package, which following initiative participants decided to ignore for whatever reason... |
SWG meeting from 2024-12-23: The structure of the Part 2 documents has changed, so this PR would need to adapt to the new structure. @christophenoel could you update the PR? Note that the first version of Part 2 is more or less final, so we would consider the Argo conformance class as future work.
Over the years, our team has been gradually transitioning our implementation (including operational PDGS) to the Argo Workflow Language. This decision was based on the Argo Workflow Language's superior suitability for container-based workflows and modules, particularly when interacting with Kubernetes-native environments. Additionally, the specification aligns well with the OpenAPI/JSON schemas that form the foundation of OGC API Processes.
To facilitate this transition, we have prepared a pull request that incorporates the essential requirements and recommendations for integrating the newly adopted conformance class into the existing spec. We sincerely request your consideration and integration of this profile.
(see email)