Generic protocol, methods and standardization for VIP pipeline analysis #2558

DuckflipXYZ · 2024-12-18T19:17:09Z

Hello there!

Benoit and the CometeMoelle team have raised a need for dataset analysis related to VIP executions. It covers several aspects:

Genericity of the analysis methods (for any kind of pipeline). This requires a discussion about standardizing future pipelines, particularly their outputs (for example, in the context of CometeMoelle, requiring an "error.yaml" or a "results.yaml" file to be returned to Shanoir depending on the VIP execution result).
A list of methods for pre/post-execution analysis: which data are relevant to extract and manipulate, how to gather and sort them, which of them are or can be generic, and whether standardization is needed here as well. The methods can differ before and after execution (e.g. check inputs before execution for estimates, check outputs after execution for feedback).
How to give pipeline users and creators simple access to these tools and make them easy to use.
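To make the output-standardization idea concrete, here is a minimal sketch of a generic post-execution check. It assumes the convention mentioned above (an "error.yaml" or "results.yaml" file in the execution output) is adopted; the function name, the status enum, and the rule that an error file takes precedence are all hypothetical choices, not something already decided.

```python
from enum import Enum
from pathlib import Path


class ExecutionStatus(Enum):
    """Hypothetical status values for one pipeline execution."""
    SUCCESS = "success"
    FAILURE = "failure"
    UNKNOWN = "unknown"


def classify_execution(output_dir, results_name="results.yaml", error_name="error.yaml"):
    """Classify an execution from the marker files found in its output directory."""
    output_dir = Path(output_dir)
    # Assumption: an error file wins, since a pipeline may write partial results before failing.
    if (output_dir / error_name).is_file():
        return ExecutionStatus.FAILURE
    if (output_dir / results_name).is_file():
        return ExecutionStatus.SUCCESS
    # Neither marker is present: the pipeline does not follow the convention, or has not finished.
    return ExecutionStatus.UNKNOWN
```

Because the check only looks at file names, it would work for any pipeline that follows the convention, which is the kind of genericity discussed above.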

Some pre/post-execution methods have already been shaped in the context of CometeMoelle, but they are clearly not exhaustive and are somewhat specific to CometeMoelle (the structure is generic, but they need to be adapted for use in other contexts).

A full discussion is required about what has already been done and what can or should be done for CometeMoelle and future pipelines.

DuckflipXYZ · 2025-01-06T10:51:27Z

According to the 06/01/25 meeting, a POC for the Shanoir consortium based on the CometeMoelle needs seems relevant. What has to be done:

A REST API querying the database to extract data for analysis. This point needs to be specified (which analyses are worth developing, among those that already exist and possibly new ones); alternatively, we keep what has already been done and check later whether analyses are missing or whether some developed analyses are useless.
A Python library to simplify running data analyses (a list of analyses plus an example of running one).

The first point could be discussed with CometeMoelle members.
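As a rough illustration of the second point, the library could expose a small registry so that users can list the available analyses and run one by name. This is only a sketch: the registry, the function names, and the "status" field of the records are hypothetical and do not describe existing code.

```python
from collections import Counter

# Hypothetical registry mapping an analysis name to the function implementing it.
ANALYSES = {}


def register(name):
    """Decorator that adds a function to the analysis registry under `name`."""
    def decorator(func):
        ANALYSES[name] = func
        return func
    return decorator


@register("count_by_status")
def count_by_status(records):
    """Count executions per status; assumes each record has a 'status' key."""
    return dict(Counter(record["status"] for record in records))


def list_analyses():
    """Return the names of all registered analyses, sorted alphabetically."""
    return sorted(ANALYSES)


def run_analysis(name, records):
    """Run the registered analysis `name` on a list of record dicts."""
    try:
        analysis = ANALYSES[name]
    except KeyError:
        raise ValueError(f"Unknown analysis {name!r}; available: {list_analyses()}") from None
    return analysis(records)
```

A pipeline creator would then add a pipeline-specific analysis by decorating one more function, without touching the rest of the library.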

michaelkain · 2025-01-07T16:07:01Z

Hi @DuckflipXYZ Guewen,
thank you very much for our very good meeting!
I think returning a CSV with the columns exam_id, processing_id, output.metadata.name (more?)
and filtering (input) on processing.comment like "pipelineIdentifier" might be a good starting point.
The examIds (input) would be sent in the POST body of the request.
With kind regards, Michael
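A minimal client-side sketch of this proposal could look as follows. The column names come from the suggestion above; everything else (the JSON shape of the POST body, the "pipelineIdentifier" parameter name, the function names) is an assumption made for illustration, and no real endpoint is called.

```python
import csv
import io
import json
from collections import defaultdict


def build_request(exam_ids, pipeline_identifier):
    """Build the query parameters and POST body for the (hypothetical) endpoint."""
    return {
        # Assumption: the server filters on processing.comment using this value.
        "params": {"pipelineIdentifier": pipeline_identifier},
        # Assumption: exam ids are sent as a JSON object in the POST body.
        "body": json.dumps({"examIds": list(exam_ids)}),
    }


def outputs_by_exam(csv_text):
    """Group (processing_id, output name) pairs by exam_id from the returned CSV."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["exam_id"]].append(
            (row["processing_id"], row["output.metadata.name"])
        )
    return dict(grouped)
```

With made-up sample data, parsing a response would look like this:

```python
sample = (
    "exam_id,processing_id,output.metadata.name\n"
    "12,100,results.yaml\n"
    "12,101,error.yaml\n"
    "15,102,results.yaml\n"
)
print(outputs_by_exam(sample))
```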

DuckflipXYZ assigned michaelkain Dec 20, 2024
