Generic protocol, methods and standardization for VIP pipeline analysis #2558

Open
DuckflipXYZ opened this issue Dec 18, 2024 · 2 comments

@DuckflipXYZ
Collaborator

Hello there!

A need for dataset analysis related to VIP executions has been raised by Benoit and the CometeMoelle team. It covers several aspects:

  • Genericity of the analysis methods (for any kind of pipeline). This requires a discussion about standardizing future pipelines, particularly their outputs (for example, requiring an "error.yaml" or a "results.yaml" file to be returned to Shanoir depending on the VIP execution result, as in the CometeMoelle context).

  • List of methods for pre/post-execution analysis: which data are relevant to extract and manipulate, how to gather and sort them, which of them are or can be generic, and whether standardization is needed here as well. The answers may differ before and after execution (e.g. check inputs before execution for estimations, check outputs after execution for feedback).

  • How to give pipeline users/creators simple access to these tools and make them easy to use.
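To make the output-standardization point concrete, here is a minimal sketch of a generic post-execution check. It assumes a convention (not decided yet) where every pipeline returns either "results.yaml" or "error.yaml" in its output directory; the function name `execution_status` and the status labels are hypothetical.

```python
from pathlib import Path

# File names taken from the CometeMoelle example; the convention itself
# is an assumption and still has to be agreed on.
RESULTS_FILE = "results.yaml"
ERROR_FILE = "error.yaml"


def execution_status(output_dir):
    """Classify a finished execution from the files it returned."""
    out = Path(output_dir)
    has_results = (out / RESULTS_FILE).is_file()
    has_error = (out / ERROR_FILE).is_file()
    if has_results and not has_error:
        return "success"
    if has_error and not has_results:
        return "failure"
    # Neither file, or both: the pipeline did not follow the convention.
    return "non_compliant"
```

Because the check only looks at file names, it would work for any pipeline that follows the convention, without knowing what is inside the YAML files.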

Some pre/post-execution methods have already been drafted in the context of CometeMoelle, but they are clearly not exhaustive and are fairly specific to CometeMoelle (the structure is generic, but the methods need to be adapted before being used in another context).

We need a full discussion about what is already done and what can/should be done for CometeMoelle and future pipelines.

@DuckflipXYZ
Collaborator Author

DuckflipXYZ commented Jan 6, 2025

Following the 06/01/25 meeting, a POC for the Shanoir consortium based on the CometeMoelle needs seems relevant. What has to be done:

  • A REST API that queries the database to extract data for analysis. This point needs to be specified (which analyses are worth developing, among the existing ones and possibly new ones); alternatively, we keep what has already been done and check later whether some analyses are missing or some developed ones are useless.
  • A Python library to simplify running data analyses (list of analyses + example of running one)

The first point could be discussed with CometeMoelle members.
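For the second point, one possible shape for the library is a small registry that can list the available analyses and run one by name. This is only a sketch: the names `ANALYSES`, `analysis`, `list_analyses` and `run_analysis`, the two example analyses, and the `exam_id` row key are all hypothetical, not an existing API.

```python
# Registry mapping an analysis name to the function implementing it.
ANALYSES = {}


def analysis(name):
    """Decorator that registers a function under the given analysis name."""
    def register(func):
        ANALYSES[name] = func
        return func
    return register


@analysis("output_count")
def output_count(rows):
    """Total number of output rows."""
    return len(rows)


@analysis("outputs_per_exam")
def outputs_per_exam(rows):
    """Number of output rows for each exam id."""
    counts = {}
    for row in rows:
        counts[row["exam_id"]] = counts.get(row["exam_id"], 0) + 1
    return counts


def list_analyses():
    """Names of all registered analyses, sorted alphabetically."""
    return sorted(ANALYSES)


def run_analysis(name, rows):
    """Run one registered analysis on rows (a list of dicts)."""
    if name not in ANALYSES:
        raise KeyError(f"unknown analysis: {name}")
    return ANALYSES[name](rows)
```

A pipeline creator would then call `list_analyses()` to see what exists and `run_analysis("outputs_per_exam", rows)` on rows extracted through the REST API; adding a pipeline-specific analysis only takes one decorated function.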

@michaelkain
Contributor

Hi @DuckflipXYZ Guewen,
thank you very much for our very good meeting!
I think returning a CSV with the columns exam_id, processing_id, output.metadata.name (more?)
and filtering (input) on processing.comment like "pipelineIdentifier" might be a good starting point.
The examIds (input) would be sent in the POST body of the request.
With kind regards, Michael
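As a rough illustration of this proposal, the sketch below builds such a CSV from in-memory records instead of a real database. It assumes that "like" means a substring match on the processing comment, and that each record is a dict holding the three columns plus a `processing_comment` key; the function name `export_csv` and the record layout are hypothetical.

```python
import csv
import io

# Columns proposed above for the CSV response.
COLUMNS = ["exam_id", "processing_id", "output.metadata.name"]


def export_csv(records, exam_ids, pipeline_identifier):
    """Return CSV text for the records matching both input filters.

    exam_ids stands for the list sent in the POST body;
    pipeline_identifier is matched as a substring of the processing
    comment (an assumption about what "like" should mean).
    """
    wanted = set(exam_ids)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        if rec["exam_id"] not in wanted:
            continue
        if pipeline_identifier not in rec["processing_comment"]:
            continue
        writer.writerow({col: rec[col] for col in COLUMNS})
    return buffer.getvalue()
```

In the real service the filtering would presumably happen in the database query; the point here is only the input/output contract (exam ids and pipeline identifier in, three-column CSV out).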
