Add findability for Galaxy #2885

Draft: wants to merge 2 commits into base: master

Contributor

@Marie59 Marie59 commented Jan 14, 2025

First draft of a findability page for Galaxy (like the interoperability page here: https://galaxyproject.org/fair/interoperability/).

I kept pretty much the same beginning as for interoperability to present Galaxy.
There are surely many things to review, and probably things to add on findability in Galaxy that I did not think of.
Still to be done:

  • add some logos
  • pictures
  • some links (for RO-Crate, WorkflowHub, and all)
  • What else?

PS: If we get bored during winter school @bgruening we can start accessibility and reusability ;)


### Persistent Identifiers (PIDs) for Tools and Workflows

Galaxy supports the assignment of Persistent Identifiers (PIDs) to tools, workflows, and datasets. PIDs ensure that research outputs remain discoverable and accessible over time, even as technologies and platforms evolve. This practice eliminates the risk of losing resources due to broken links or outdated references, making Galaxy a reliable environment for finding scientific artifacts.
Member

We can mention here tools-id and the toolshed.

Contributor Author

Isn't the tools-id the PID of a tool?
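
To make the tools-id question concrete, here is a minimal sketch of how a Toolshed-installed tool ID can be split into its parts. It assumes the commonly seen layout `<toolshed host>/repos/<owner>/<repository>/<tool>/<version>`; the class and function names are hypothetical, and the example ID is only an illustration. Whether such an ID counts as a PID in the FAIR sense is the open question of this thread, so the sketch only shows that the ID is structured and versioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolShedToolId:
    """Parts of a Toolshed-style tool ID (hypothetical helper type)."""
    host: str
    owner: str
    repository: str
    tool: str
    version: str


def parse_tool_id(tool_id: str) -> ToolShedToolId:
    """Split an ID of the assumed form host/repos/owner/repository/tool/version."""
    parts = tool_id.split("/")
    # Expect exactly six segments with the literal "repos" in second position.
    if len(parts) != 6 or parts[1] != "repos":
        raise ValueError(f"not a Toolshed-style tool ID: {tool_id!r}")
    host, _, owner, repository, tool, version = parts
    return ToolShedToolId(host, owner, repository, tool, version)


# Illustrative ID; the version string is an assumption.
example = parse_tool_id(
    "toolshed.g2.bx.psu.edu/repos/devteam/fastqc/fastqc/0.73+galaxy0"
)
print(example.owner, example.tool, example.version)
```

Built-in tools that do not come from a Toolshed have short IDs without this layout, so the parser rejects them with a `ValueError` rather than guessing.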


### Searchable Repositories

Galaxy provides access to searchable repositories such as the Galaxy Toolshed, a centralized platform that indexes and hosts tools and workflows shared by the global Galaxy community. Researchers can easily browse, search, and discover tools and workflows tailored to their needs, fostering collaboration and knowledge sharing. The Galaxy Toolshed functions as an "app store" for tools, ensuring that resources are well-organized and accessible to users across disciplines.
Member

  • The Toolshed is not for workflows.
  • I would write here about file sources, with Zenodo as a prominent example.

@Marie59
Contributor Author

Marie59 commented Jan 15, 2025

ping @dadrasarmin

@dadrasarmin
Contributor

I have a comment on spelling, which is not important but caught my eye. We mix British and American spelling in the document. Things like:

  • emphasises, visualisation, organisation, ...
  • organized, standardized, centralized, ...

As a reader, I prefer it to be one way (American), but mainly I think consistency is nice.

@Marie59
Contributor Author

Marie59 commented Jan 15, 2025

> I have a comment on spelling, which is not important but caught my eye. We mix British and American spelling in the document. Things like:
>
> * emphasises, visualisation, organisation, ...
> * organized, standardized, centralized, ...
>
> As a reader, I prefer it to be one way (American), but mainly I think consistency is nice.

I can never remember which is which and what the difference is between British and American ;)

@bgruening bgruening marked this pull request as draft January 15, 2025 15:39
@dadrasarmin
Contributor

I do not know how to copy Markdown into a Google Doc while keeping the table. :)
I think this page is designed for a public audience, and I consider myself part of it, since I do not have any education on the topic. I found it difficult to follow the flow of the text in the Google Doc and could not connect the dots. I made the following for myself; maybe it is useful as a basis for a draft of each page.

| Acronym | What it means | Which Galaxy/EOSC/ELIXIR component answers this? |
| --- | --- | --- |
| F1 | (meta)data are assigned a globally unique and persistent identifier | 1. Workflows from WorkflowHub or Dockstore have a unique ID (is it a PID?)<br>2. Tools have a unique ID on useGalaxy.<br>3. Data Repository Service (DRS) is supported.<br>4. Internally, a Galaxy instance has unique identifiers for everything. |
| F2 | data are described with rich metadata (defined by R1 below) | 1. Histories, data, visualizations, and workflows are richly described by predefined metadata. |
| F3 | metadata clearly and explicitly include the identifier of the data it describes | 1. Tools, datasets, and jobs have clear and explicit metadata in useGalaxy. |
| F4 | (meta)data are registered or indexed in a searchable resource | 1. Registered workflows are in WorkflowHub, Dockstore, and IWC.<br>2. Tools, datasets, training material, and workflows can be searched using useGalaxy, GTN, and WorkflowHub.<br>3. EDAM annotations are used for grouping and finding (Armin has no clue about this item). |
| A1 | (meta)data are retrievable by their identifier using a standardized communications protocol | 1. useGalaxy provides many tools to access and retrieve data from specific repositories via API, e.g. Copernicus, ENA, NCBI, UniProt, Ensembl, BioMart, AquaINFRA.<br>2. useGalaxy provides a tool to deposit sequencing data and metadata in the European Nucleotide Archive (ENA).<br>3. Galaxy file-sources (global and user-based) can access all standard data repositories, including Zenodo, Invenio, Dataverse (soon), Nextcloud, Opencloud, S3, FTP, WebDAV, iRODS, OneData.<br>4. useGalaxy provides the ability to export data, results, histories, and workflows to any writable file-source, e.g. FTP, S, RO-Crate or BioComputeObjects.<br>5. Galaxy has an API based on the open standard OpenAPI.<br>6. Galaxy supports Conda, Docker, Singularity, and other systems to resolve dependencies, which provides a searchable resource for setting up the computing environment, as well as tight integration with BioContainers. |
| A1.1 | the protocol is open, free, and universally implementable | 1. useGalaxy, GTN, and galaxycommunity are all open source by design.<br>2. Training material provided by GTN is accessible, free, and independent of useGalaxy or WorkflowHub.<br>3. Galaxy can offer thousands of tools, which can be accessed not only via various graphical interfaces but also via an API.<br>4. Galaxy provides infrastructure for Interactive Tools (RStudio, Jupyter …) |
| A1.2 | the protocol allows for an authentication and authorization procedure, where necessary | 1. Everyone can create an account on public Galaxy servers; this includes common identity providers (AAI), but simple username-password accounts are also possible.<br>2. EGI Check-in: security is enhanced through federated identity management, such as EGI Check-in, which facilitates secure access for users worldwide.<br>3. Bring Your Own Storage (BYOS) and Bring Your Own Compute (BYOC) provide the infrastructure for authentication with public and private storage and computation resources. Galaxy object stores can be used to provide more storage to users, either funded by the deployer of the instance (e.g. de.NBI-cloud for EU) or by user-owned object stores that can be integrated into Galaxy (e.g. AWS, GCS, S3, iRODS …) |
| A2 | metadata are accessible, even when the data are no longer available | 1. The useGalaxy history keeps the metadata for runs even after the runs are deleted. |
| I1 | (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation | 1. WorkflowHub offers standardized access to workflows.<br>2. |
| I2 | (meta)data use vocabularies that follow FAIR principles | 1. RO-Crate standardizes the capture of data and metadata using a JSON format. |
| I3 | (meta)data include qualified references to other (meta)data | 1. useGalaxy metadata contains references to datasets, databases, tools, histories, and workflows. |
| R1 | (meta)data are richly described with a plurality of accurate and relevant attributes | 1. |
| R1.1 | (meta)data are released with a clear and accessible data usage license | 1. It is possible to share histories, data, visualizations, and workflows with a specific user, groups, or globally. |
| R1.2 | (meta)data are associated with detailed provenance | 1. Every setting for all tools, including tool versions and containers, is captured and can be used to rerun a tool or a workflow.<br>2. The metadata can be exported in standards like RO-Crate or BioComputeObjects. |
| R1.3 | (meta)data meet domain-relevant community standards | |
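
The F1 and A1 rows mention DRS support and retrieval by identifier over a standard protocol. As a rough illustration of what that means in practice, the sketch below turns a hostname-based `drs://` URI into the HTTPS URL that the GA4GH DRS specification defines for fetching the object's metadata. The function name, the host, and the object ID are hypothetical; how a given Galaxy server forms its DRS object IDs is not stated in this thread, and compact-identifier-style DRS URIs are deliberately not handled.

```python
from urllib.parse import quote, urlsplit


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI (drs://host/object_id) to its HTTPS URL.

    Follows the GA4GH DRS convention that such URIs resolve to
    https://<host>/ga4gh/drs/v1/objects/<object_id>.
    """
    parts = urlsplit(drs_uri)
    if parts.scheme != "drs" or not parts.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    object_id = parts.path.lstrip("/")
    if not object_id:
        raise ValueError(f"DRS URI has no object ID: {drs_uri!r}")
    # Percent-encode the ID so it stays a single path segment.
    return f"https://{parts.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


# Hypothetical host and object ID, for illustration only.
print(drs_uri_to_url("drs://drs.example.org/abc123"))
```

The point for the table is that the identifier alone is enough to build the retrieval URL, which is the property A1 asks for.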
