SCOFF (Semantic Clustering Of Functional Fragments) automatically annotates workflows with semantic terms and returns subworkflows, i.e. clusters of semantically similar bioinformatics workflow fragments, to support workflow repair and construction.
This site describes the structure of our output results, provides the source code of the SCOFF system, and briefly explains how to use it.
This section describes the different types of output files from our analysis of myExperiment workflows using SCOFF.
The output resources are available for download and are organised in the following folder structure:
- abstract_workflows/: Partially abstracted workflows (i.e. after removing non-biologically-meaningful steps) in Taverna formats. They are the input of the automatic annotation step.
- OPMW/: Semantic annotations of all the services within every bioinformatics-oriented myExperiment workflow (in the standard OPMW format). These RDF output files include an instance of the opmw:WorkflowTemplateProcess model for each available non-shim service of each annotated bioinformatics workflow, where each annotation is represented as an <rdf:type rdf:resource=URI/> property. Through these RDF files, our annotations are available and can easily be integrated into other systems that require structured annotations of bioinformatics services.
- SCOFF/: Multiple sets of workflow fragments grouped by the semantic similarity of their annotations, based on 13 different ontologies and structured vocabularies (BAO, BRO, EDAM, EFO, IAO, MeSH, MS, NCIT, NIFSTD, OBI, OBIWS, SIO and SWO) from BioPortal and 4 clustering criteria (KbestSilh_AGNES, KbestSilh_PAM, KruleThumb_AGNES and KruleThumb_PAM). These subdirectories include searchable text summary files (annotations_statistics_<ontID>.txt and cluster_statistics.txt) and visual representations of the clustered workflow fragments in .svg format (wf_myExperiment_<ontID>_<wfID>.<fragmentID>.svg), allowing a simple search for fragments related to the desired term(s), either to create a new workflow or to repair a broken one.
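Because the summary files are plain text, looking up fragments for a term can be as simple as a recursive text search. The sketch below is only an illustration: the function name `find_fragment_summaries` is hypothetical, it assumes the SCOFF/ folder has been downloaded locally, and it treats the files as free text because their internal layout is not described here.

```shell
# Hypothetical helper (not part of SCOFF): list the summary files under a
# local copy of the SCOFF/ folder that mention a given term.
# Usage: find_fragment_summaries <scoff_dir> <term>
find_fragment_summaries() {
    scoff_dir=$1
    term=$2
    # Only look at the two kinds of summary files named above.
    # grep: -i ignores case, -l prints only the names of matching files,
    # -F treats the term as a literal string rather than a regular expression.
    find "$scoff_dir" -type f \
        \( -name 'annotations_statistics_*.txt' -o -name 'cluster_statistics.txt' \) \
        -exec grep -ilF -e "$term" {} + | sort
}

# Example call; the path and the search term are placeholders.
# find_fragment_summaries ./SCOFF "sequence alignment"
```

The <ontID> part of each returned file name shows which ontology produced the matching annotations, which narrows down the .svg files worth inspecting.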
This section gives a brief description of the different SCOFF scripts and instructions on how, and in which order, to run them. More details are available in the code comments of each script.
# 1.1.- Download workflows related to bioinformatics, assisted by the Peregrine text-mining tool
perl "<output directory (where saving workflow definition files)>" "[<additional terms filtering bioinformatics workflows>]"
# e.g. perl "../Data/WF_myExperiment" additionalBioinfoTerms.txt
# 1.2.- Remove 'shim' services and annotate the workflows
perl "<in/output directory (with workflows)>"
# e.g. perl "../Data/WF_myExperiment"
# 1.3.- Remove redundant annotations
perl "<dirInAnnotations>"
# e.g. perl "$HOME/Data/WF_myExperiment"
# 1.4.- Generate ttl and xml OPMW files
perl "<dirInWithoutShimsWf>" "<dirInOutAnnotations>" "[<dirInRedundantAnnotations>]" "[<pathTemplate_pairsURIannot-ICvalueFiles>]" "[[<TestWfID>]]"
# e.g. perl "../Data/WF_myExperiment" "../Data/WF_myExperiment/NotRedundantAnnot" > "../Results/count_nodesAndLinks_perWf.txt"
# 1.5.- Compute IC values
sh "<dirInAnnotations>"
# e.g. sh "../Data/WF_myExperiment/NotRedundantAnnot"
perl "<dirInAnnotations>" "<template ICvalue per individual annotation>" "<Output-IntermediateFile>" "<Output-Wf and Service IC withIN redundant>"
# e.g. perl "../Data/WF_myExperiment/NotRedundantAnnot/" "SML_toolkit/ICvalues/results/XXX_results_ICI.csv" "wf_service_annotation_IC.txt" "wfAndServiceIC.txt"
# 2.1.- Fragmentation and computing subgraphs distance matrix
# It needs the SML toolkit, with the configuration files provided in the 'SML_toolkit/' folder
sh "<dirInAnnot>" "<dirInLinks>" "<minSizeSubgraph>" "<maxSizeSubgraph>"
# e.g. sh "../Data/WF_myExperiment/NewAnnot" "../Data/WF_myExperiment/NotRedundantAnnot" 2 3
# 2.2.- Clustering subgraphs
# It calls the clustering script 52 times (13 ontologies x 2 clustering algorithms x 2 K selection methods)
sh "<dirInAnnot>" "<dirInLinks>" "<minSizeSubgraph>" "<maxSizeSubgraph>"
# e.g. sh "../Data/WF_myExperiment/NewAnnot" "../Data/WF_myExperiment/NotRedundantAnnot" 2 3
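The 52 runs correspond to pairing each of the 13 ontologies with each of the 4 clustering criteria listed in the output description (2 K selection methods combined with 2 clustering algorithms). The loop below only enumerates those combinations to make the count explicit; the variable names and the printed labels are illustrative assumptions, and it does not call any SCOFF script.

```shell
# Ontology acronyms, K selection methods and clustering algorithms, taken
# from the description of the SCOFF/ output folder.
ontologies="BAO BRO EDAM EFO IAO MeSH MS NCIT NIFSTD OBI OBIWS SIO SWO"
k_methods="KbestSilh KruleThumb"
algorithms="AGNES PAM"

count=0
for ont in $ontologies; do
    for k in $k_methods; do
        for alg in $algorithms; do
            # Illustrative label only: one line per clustering run.
            echo "${ont} ${k}_${alg}"
            count=$((count + 1))
        done
    done
done

# 13 ontologies x 2 K selection methods x 2 algorithms
echo "total runs: $count"
```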
The input to the automatic annotation step consists of workflows in Taverna 1 (.xml) or Taverna 2 (.t2flow) format, in principle from myExperiment [resource number 1 from the previous section]. The output of this first step consists of workflows with semantic annotations in OPMW format, which serve as the input to the second step: fragmentation and clustering based on semantic similarity [resource number 2]. Finally, the output of the second step comprises the different subworkflows, or clustered workflow fragments [resource number 3].
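As a small illustration of consuming the output of the first step, the sketch below lists the distinct annotation URIs in one OPMW file. It is an assumption-laden shortcut rather than part of SCOFF: the function name `list_annotation_uris` is hypothetical, it expects the RDF/XML serialisation with annotations written as <rdf:type rdf:resource="URI"/> using double-quoted attribute values, and it relies on plain text matching instead of a real RDF parser.

```shell
# Hypothetical helper (not part of SCOFF): print the distinct annotation URIs
# found in one OPMW RDF/XML file.
# Usage: list_annotation_uris <rdf_xml_file>
list_annotation_uris() {
    rdf_file=$1
    # Keep only the rdf:type attributes, strip everything around the URI,
    # then remove duplicates.
    grep -o '<rdf:type rdf:resource="[^"]*"' "$rdf_file" \
        | sed 's/^.*rdf:resource="//; s/"$//' \
        | sort -u
}
```

For anything beyond a quick inspection, a proper RDF library is the safer choice, since text matching breaks as soon as the serialisation differs from the assumed form.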
