Skip to content

Commit

Permalink
Merge pull request #2 from EBI-Metabolights/MTBLS-175-0.2-documentation
Browse files Browse the repository at this point in the history
Mtbls 175 0.2 documentation
  • Loading branch information
calstevemart authored Sep 29, 2022
2 parents efd7452 + fad4dd6 commit a1f6d93
Show file tree
Hide file tree
Showing 24 changed files with 192 additions and 92 deletions.
7 changes: 5 additions & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,11 @@
Package: ImperialNMRTool
Type: Package
Title: Tool for Nuclear Magnetic Resonance based spectrography.
Version: 0.0.1.0
Description: Workflow for automatic processing and annotation of 1D 1H-NMR spectra from MetaboLights repository.
Entrypoint to the workflow is hurricane.r, and it takes an argument which is the location
of a parameters file, which itself contains the location of the files the workflow needs,
and the workflow parameters. To read more visit the github repository at github.com/EBI-Metabolights/icl_nmr_R
Version: 0.2.0.0
Date: 2022-09-12
Authors@R: c(
person("Michael T. Judge", "PostDoc / Developer", role = c("aut")),
Expand All @@ -11,5 +15,4 @@ Maintainer: Callum Martin <cmartin@ebi.ac.uk>
Depends: R (>= 4.0.0)
Imports: optparse, BiocManager, stringr, MassSpecWavelet, yaml, pbapply, devtools, batch, flexclust, rlang, pracma, RcppHungarian, grid, lattice, modeltools, stats4
License: GPL-2
Description: Workflows for automatic processing and annotation of 1D 1H-NMR spectra from MetaboLights repository.
RoxygenNote: 7.2.1
2 changes: 1 addition & 1 deletion R/annotation_matching.R
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#' Annotation Module
#' Annotation Matching Module
#'
#' Given picked peaks from statistical decomposition module, do a simple
#' comparison between each query chemical shift list (STOCSY-derived)
Expand Down
19 changes: 13 additions & 6 deletions R/detectSpecPeaks.R
Original file line number Diff line number Diff line change
@@ -1,14 +1,21 @@
#' Peak detection for spectra
#'
#' Divide the whole spectra into smaller segments and detect peaks by using MassSpecWavelet package.
#' Note that, the peak lists could be found by using other methods, this function is just a choice.
#' Divide the whole spectra into smaller segments and detect
#' peaks by using MassSpecWavelet package.
#' Note that, the peak lists could be found by using other
#' methods, this function is just a choice.
#'
#'
#' @param X The spectral dataset in matrix format in which each row contains a single sample
#' @param X The spectral dataset in matrix format in which each row contains
#' a single sample
#' @param nDivRange The size of a single small segment after division of spectra
#' @param scales The parameter of peakDetectionCWT function of MassSpecWavelet package, look it up in the original function.
#' @param baselineThresh It will remove all peaks under an intensity set by baselineThresh.
#' @param SNR.Th The parameter of peakDetectionCWT function of MassSpecWavelet package, look it up in the original function. If you set -1, the function will itself re-compute this value.
#' @param scales The parameter of peakDetectionCWT function of MassSpecWavelet
#' package, look it up in the original function.
#' @param baselineThresh It will remove all peaks under an intensity set by
#' baselineThresh.
#' @param SNR.Th The parameter of peakDetectionCWT function of
#' MassSpecWavelet package, look it up in the original function.
#' If you set -1, the function will itself re-compute this value.
#' @param verbose A boolean value to allow print out process information.
#'
#' @return The peak lists of the spectra
Expand Down
17 changes: 9 additions & 8 deletions R/exportMatches.R
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#' exportMatches
#' Export Matches to filesystem.
#'
#' Export matches to file for visualization and storage
#' Ideally, this function will allow:
Expand All @@ -8,21 +8,23 @@
#' (including those with null matches):
#' - scores_matrix
#' - data frame containing 1 row for each non-zero scored match, and columns
#' with match info/metadata such as hmdb_id, ref spectrum id, (Jaccard) scores,
#' number of matches, number of possible matches in ref, compound name, instrument
#' properties, experiment properties, etc.
#' with match info/metadata such as hmdb_id, ref spectrum id, (Jaccard)
#' scores,
#' number of matches, number of possible matches in ref, compound name,
#' instrument, properties, experiment properties, etc.
#' - filteredResults
#' - list of lists, each list contains additional info for each row in
#' scores_matrix which could not fit into a data frame. Some info is
#' duplicated in the scores_matrix, but the important data here is:
#' - peaklists: reference and target peaklists (matched lists)
#' - matchlist: detailed match info for the matched peaks only (both ref and target):
#' - matchlist: detailed match info for the matched peaks only
#' (both ref and target):
#' - "ppms" of the peaks
#' - relative "intensities" of the reference peaks (if known)
#' - "inds" in the starting ref or target list, respectively
#' Extract data and put in tall matrix (all matched peaks).
#' Define Format: In order to plot the matches as two lists of scatter points and
#' lines connecting the matched target and ref points, we need:
#' Define Format: In order to plot the matches as two lists of scatter
#' points and lines connecting the matched target and ref points, we need:
#' - all ref peak positions and intensities
#' - all target peaks and intensities
#' - the connection from each matched ref to its target
Expand All @@ -45,7 +47,6 @@ exportMatches <- function(matches,
matchPairs <- NULL
referenceList <- NULL
targetList <- NULL
# length(matches)

# Figure out which matches are empty and exclude them
hasData <- unlist(lapply(
Expand Down
16 changes: 9 additions & 7 deletions R/hurricane.R
Original file line number Diff line number Diff line change
Expand Up @@ -7,15 +7,19 @@
#' 4] visualisation [in progress in upstream repository]
#' Also saves to RDS outputs from statistical decomposition
#' A default params.yaml file will be used if none is supplied.
#' The params file looks like this:
#'
#' The outline of the params file can be found in the repo readme.md
#'
#' @param params_loc location of the params.yaml file containing
#' pipeline run parameters.
#' @param params_obj A populated yaml object containing the parameters.
#' This parameter is intended for use when being run from galaxy.
#' @return N/A but .RDS and .TSV files are saved, see description.
#' @export
hurricane <- function(params_loc) {
hurricane <- function(params_loc, params_obj) {
default <- FALSE
if (missing(params_loc)) {
if (isFALSE(missing(params_obj))) {
run_params <- params_obj
} else if (missing(params_loc)) {
# load default params
filepath <- system.file(
"extdata", "default_params.yaml", package="ImperialNMRTool")
Expand All @@ -25,9 +29,7 @@ hurricane <- function(params_loc) {
# load supplied params
run_params <- yaml::yaml.load_file(params_loc)
}
# it may be that a user supplies their own params file but it
# is not complete. What do we do then? Fill in any missing fields
# with the corresponding values from default params?
# currently loading default files does not work.
if (isTRUE(default)) {
peaks_path <- system.file(
"extdata", "peaks.RDS", package="ImperialNMRTool")
Expand Down
3 changes: 2 additions & 1 deletion R/ind2subR.R
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@
#' https://stackoverflow.com/questions/4452039/converting-between-matrix-subscripts-and-linear-indices-like-ind2sub-sub2ind-in
#'
#' MTJ 28FEB2022
#' @param inds The linear indices to be converted IE linear index (1 = [1,1], 2 = [2,1], 3 = [3,1], 4 = [1,2], 5 = [2,2])
#' @param inds The linear indices to be converted IE linear index
#' (1 = [1,1], 2 = [2,1], 3 = [3,1], 4 = [1,2], 5 = [2,2])
#' @param m The number of rows IE m = nrow(matrix)
#' @return List of row and column indices.
#' @export
Expand Down
24 changes: 18 additions & 6 deletions R/matchCluster2.R
Original file line number Diff line number Diff line change
@@ -1,16 +1,28 @@

#' matchCluster2
#'
#' description
#' Match Cluster 2
#'
#' provide an alternative matching method for target and ref clusters.
#' Calculate all pairwise distances between all target cluster peaks
#' and reference cluster peaks and filter for tolerance.
#' Match each peak with its closest ref peak:
#' A) start with lowest overall distance, assign target[i] to ref[r]
#' peak, record ref[r] as unavailable (remove its matches in the
#' distance matrix)
#' find next lowest distance match, and repeat until no more matches
#' B) find the best set of matches which satisfy the following:
#' ppm ranks are preserved (no criss-crossing of match lines)
#' ref peaks can be assigned to at most one target peak
#' best match set determined by sum of distances
#'
#' @param t target peaks.
#' @param r reference peaks.
#' @param tol tolerance in ppm.
#' @param method matching methodology as a string, defaults to 'itmin'.
#' Can also be substructure based 'substruct'. optimal 'opt' not implemented.
#' @return List of matches.
#' @return List of match objects.
#' @export
matchCluster2 <- function(t, r, tol = 0.02, method = "itmin") {
# Initialize reporting vectors
# Initialize reporting vectors

matched_targets <- NA
length(matched_targets) <- length(t)
Expand Down Expand Up @@ -164,4 +176,4 @@ matchCluster2 <- function(t, r, tol = 0.02, method = "itmin") {
distances = na.omit(distances),
method = method
))
}
}
4 changes: 2 additions & 2 deletions R/matchPeaksToRef.R
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#' matchPeaksToRef
#' Match Peaks to Reference Spectra
#'
#' Function to match target and reference spectral peaks by chemical shift
#' Goncalo Graca, 16 February 2021, g.gomes-da-graca-at-imperial.ac.uk
Expand All @@ -15,7 +15,7 @@
#' @param Itol tbc
#' @param matchMethod matching methodology as a string, defaults to 'basic'
#' @param intensity flag to indicate whether to factor in intensity to matching.
#' @return list of resulting matches.
#' @return list of resulting matches and metadata objects.
#' @export
matchPeaksToRef <- function(
target, driver_ppm, reference, tol = 0.02, Itol = 20,
Expand Down
20 changes: 2 additions & 18 deletions R/matchPeaksToRef2.R
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#' matchPeaksToRef2
#' Match Peaks to reference Spectra 2.
#'
#' Function to match target and reference spectral peaks by chemical shift
#' Goncalo Graca, 16 February 2021, g.gomes-da-graca-at-imperial.ac.uk
Expand All @@ -20,8 +20,6 @@
matchPeaksToRef2 <- function(target, driver_ppm, reference, tol = 0.02,
Itol = 20, matchMethod = "basic", intensity = FALSE) {

# Clean up params

# To keep same argument form as before, if intensity arg is used,
# switch matchMethod
if (intensity == TRUE) {
Expand All @@ -31,9 +29,7 @@ matchPeaksToRef2 <- function(target, driver_ppm, reference, tol = 0.02,
# extract relevant metadata from reference
hmdb_name <- reference$metadata$hmdb_name
hmdb_id <- reference$metadata$hmdb_id
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
# print(hmdb_name)
# <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

spectrum_id <- reference$metadata$spectrum_id
reference <- reference$spectrum_peaks # NOTE: this overwrites reference

Expand All @@ -56,8 +52,6 @@ matchPeaksToRef2 <- function(target, driver_ppm, reference, tol = 0.02,
r <- reference[, 1]
}

#########################################################

# initialize matches and matched_targets reference and target peaks vectors
matches <- NULL
matches <- 0
Expand All @@ -71,7 +65,6 @@ matchPeaksToRef2 <- function(target, driver_ppm, reference, tol = 0.02,

matchList <- list()

#########################################################
## There are several matching methods that can be used:

# Match using chemical shift and intensity
Expand Down Expand Up @@ -103,7 +96,6 @@ matchPeaksToRef2 <- function(target, driver_ppm, reference, tol = 0.02,
}
}

#########################################################
# Match using chemical shift only (2021 algorithm)
if (matchMethod == "basic") {
# browser()
Expand Down Expand Up @@ -162,11 +154,6 @@ matchPeaksToRef2 <- function(target, driver_ppm, reference, tol = 0.02,
method = matchMethod
)

# debug when match is obtained
#>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
# if (!is.null(matchList)){browser()}
#<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

matched_targets <- matchList$matchedTargets
target_inds <- matchList$matchedTargetInds
matched_refs <- matchList$matchedRefs
Expand All @@ -180,13 +167,10 @@ matchPeaksToRef2 <- function(target, driver_ppm, reference, tol = 0.02,
}
}
}

########################################################
# Gather up scores

score <- matches / (reference_peaks + target_peaks - matches)

########################################################
# Produce result struct(s)

# Troubleshooting
Expand Down
25 changes: 20 additions & 5 deletions R/stat_decomp.R
Original file line number Diff line number Diff line change
@@ -1,11 +1,26 @@
#' Statistical Decomposition for spectra.
#'
#' Description
#' @param peaks rds peaks file loaded into memory.
#' @param spec rds spec file loaded into memory.
#' @param params yaml object containg parameters for the pipeline.
#' Given a spectral matrix and a list of picked peaks, loop
#' through each picked peak and perform a STOCSY using it as
#' a driver peak. Threshold the result at r >= 0.8, giving
#' pseudospectra (target list). Pick peak these using standard settings,
#' and store the peak clusters in ppeaks.
#'
#' @return list of stuff
#' @param peaks picked peaks for representative spectrum, numeric vector
#' containing ppm indices of peaks
#' @param spec spectral matrix consisting of n columns (one for each ppm
#' value). row 1 (named 'ppm') is the vector of ppm values. rows 2:m+1
#' (names undefined) are the m sample measurements for each ppm value.
#' @param params yaml object containg parameters for the pipeline.
#' @return list of:
#' target.RDS: list with one element for each cluster, containing:
#' [1] ppm vector
#' [2] thresholded intensities for each ppm value
#' ppeaks.RDS: list with picked peaks from each cluster.
#' One element per cluster, containing:
#' numeric vector
#' col 1: (named "chemical-shift") ppm values
#' col 2: (named "intensity") intensities (covariance from STOCSY)
#' @export
stat_decomp <- function(peaks, spec, params) {

Expand Down
5 changes: 3 additions & 2 deletions R/stocsydec.R
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
#' stocsydec
#' Stocsy Decomposition
#'
#' Function to run 'STOCSY decomposition' Goncalo Graca, 15 September 2020, g.gomes-da-graca/at/imperial.ac.uk
#' Function to run 'STOCSY decomposition' Goncalo Graca, 15 September
#' 2020, g.gomes-da-graca/at/imperial.ac.uk
#' @param spectra individual spectra file to run through decomp
#' @param idx index number of spextra files to be run through decomp
#' @param cthr cutoff threshold
Expand Down
6 changes: 2 additions & 4 deletions R/sub2indR.R
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,5 @@
#' @return linear indices
#' @export
sub2indR <- function(rows, cols, m) {

return((cols-1) * m + rows)

}
return((cols - 1) * m + rows)
}
37 changes: 36 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

A package 'wrapper' for the tools and functions written by michael judge and goncalo graca, as part of efforts to make the overall NMR pipeline functional in galaxy.


## Installation
You can install the package using the R CLI straight from this repository.

Expand All @@ -13,4 +14,38 @@ You can install the package using the R CLI straight from this repository.

4 - run `library(ImperialNMRTool)` to load the package into the session.

To quickly verify if the package is installed you can run `?indsubr` . This will bring up the manual page for this function, and if loaded means that the package itself was installed and loaded successfully. The manual pages for each function contain the information you will need to run them. More detailed instruction may be added to either this repository or the manual pages at a later date.
To quickly verify if the package is installed you can run `??hurricane` . This will bring up the manual page for this function, and if loaded means that the package itself was installed and loaded successfully. The manual pages for each function contain the information you will need to run them.

## Running the workflow

If you are runnning this in a normal (non cluster / workflow manager) environment, it is easiest to do so in RStudio. If you have followed the installation steps above, the package is primed to be run. The entrypoint to the package, `ImperialNMRTool::hurricane()` takes one parameter, a params.yaml file. This file needs to be replete with the parameters you want to run the workflow with. This package has some example data that you can make use of, found in `/inst/extdata`.

### Parameters file
The parameters .yaml file for the workflow takes the following structure:


```
general_pars:
peaks_location: peaks.RDS # location of peaks.RDS file.
spec_location: spec.RDS # location of spec.RDS file.
output_dir: ~/ # tells the workflow where to write the output files.
sd_pars:
cutoff : 0.8 # standard cutoff for STOCSY
am_pars:
rank_limit: 5 # rank limit for exporting matches IE take the top 5 matches discard the rest
dist_thresh: 0.02 # no sense in allowing < precision of db peaks (0.01 ppm for hmdb)
matchMethod: hungarian_scaled # or basic, itmin, hungarian, hungarian_scaled
refdb_file: hmdb_spectra_28FEB2022.RDS # location of reference spectra .RDS file.
vis_pars:
matlab_root: /Applications/MATLAB_R2021b.app/bin # deprecated, will be removed in future release
```
The four parameters that need to be configured to your local setup in order for the workflow to run are peaks_location, spec_location, output_dor and refdb_file.

Once this file is prepared, you can run the workflow from RStudio / the command line via `ImperialNMRTool::hurricane("/path/to/your/params.yaml")`


#### Help / contact
This repository is maintained by the [MetaboLights](https://www.ebi.ac.uk/metabolights/) team at the EMBL-EBI. You can send any requests or queries to metabolights-help@ebi.ac.uk , quoting this repository.
2 changes: 1 addition & 1 deletion man/annotation_matching.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit a1f6d93

Please sign in to comment.