Commit

Added volcano plot and DEA vignette.
dereckmezquita committed Jul 13, 2024
1 parent f077553 commit 838514d
Showing 13 changed files with 16,537 additions and 41 deletions.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -1,6 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(MonteCarlo)
export(Volcano)
export(bb)
export(cite_package)
export(ema)
10 changes: 5 additions & 5 deletions R/MonteCarlo.R
@@ -24,15 +24,15 @@ MonteCarlo <- R6::R6Class(
private = list(
validate_data = \() {
if (!inherits(self$data, "data.table")) {
rlang::abort("data must be a data.table")
stop("data must be a data.table")
}
required_cols <- c(
"symbol", "datetime", "open", "high",
"low", "close", "volume", "turnover"
)

if (!all(required_cols %in% colnames(self$data))) {
rlang::abort("data must contain the following columns: ", paste(required_cols, collapse = ", "))
stop("data must contain the following columns: ", paste(required_cols, collapse = ", "))
}
},
prepare = \(log_historical = FALSE) {
@@ -134,7 +134,7 @@ MonteCarlo <- R6::R6Class(
#' @return A ggplot object showing the simulated price paths.
plot_prices = \() {
if (is.null(self$simulation_results) || is.null(self$end_prices)) {
rlang::abort("Must run simulation first")
stop("Must run simulation first")
}

self$simulation_results |>
@@ -154,7 +154,7 @@ MonteCarlo <- R6::R6Class(
#' @return A ggplot object showing the distribution of final prices.
plot_distribution = \() {
if (is.null(self$simulation_results) || is.null(self$end_prices)) {
rlang::abort("Must run simulation first")
stop("Must run simulation first")
}

self$end_prices |>
@@ -180,7 +180,7 @@ MonteCarlo <- R6::R6Class(
#' @return A ggplot object showing historical and simulated prices.
plot_prices_and_predictions = \() {
if (is.null(self$simulation_results) || is.null(self$end_prices)) {
rlang::abort("Must run simulation first")
stop("Must run simulation first")
}

scale_period <- ""
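The MonteCarlo hunks above replace `rlang::abort()` with base `stop()`. Assuming rlang's documented signature `abort(message = NULL, class = NULL, ...)`, the removed column check passed the joined column list as the second positional argument, which rlang treats as the condition `class`, so the list never reached the error message. `stop()` instead pastes all of its `...` arguments into one message. The sketch below reproduces the new check in base R; `check_cols()` is a hypothetical standalone helper, not a `dmplot` function, and the column names are taken from the diff.

```r
# Hypothetical standalone version of the new column check in
# MonteCarlo's private validate_data(); check_cols() is not a dmplot function.
required_cols <- c(
  "symbol", "datetime", "open", "high",
  "low", "close", "volume", "turnover"
)

check_cols <- function(data) {
  if (!all(required_cols %in% colnames(data))) {
    # stop() pastes every `...` argument into a single message.
    stop("data must contain the following columns: ",
         paste(required_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Capture the message from an input that lacks most required columns.
msg <- tryCatch(
  check_cols(data.frame(symbol = "BTC-USDT")),
  error = conditionMessage
)
msg
```

Here `msg` holds the full sentence including every required column name, which is the behaviour the commit's `stop()` calls rely on.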
82 changes: 55 additions & 27 deletions R/Volcano.R
@@ -1,17 +1,27 @@
#' @importFrom R6 R6Class
#' @importFrom ggrepel geom_text_repel
#' @import ggplot2
#' @import data.table
#' @importFrom stringr str_interp
#' @importFrom utils head
#' @importFrom scales label_number
#' Volcano Plot R6 Class
#'
#' @description
#' An R6 class for creating and managing volcano plots for differential expression analysis.
#' This class provides methods for data preparation, plotting, and customisation of volcano plots.
#'
#' @details
#' The Volcano class uses differential expression data to create volcano plots,
#' highlighting significantly up- and down-regulated features based on log2 fold change
#' and a chosen statistical measure (e.g., FDR or p-value).
#'
#' @field data A data.table containing the differential expression data.
#' @field statistic Character. The statistic used for significance (default: "fdr").
#' @field statistic_cutoff Numeric. The cutoff value for the chosen statistic (default: 0.25).
#' @field log2_cutoff Numeric. The log2 fold change cutoff for significance (default: log2(1.5)).
#' @field head_labels Integer. The number of top features to label in the plot (default: 10).
#'
#' @export
Volcano <- R6Class(
Volcano <- R6::R6Class(
"Volcano",
private = list(
validate = \() {
if(!all(c("feature", "log2FC") %in% colnames(self$data))) {
stop(str_interp('The data must contain columns "feature", and "log2FC"; received ${colnames(self$data)}'))
stop(stringr::str_interp('The data must contain columns "feature", and "log2FC"; received ${colnames(self$data)}'))
}

if(sum(c("fdr", "p_value") %in% colnames(self$data)) != 2) {
@@ -25,44 +35,67 @@ Volcano <- R6Class(
statistic_cutoff = 0.25,
log2_cutoff = log2(1.5),
head_labels = 10,

#' @description
#' Create a new Volcano object.
#' @param data A data.table containing differential expression data.
#' @param statistic Character. The statistic used for significance (default: "fdr").
#' @param statistic_cutoff Numeric. The cutoff value for the chosen statistic (default: 0.25).
#' @param log2_cutoff Numeric. The log2 fold change cutoff for significance (default: log2(1.5)).
#' @param head_labels Integer. The number of top features to label in the plot (default: 10).
initialize = \(
data = data.table(),
statistic = "fdr",
statistic_cutoff = 0.25,
log2_cutoff = log2(1.5),
head_labels = 10
) {
self$data <- as.data.table(data)
self$data <- data.table::as.data.table(data)
self$statistic <- statistic
self$statistic_cutoff <- statistic_cutoff
self$log2_cutoff <- log2_cutoff
self$head_labels <- head_labels
private$validate()
self$process_data()
self$process()
},

#' @description
#' Process the input data for plotting.
process = \() {
copy <- data.table::copy(self$data)

copy <- copy[
order(abs(log2FC), -get(self$statistic), decreasing = c(TRUE, FALSE)),
base::order(abs(log2FC), -get(self$statistic), decreasing = c(TRUE, FALSE)),
]

copy[, sig_label := c(utils::head(feature, self$head_labels), rep(NA_character_, nrow(copy) - self$head_labels))]

copy[, highlight := fcase(
copy[, highlight := data.table::fcase(
log2FC < -self$log2_cutoff & get(self$statistic) < self$statistic_cutoff, "blue",
log2FC > self$log2_cutoff & get(self$statistic) < self$statistic_cutoff, "red",
default = "black"
)]

self$data <- copy
},
plot = function(
plot_title,
plot_subtitle,
plot_caption,
plot_y_lab,
plot_x_lab,

#' @description
#' Create the volcano plot.
#' @param plot_title Character. Title of the plot.
#' @param plot_subtitle Character. Subtitle of the plot.
#' @param plot_caption Character. Caption for the plot.
#' @param plot_y_lab Character. Label for y-axis.
#' @param plot_x_lab Character. Label for x-axis.
#' @param legend_position Character. Position of the legend (default: "bottom").
#' @param plot_theme Character. Theme for the plot (default: "dark").
#' @param label Logical. Whether to add labels to points (default: TRUE).
#' @return A ggplot object representing the volcano plot.
plot_volcano = function(
plot_title = "Volcano Plot",
plot_subtitle = "",
plot_caption = "",
plot_y_lab = stringr::str_interp('-log10(${self$statistic})'),
plot_x_lab = 'log2(fold change)',
legend_position = "bottom",
plot_theme = "dark",
label = TRUE
@@ -80,15 +113,11 @@ Volcano <- R6Class(
}

if (!missing(plot_caption) && !is.na(plot_caption) && plot_caption == "") {
plot_caption <- str_interp('${n_significant}/${nrow(self$data)} signficant; ${n_down} down, ${n_up} up\n${self$statistic}: ${self$statistic_cutoff}, log2FC: ${round(self$log2_cutoff, 4)}, linear FC: ${round(2 ^ self$log2_cutoff, 4)}')
plot_caption <- stringr::str_interp('${n_significant}/${nrow(self$data)} signficant; ${n_down} down, ${n_up} up\n${self$statistic}: ${self$statistic_cutoff}, log2FC: ${round(self$log2_cutoff, 4)}, linear FC: ${round(2 ^ self$log2_cutoff, 4)}')
}

if (!missing(plot_y_lab) && !is.na(plot_y_lab) && plot_y_lab == "") {
plot_y_lab <- str_interp('-log10(${self$statistic})')
}

if (!missing(plot_x_lab) && !is.na(plot_x_lab) && plot_x_lab == "") {
plot_x_lab <- 'log2(fold change)'
plot_y_lab <- stringr::str_interp('-log10(${self$statistic})')
}

self$data |>
@@ -100,8 +129,7 @@ Volcano <- R6Class(
breaks = c("red", "black", "blue")
) +
ggplot2::scale_y_continuous(n.breaks = 10, labels = function(x) {
# scales::label_number_si(accuracy = 0.005)(x)
scales::label_number(accuracy = 0.005)(x) # , scale_cut = scales::cut_si("unit")
scales::label_number(accuracy = 0.005)(x)
}) +
{if (any(-log10(self$data[["fdr"]]) > -log10(self$statistic_cutoff))) {
ggplot2::geom_hline(yintercept = -log10(self$statistic_cutoff), linetype = "dashed", colour = "goldenrod", linewidth = 0.75, alpha = 0.5)
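To make the classification in `Volcano$process()` concrete, here is a base-R sketch of the same rule using the class defaults shown in the diff (`statistic = "fdr"`, `statistic_cutoff = 0.25`, `log2_cutoff = log2(1.5)`). It avoids data.table so it runs standalone. The helper `classify_volcano()` and the toy table are hypothetical; the sort uses only |log2FC| and omits the secondary `-get(self$statistic)` key of the original `order()` call, so tie handling may differ from the package. It also computes the values that feed the default caption and the dashed threshold line.

```r
# Hypothetical toy input; column names follow the convention Volcano validates.
de <- data.frame(
  feature = c("g1", "g2", "g3", "g4", "g5"),
  log2FC  = c(2.0, -1.5, 0.2, 0.9, -0.1),
  fdr     = c(0.01, 0.05, 0.001, 0.30, 0.50),
  stringsAsFactors = FALSE
)

# Hypothetical base-R analogue of Volcano$process(); not the package's code.
classify_volcano <- function(de, statistic = "fdr", statistic_cutoff = 0.25,
                             log2_cutoff = log2(1.5), head_labels = 10) {
  # Rank by absolute fold change, largest first.
  de <- de[order(abs(de$log2FC), decreasing = TRUE), ]
  sig <- de[[statistic]] < statistic_cutoff

  # Label only the first `head_labels` features; the rest get NA.
  de$sig_label <- ifelse(seq_len(nrow(de)) <= head_labels,
                         de$feature, NA_character_)

  # Same rule as the fcase() call: blue = down, red = up, black = neither.
  de$highlight <- ifelse(de$log2FC < -log2_cutoff & sig, "blue",
                  ifelse(de$log2FC > log2_cutoff & sig, "red", "black"))
  de
}

res <- classify_volcano(de, head_labels = 2)
res[, c("feature", "log2FC", "fdr", "sig_label", "highlight")]

# Quantities summarised by the default caption and the threshold line.
n_up      <- sum(res$highlight == "red")
n_down    <- sum(res$highlight == "blue")
y_line    <- -log10(0.25)             # y position of the dashed line
linear_fc <- round(2^log2(1.5), 4)    # linear fold-change cutoff
```

Note that `g4` has a large fold change but an FDR above 0.25, and `g3` has a tiny FDR but a fold change below the cutoff; both stay black because the rule requires both conditions.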
31 changes: 27 additions & 4 deletions README.Rmd
@@ -30,9 +30,9 @@ options(
[![Travis build status](https://travis-ci.org/dereckmezquita/kucoin.svg?branch=master)](https://travis-ci.org/dereckmezquita/kucoin)
<!-- badges: end -->

`R` framework written in high-performance `C++` and `ggplot2` for financial and time series data analysis.
`R` framework written in high-performance `C++` and `ggplot2` for financial, bioinformatics, and time series data analysis.

The package provides algorithms, functions, `ggplot2` layers and most importantly a framework for working with and analysing financial and time series data.
The package provides algorithms, functions, `ggplot2` layers and most importantly a framework for working with and analysing financial, bioinformatics, and time series data.

## Installation

@@ -52,7 +52,7 @@ box::use(ggplot2)
box::use(dmplot)
```

## Getting started
## Getting started with financial data

### Get financial data

@@ -238,7 +238,30 @@ monte$plot_distribution()
monte$plot_prices_and_predictions()
```

### Benchmarking `dmplot`'s high-performance C++ technical indicators
## Getting started with bioinformatics data

`dmplot` offers a host of functions for working with bioinformatics data. Here we demonstrate how to use the `dmplot::Volcano()` `R6` class to plot a volcano plot.

`dmplot` imposes a convention and standard for the data it expects, in exchange it offers ease of use and efficiency in plotting and analysing data.

### Volcano plot

A volcano plot can be generated in 3 easy steps.

```{r volcano-plot}
# 1. load the data
data <- dt$fread("./data/volcano-differential-expression.csv")
head(data)
# 2. create the Volcano object
volc <- dmplot$Volcano$new(data)
# 3. plot the volcano plot
volc$plot_volcano()
```

## Benchmarking `dmplot`'s high-performance C++ technical indicators

Here we do a simple demonstration and benchmark of `dmplot`'s Bolinger Bands implementation vs the `TTR` package. Note that despite using a version not wrapped to return a `list` the `TTR` implementation is still significantly slower than `dmplot`'s C++ implementation.

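The README text says `dmplot` imposes a convention on the input data without listing it. Reading `private$validate()` in the Volcano.R diff, the table needs `feature` and `log2FC` columns plus both `fdr` and `p_value`. A hypothetical base-R check along those lines follows; `check_volcano_input()` is not a `dmplot` function, and the wording of the second error is invented because the diff collapses the original message.

```r
# Hypothetical helper mirroring the column checks in Volcano's private validate().
check_volcano_input <- function(data) {
  cols <- colnames(data)
  # "feature" and "log2FC" are always required.
  if (!all(c("feature", "log2FC") %in% cols)) {
    stop('data must contain columns "feature" and "log2FC"; received: ',
         paste(cols, collapse = ", "))
  }
  # validate() requires both statistics to be present, not just one.
  if (sum(c("fdr", "p_value") %in% cols) != 2) {
    stop('data must contain both "fdr" and "p_value" columns')
  }
  invisible(TRUE)
}

# A table that follows the convention passes silently.
ok <- data.frame(
  feature = c("g1", "g2"), log2FC = c(1.2, -0.8),
  p_value = c(0.001, 0.04), fdr = c(0.01, 0.08),
  stringsAsFactors = FALSE
)
check_volcano_input(ok)

# Dropping "fdr" trips the second check.
missing_fdr <- ok[, c("feature", "log2FC", "p_value")]
err <- tryCatch(check_volcano_input(missing_fdr), error = conditionMessage)
err
```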