subsampleR

This repo provides subsampleR, a flexible, modular pipeline for sub-sampling patient cohorts based on customizable criteria. The package lets users import data from various file formats, merge it with metadata, sub-sample datasets, run statistical tests that compare distributions, and visualize the results.

Installation

To use subsampleR, you need R and the required libraries installed. Clone the repository, then run the following commands in R:

# Install dependencies if you haven't already
# ("tools" ships with base R and does not need to be installed)
install.packages(c("dplyr", "ggplot2", "readr", "readxl", "arrow"))

# Load the package (assuming the `subsampleR.R` file exists in your current directory)
source("subsampleR.R")

Functions

Data Import

df <- step_import(file_path)

Imports data from various file formats (CSV, TSV, TXT, RDA, RDS, XLSX, Parquet).
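
How step_import chooses a reader is not documented here. One common approach is to dispatch on the file extension. The sketch below is an illustration only: `read_any` is a hypothetical helper, not part of subsampleR, and it covers just the formats that base R can read. XLSX and Parquet would presumably go through the readxl and arrow dependencies listed above. Treating TXT files as tab-delimited is also an assumption.

```r
# Hypothetical helper, not part of subsampleR:
# pick a reader based on the file extension.
read_any <- function(file_path) {
  ext <- tolower(tools::file_ext(file_path))
  switch(ext,
    csv = utils::read.csv(file_path, stringsAsFactors = FALSE),
    tsv = ,  # falls through to the txt reader
    txt = utils::read.delim(file_path, stringsAsFactors = FALSE),  # assumes tab-delimited text
    rds = readRDS(file_path),
    stop("Unsupported file extension: ", ext)
  )
}

# Round-trip a small, made-up data frame through a temporary CSV file.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(DAid = c("DA001", "DA002"), age = c(54, 61)), tmp, row.names = FALSE)
imported <- read_any(tmp)
```

An unknown extension stops with an explicit error rather than guessing a format.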

Merge Data with Metadata

step_metadata(pipeline_object, metadata, cols = NULL)

Merges the input data with metadata based on the DAid column.
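
To illustrate what a merge on DAid looks like, here is a minimal base R sketch with made-up data. Only the DAid key comes from this README. The left join, the way `cols` selects metadata columns, and every column name other than DAid are assumptions, not the documented behavior of step_metadata.

```r
# Toy data: the DAid join key comes from the README, everything else is made up.
data <- data.frame(DAid = c("DA001", "DA002", "DA003"), value = c(1.2, 3.4, 5.6))
metadata <- data.frame(
  DAid = c("DA001", "DA002", "DA003"),
  sex = c("F", "M", "F"),
  age = c(54, 61, 47),
  site = c("A", "B", "A")
)

# Keep only the requested metadata columns (plus the key), then left-join
# so that every row of the input data is preserved.
cols <- c("sex", "age")
merged <- merge(data, metadata[, c("DAid", cols)], by = "DAid", all.x = TRUE)
```

Here `merged` keeps all three input rows and gains the `sex` and `age` columns; `site` is dropped because it was not requested.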

Sub-Sample Data

step_subsample(pipeline_object, n_samples, variable = NULL, ratio = NULL, seed = 123)

Sub-samples the data based on the specified number of samples or a ratio per category.
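
The exact semantics of `ratio` are not spelled out here. One plausible reading is that `n_samples` is split across the categories of `variable` in the given proportions. The sketch below implements that reading in base R. `subsample_by_ratio` is a hypothetical helper, not the package's implementation, and the cohort is made up.

```r
# Hypothetical helper, not part of subsampleR: draw n_samples rows,
# split across the categories of `variable` according to `ratio`.
subsample_by_ratio <- function(df, n_samples, variable, ratio, seed = 123) {
  set.seed(seed)  # fixed seed so the draw is reproducible
  # Number of rows to draw from each category.
  n_per_group <- round(n_samples * ratio / sum(ratio))
  picked <- unlist(lapply(names(ratio), function(level) {
    rows <- which(df[[variable]] == level)
    # Sampling without replacement; errors if a category is too small.
    rows[sample.int(length(rows), n_per_group[[level]])]
  }))
  df[picked, , drop = FALSE]
}

# Made-up cohort of 100 patients, 50 per category.
cohort <- data.frame(
  DAid = sprintf("DA%03d", 1:100),
  sex = rep(c("F", "M"), each = 50)
)
sub <- subsample_by_ratio(cohort, n_samples = 20, variable = "sex",
                          ratio = c(F = 0.5, M = 0.5))
```

Calling the helper again with the same seed returns the same rows, which is the point of exposing a `seed` argument.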

Perform Kolmogorov-Smirnov Test

step_ks_test(pipeline_object, population = "data", sample = "subsample_1", cols = NULL)

Performs the KS test between the population data and the sub-sampled data across specified columns.
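
The two-sample KS test compares the empirical distributions of a continuous variable in two data sets. A large p-value means there is no evidence that the sub-sample's distribution differs from the population's. The sketch below runs `stats::ks.test` column by column on simulated data. The column names and the shape of the result table are assumptions, not the output format of step_ks_test.

```r
set.seed(123)

# Simulated population and a random sub-sample of it (made-up variables).
population <- data.frame(
  age = rnorm(500, mean = 55, sd = 8),
  bmi = rnorm(500, mean = 27, sd = 4)
)
subsample <- population[sample.int(nrow(population), 100), ]

# Run a two-sample KS test per column and collect the results in one table.
cols <- c("age", "bmi")
ks_results <- do.call(rbind, lapply(cols, function(col) {
  # The sub-sample shares values with the population, so ks.test may warn about ties.
  test <- suppressWarnings(stats::ks.test(population[[col]], subsample[[col]]))
  data.frame(variable = col, statistic = unname(test$statistic), p_value = test$p.value)
}))
```

The KS test assumes continuous data, so categorical columns would need a different test.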

Visualize Distributions

step_visualize(pipeline_object, population = "data", sample = "subsample_1", cols = NULL)

Visualizes the distribution of variables in both the population and sub-sampled data, generating histograms.
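
As an illustration of this kind of plot, the sketch below overlays population and sub-sample histograms with ggplot2 (version 3.3.0 or later for `after_stat`). The data are simulated, and the styling choices are assumptions rather than what step_visualize actually draws.

```r
library(ggplot2)

set.seed(123)

# Simulated population and a random sub-sample of it (made-up variable).
population <- data.frame(age = rnorm(500, mean = 55, sd = 8))
subsample <- population[sample.int(nrow(population), 100), , drop = FALSE]

# Stack both data sets in long form with a label column.
plot_data <- rbind(
  data.frame(age = population$age, set = "population"),
  data.frame(age = subsample$age, set = "subsample")
)

# Overlaid, semi-transparent histograms on a density scale.
p <- ggplot(plot_data, aes(x = age, fill = set)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 alpha = 0.5, position = "identity") +
  labs(x = "age", y = "density", fill = NULL)
```

Plotting densities rather than raw counts keeps the two histograms comparable even though the sub-sample is much smaller than the population. Use `print(p)` to render the plot.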

Case Study

To see how the subsampleR package can be utilized in practice, you can download the case study from the following link:

Case Study: SCAPIS Healthy CAC

Contact

For questions or further information, please contact us at konstantinos.antonopoulos@scilifelab.se. Contributions are welcome: please open an issue or submit a pull request!
