If you do not have access to Biowulf, or you are looking for a reference genome and/or annotation that is not currently available, you can build the reference files with RNA-seek's build sub command. Given a genomic FASTA file (ref.fa) and a GTF file (genes.gtf), rna-seek build will create all of the reference files required to run the RNA-seek pipeline. Once the build pipeline completes, you can supply the newly generated reference JSON file to the --genome option of rna-seek run. For more information, please see the help pages for the run and build sub commands.
The continued growth and support of NIH's Biowulf cluster is dependent upon its demonstrable value to the NIH Intramural Research Program. If you publish research that involved significant use of Biowulf, please cite the cluster.
Suggested citation text:
This work utilized the computational resources of the NIH HPC Biowulf cluster. (http://hpc.nih.gov)
+
1. Harrow, J., et al. (2012). "GENCODE: the reference human genome annotation for The ENCODE Project." Genome Res 22(9): 1760-1774.
2. Andrews, S. (2010). "FastQC: a quality control tool for high throughput sequence data."
3. Martin, M. (2011). "Cutadapt removes adapter sequences from high-throughput sequencing reads." EMBnet 17(1): 10-12.
4. Wood, D. E. and S. L. Salzberg (2014). "Kraken: ultrafast metagenomic sequence classification using exact alignments." Genome Biol 15(3): R46.
5. Ondov, B. D., et al. (2011). "Interactive metagenomic visualization in a Web browser." BMC Bioinformatics 12(1): 385.
6. Wingett, S. and S. Andrews (2018). "FastQ Screen: A tool for multi-genome mapping and quality control." F1000Research 7(2): 1338.
7. Dobin, A., et al. (2013). "STAR: ultrafast universal RNA-seq aligner." Bioinformatics 29(1): 15-21.
8. Bushnell, B., J. Rood and E. Singer (2017). "BBMerge - Accurate paired shotgun read merging via overlap." PLoS One 12(10): e0185056.
9. Okonechnikov, K., et al. (2015). "Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data." Bioinformatics 32(2): 292-294.
10. The Picard toolkit. https://broadinstitute.github.io/picard/.
11. Daley, T. and A. D. Smith (2013). "Predicting the molecular complexity of sequencing libraries." Nat Methods 10(4): 325-327.
12. Li, H., et al. (2009). "The Sequence Alignment/Map format and SAMtools." Bioinformatics 25(16): 2078-2079.
13. Wang, L., et al. (2012). "RSeQC: quality control of RNA-seq experiments." Bioinformatics 28(16): 2184-2185.
14. Li, B. and C. N. Dewey (2011). "RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome." BMC Bioinformatics 12: 323.
15. Uhrig, S., et al. (2021). "Accurate and efficient detection of gene fusions from RNA sequencing data." Genome Res 31(3): 448-460.
16. Ewels, P., et al. (2016). "MultiQC: summarize analysis results for multiple tools and samples in a single report." Bioinformatics 32(19): 3047-3048.
\ No newline at end of file
diff --git a/RNA-seq/TLDR-RNA-seq/index.html b/RNA-seq/TLDR-RNA-seq/index.html
new file mode 100644
index 0000000..f394a33
--- /dev/null
+++ b/RNA-seq/TLDR-RNA-seq/index.html
@@ -0,0 +1,73 @@
+ Getting started - RNA-seek Documentation
When processing RNA-sequencing data, there are often many steps that we must repeat, such as removing adapter sequences, aligning reads against a reference genome, checking the quality of the data, and quantifying counts. RNA-seek is composed of several sub commands, or convenience functions, that automate these repetitive steps.
With RNA-seek, you can run your samples through our highly-reproducible pipeline, build resources for new reference genomes, and more!
Here is a list of available rna-seek sub commands:
This page contains information for building reference files and running the RNA-seek pipeline. For more information about each of the available sub commands, please see the usage section.
RNA-seek has two dependencies: singularity and snakemake. These dependencies can be installed by a sysadmin; alternatively, snakemake is readily available through conda. Before running the pipeline or any of the commands below, please ensure that singularity and snakemake are in your $PATH. Please follow the instructions below to get started with the RNA-seek pipeline.
# Setup Step 1.) Please do not run RNA-seek on the head node!
+# Grab an interactive node first
+srun -N 1 -n 1 --time=12:00:00 -p interactive --mem=8gb --cpus-per-task=4 --pty bash
+
In this example, we will start by building reference files from a genome and annotation downloaded from GENCODE. We recommend downloading the PRI genome FASTA file and annotation from GENCODE. These PRI reference files contain the primary chromosomes and scaffolds. We do not recommend downloading the CHR reference files!
Here is more information about GENCODE's v36 release for the human reference genome.
# Build Step 0.) Please do not run RNA-seek on the head node!
+# Grab an interactive node first
+# Assumes that you have already ssh-ed into cluster
+srun -N 1 -n 1 --time=12:00:00 -p interactive --mem=8gb --cpus-per-task=4 --pty bash
+
+# Build Step 1.) Download the PRI Genome FASTA file for GRCh38.p13
+wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_36/GRCh38.primary_assembly.genome.fa.gz
+gzip -d GRCh38.primary_assembly.genome.fa.gz
+
+# Build Step 2.) Download the PRI release 36 annotation
+wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_36/gencode.v36.primary_assembly.annotation.gtf.gz
+gzip -d gencode.v36.primary_assembly.annotation.gtf.gz
+
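The download steps above stop before the build itself. Here is a minimal sketch of the missing step. It assumes the decompressed files sit in the current directory. It also assumes /data/${USER}/hg38_36 is the intended output, chosen so that the {OUTPUT}/{REF_NAME}_{GTF_VER}.json naming convention described on the build page yields the --genome path used in the run steps below. Only options documented on the build page are used. The wrapper name build_hg38_36 is hypothetical; it exists only so the same arguments can be reused for a dry run and for the real build.

```shell
# Build Step 3.) Build the reference files (hypothetical wrapper; adjust paths)
build_hg38_36() {
  rna-seek build \
    --ref-fa GRCh38.primary_assembly.genome.fa \
    --ref-name hg38 \
    --ref-gtf gencode.v36.primary_assembly.annotation.gtf \
    --gtf-ver 36 \
    --output "/data/${USER}/hg38_36" \
    "$@"  # any extra options, e.g. --dry-run
}
```

Run build_hg38_36 --dry-run first to preview the steps, then run build_hg38_36 to start the build.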
An email notification will be sent when the pipeline starts and ends. Once the build pipeline completes, you can run RNA-seek with the provided test dataset. Please see the instructions below for more information.
Run RNA-seek with the reference files we built above from the hg38 (GRCh38.p13) genome FASTA file and the GENCODE release 36 annotation (GTF). For more information about how we generated these reference files, please see the instructions above. You can use those instructions as a guide for building any new reference genomes in the future.
Dry-run the pipeline before submitting the pipeline's master job. Please note that if you wish to run RNA-seek with a new dataset, you will only need to update the values provided to the --input and --output arguments (and possibly --genome). The --input argument supports globbing. If this is the first time running RNA-seek for a given dataset, the --output directory should not exist on your local filesystem; it will be created automatically at runtime.
# Run Step 0.) Please do not run RNA-seek on the head node!
+# Grab an interactive node first
+# Assumes that you have already ssh-ed into cluster
+srun -N 1 -n 1 --time=12:00:00 -p interactive --mem=8gb --cpus-per-task=4 --pty bash
+
+# Run Step 1.) Load dependencies
+module purge
+module load singularity snakemake
+
+# Run Step 2.) Dry-run the pipeline with test dataset
+# And reference genome generated in the steps above
+# Test data consists of sub sampled FastQ files
+rna-seek run \
+ --input RNA-seek/.tests/*.R?.fastq.gz \
+ --output /data/${USER}/runner_hg38_36/ \
+ --genome /data/${USER}/hg38_36/hg38_36.json \
+ --mode slurm \
+ --star-2-pass-basic \
+ --dry-run
+
Kick off the pipeline by submitting the master job to the cluster. It is the same command as above, without the --dry-run flag.
# Run Step 3.) Submit the master job
+# Runs the RNA-seek pipeline with the
+# reference genome generated in the steps above
+# and with the test dataset
+rna-seek run \
+ --input RNA-seek/.tests/*.R?.fastq.gz \
+ --output /data/${USER}/runner_hg38_36/ \
+ --genome /data/${USER}/hg38_36/hg38_36.json \
+ --mode slurm \
+ --star-2-pass-basic
+
An email notification will be sent out when the pipeline starts and ends.
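Before pointing --input at a new paired-end dataset, it can help to confirm that the glob picks up complete pairs, because a missing mate is easy to overlook. Here is a minimal sketch that assumes the *.R1.fastq.gz / *.R2.fastq.gz naming used by the test dataset. The function name check_pairs is hypothetical, and other naming schemes would need different patterns.

```shell
# Check that every R1 FastQ matched by a glob has an R2 mate and vice
# versa (hypothetical helper). Exits 1 if any mate is missing.
check_pairs() {
  status=0
  for fq in "$@"; do
    case "$fq" in
      *.R1.fastq.gz) mate="${fq%.R1.fastq.gz}.R2.fastq.gz" ;;
      *.R2.fastq.gz) mate="${fq%.R2.fastq.gz}.R1.fastq.gz" ;;
      *) continue ;;                 # ignore files outside this naming scheme
    esac
    if [ ! -e "$mate" ]; then
      echo "missing mate for: $fq"
      status=1
    fi
  done
  return "$status"
}
```

For example, check_pairs RNA-seek/.tests/*.R?.fastq.gz uses the same glob that is passed to --input.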
\ No newline at end of file
diff --git a/RNA-seq/Theory/index.html b/RNA-seq/Theory/index.html
new file mode 100644
index 0000000..cd9e896
--- /dev/null
+++ b/RNA-seq/Theory/index.html
@@ -0,0 +1 @@
+ Theory - RNA-seek Documentation
RNA-sequencing (RNA-seq) has a wide variety of applications; this transcriptome profiling method can be used to quantify gene and isoform expression, find changes in alternative splicing, detect gene-fusion events, call variants and much more.
It is also worth noting that RNA-seq can be coupled with other biochemical assays to analyze many other aspects of RNA biology, such as RNA–protein binding (CLIP-seq, RIP-seq), RNA structure (SHAPE-seq), or RNA–RNA interactions (CLASH-seq). These applications are, however, beyond the scope of this documentation, as we focus on typical RNA-seq projects (i.e. quantifying expression and detecting gene fusions). Our focus is to outline current standards and resources for the bioinformatics analysis of RNA-seq data. We do not aim to provide an exhaustive compilation of resources or software tools. Rather, we aim to provide guidelines and a conceptual overview of RNA-seq data analysis based on our best-practices RNA-seq pipeline.
Here we review the major steps of a typical RNA-seq analysis: experimental design, quality control, read alignment, quantification of gene and transcript levels, and visualization.
Just like any other scientific experiment, a good RNA-seq experiment is hypothesis-driven. If you cannot describe the problem you are trying to address, throwing NGS at it will not solve it. Fishing for results is a waste of your time and is bad science. As such, designing a well-thought-out experiment around a testable question will maximize the likelihood of generating high-impact results.
The data that are generated will determine whether you can answer your biological question of interest. As a prerequisite, you need to think about how you will construct your libraries, the sequencing depth needed to address your question, the number of replicates, and strategies to reduce or mitigate batch effects.
Ribosomal RNA (rRNA) can comprise up to 80% of the RNA in a cell, so an important consideration is the protocol that will be used to remove this highly abundant RNA. For eukaryotic cells, the major decision is whether to enrich for mRNA or to deplete rRNA.
Poly(A) selection is a common method used to enrich for mRNA. This method generates the highest percentage of reads that ultimately map to protein-coding genes, making it a common choice for most applications. That being said, poly(A) selection requires your RNA to be of high quality with minimal degradation. Degraded samples that undergo poly(A) selection may show a 3' bias, which may in turn introduce biases into your downstream results.
The second method captures total RNA through the depletion of rRNA. This method allows you to examine both mRNA and non-coding RNA species such as lncRNAs. Again, depending on the question you are trying to answer, this may be the right method for you. It should be noted that both methods, mRNA and total RNA, require high RINs (>8). However, if your samples do contain slightly degraded RNA, you may be able to use the total RNA method instead of poly(A) selection.
Sequencing depth or library size is another important design factor. As sequencing depth is increased, more transcripts will be detected (up until a saturation point), and their relative abundance will be quantified more accurately.
At the end of the day, the target sequencing depth depends on the aims of the experiment. Are you trying to quantify differences in gene expression, or differential isoform usage or alternative splicing events? The numbers quoted below are tailored to quantifying differences in gene expression. If you are trying to quantify changes in alternative splicing or isoform regulation, you will need much higher coverage (~100M paired-end reads).
For mRNA libraries, or libraries generated from a prep kit using poly(A) selection, we recommend a minimum sequencing depth of 10-20M paired-end reads (i.e. 20-40M total reads). RNA must be of high quality or a 3' bias may be observed.
For total RNA libraries, we recommend a sequencing depth of 25-60M paired-end reads (i.e. 50-120M total reads). RNA must be of high quality.
Note: In the sections above and below, "paired-end reads" refers to read pairs generated from paired-end sequencing of a given cDNA fragment. Depending on the source, you will see depth reported either as read pairs or as total reads (twice the number of pairs).
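To check whether a library reached the recommended depth, you can count the reads in a FASTQ file directly. Each FASTQ record spans exactly four lines. Here is a minimal sketch. It assumes standard four-line records, with no line-wrapped sequences, in either gzipped or plain files. For paired-end data, counting the R1 file gives the number of read pairs, and doubling it gives total reads. The function names are hypothetical.

```shell
# Count reads in a FASTQ file (gzipped or plain; one record = 4 lines).
# gzip -dcf passes uncompressed input through unchanged.
count_fastq_reads() {
  gzip -dcf "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Report read pairs and total reads for a paired-end sample's R1 file.
report_depth() {
  pairs=$(count_fastq_reads "$1")
  echo "read pairs: $pairs"
  echo "total reads: $((pairs * 2))"
}
```

As a worked example, an R1 file with 20,000,000 records corresponds to 20M read pairs, or 40M total reads.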
We recommend 4 biological replicates per experimental condition or group. Extra replicates are valuable because problems arise in the real world: if one sample cannot be used due to severe QC issues, you are still left with 3 biological replicates. This allows you to drop a bad sample without compromising statistical power downstream.
Batch effects represent unwanted sources of technical variation. They introduce non-biological variation into your data which, if not accounted for, can influence the results. From library preparation to sequencing, there are a number of steps (RNA extraction, adapter ligation, lane loading, etc.) that might introduce biases into the resulting data.
As a general rule of thumb, the best way to reduce the introduction of batch effects is through uniform processing, meaning you need to ensure that differences in sample handling are minimal. Samples should be processed by the same lab technician, and everything should be done in a uniform manner: do not extract your RNA at different times, and do not use different lots of reagents! If a large number of samples are being processed and everything cannot be done at the same time, process representative samples from each biological group at the same time. This will ensure that batches and your variable of interest do not become confounded. Also, keep note of which samples belong to each batch; this information will be needed for batch correction.
To reduce the possibility of introducing batch effects from sequencing, all samples should be multiplexed together on the same lane(s).
Sample           Group  Batch  Batch*
Treatment_rep_1  KO     1      1
Treatment_rep_2  KO     2      1
Treatment_rep_3  KO     1      1
Treatment_rep_4  KO     2      1
Control_rep_1    WT     1      2
Control_rep_2    WT     2      2
Control_rep_3    WT     1      2
Control_rep_4    WT     2      2
Batch = properly balanced batches, easily corrected.
Batch* = groups and batches totally confounded, cannot be corrected.
That being said, some problems cannot be corrected bioinformatically. If your variable of interest is totally confounded with your batches, applying batch correction will not fix the problem and will lead to undesired results (see the Batch* column). If batches must be introduced due to other constraints, please keep note of which samples belong to each batch, and put some thought into how to properly balance samples across your batches.
Quality control (QC) is extremely important! As the old adage goes: garbage in, garbage out! If there is one thing to take away from this document, let it be that. Performing QC checks will help ensure that your results are reliable and reproducible.
There is a large variety of open-source tools that can be used to assess the quality of your data, so there is no reason to reinvent the wheel. Keep in mind, however, that there are many wheels, and you will need to know which to use and when. In this next section, we will cover different quality-control checks that can be applied at different stages of your RNA-seq analysis. These recommendations are based on the tools that our best-practices RNA-seq pipeline employs.
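You can check the balance between groups and batches before any samples are processed. The sketch below assumes a hypothetical tab-separated sample sheet with a header row, laid out like the table above, with the Batch* column renamed BatchStar. It reports every batch that is missing a group. This is a minimum requirement rather than proof of a good design. In the fully confounded Batch* layout, every batch fails the check. The function name is hypothetical.

```shell
# Report batches that do not contain every group (hypothetical helper).
# Usage: check_batch_balance SHEET.tsv GROUP_COLUMN BATCH_COLUMN
# Exits 0 only if every batch holds at least one sample of every group.
check_batch_balance() {
  awk -F'\t' -v g="$2" -v b="$3" '
    NR == 1 { next }                       # skip the header row
    {
      groups[$g] = 1
      batches[$b] = 1
      seen[$b, $g] = 1                     # this batch/group pair occurs
    }
    END {
      missing = 0
      for (bt in batches)
        for (gr in groups)
          if (!((bt, gr) in seen)) {
            printf "batch %s has no %s samples\n", bt, gr
            missing = 1
          }
      exit missing
    }' "$1"
}
```

With the table above stored as sheet.tsv, check_batch_balance sheet.tsv 2 3 passes for the Batch column, while check_batch_balance sheet.tsv 2 4 flags both batches of the Batch* column.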
Before drawing biological conclusions, it is important to perform quality control checks to ensure that there are no signs of sequencing error, biases in your data, or other sources of contamination. Modern high-throughput sequencers generate millions of reads per run, and in the real world, problems can arise.
The general idea is to assess the quality of your reads before and after adapter removal and to check for different sources of contamination before proceeding to alignment. Here are a few of the tools that we use and recommend.
To assess the sequencing quality of your data, we recommend running FastQC before and after adapter trimming. FastQC generates a set of basic statistics to identify problems that can arise during sequencing or library preparation. FastQC summarizes per-base and per-read QC metrics such as quality scores and GC content (ideally, the GC content distribution should be roughly normal, with no bimodality). It also summarizes the distribution of sequence lengths and reports the presence of adapter sequences, which is one reason we run it again after removing adapters.
From sample collection to library preparation, there is a risk of introducing unwanted sources of DNA. FastQ Screen compares your sequencing data to a set of different reference genomes to determine whether there is contamination, allowing you to see if the composition of your library matches what you expect. FastQ Screen reports what percentage of your library aligns against each reference genome, so high levels of human, mouse, fungal, or bacterial contamination will be apparent.
If there are high levels of microbial contamination, Kraken will provide an estimation of the taxonomic composition. Kraken can be used in conjunction with Krona to produce interactive reports.
Note: Due to high levels of homology between organisms, there may be a small portion of your reads that align to an unexpected reference genome. Again, this should be a minimal percentage of your reads.
Again, there are many tools available to assess the quality of your data post-alignment, and as stated before, there is no need to re-invent the wheel. Please see the table below for a generalized set of guidelines for different pre/post QC metrics.
Preseq can be used to estimate the complexity of a library for each of your samples. If the duplication rate is very high, the overall library complexity will be low. Low library complexity could signal an issue with library preparation or sample preparation (FFPE samples) where very little input RNA was over-amplified or the sample may be degraded.
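Preseq extrapolates library complexity from counts of duplicate reads. A much cruder sanity check, which is not Preseq's method, is the fraction of reads whose sequence exactly repeats an earlier read. Here is a minimal sketch assuming four-line FASTQ records. Keep two caveats in mind. First, in RNA-seq, highly expressed genes naturally produce identical reads, so this number overstates technical duplication. Second, the script holds every distinct sequence in memory, so run it on a subsample. The function name is hypothetical.

```shell
# Fraction of reads whose sequence exactly matches an earlier read
# (a rough duplication proxy; Preseq models complexity properly).
dup_fraction() {
  gzip -dcf "$1" | awk '
    NR % 4 == 2 {                          # line 2 of each record = sequence
      total++
      if ($0 in seen) dups++
      seen[$0] = 1
    }
    END { printf "%.4f\n", (total ? dups / total : 0) }'
}
```

For example, gzip -dc sample.R1.fastq.gz | head -n 4000000 > sub.fastq takes the first million records, and dup_fraction sub.fastq then prints the duplicate fraction.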
Picard has a particularly useful sub command called CollectRnaSeqMetrics, which reports the number and percentage of reads that align to various regions, such as coding, intronic, UTR, intergenic, and ribosomal regions. This is particularly useful, as you would expect a library constructed with poly(A) selection to have a high percentage of reads that map to coding regions. Picard CollectRnaSeqMetrics also reports the uniformity of coverage across all genes, which is useful for determining whether a sample has a 3' bias (observed in libraries containing degraded RNA).
RSeQC is another particularly useful package that is tailored for RNA-seq data. It is made up of over 20 sub-modules that can be used to do things like calculate the average insert size between paired-end reads (which is useful for GEO upload), annotate the percentage of reads spanning known or novel splice junctions, convert a BAM file into a normalized BigWig file, and infer RNA quality.
Here is a set of generalized guidelines for different QC metrics. Some of these metrics will vary genome-to-genome depending on the quality of the assembly and annotation but that has been taken into consideration for our set of supported reference genomes.
Starting from raw data (FastQ files), how do we get a raw counts matrix or a list of differentially expressed genes? Before feeding your data into an R package for differential expression analysis, it needs to be processed to add biological context. In this section, we will talk about the data processing pipeline in more detail, focusing specifically on primary and secondary analysis.
One of the first steps in this process is to remove any unwanted adapter sequences from your reads. Adapters are composed of synthetic sequences and should be removed prior to alignment. Adapter removal is especially important in certain protocols, such as miRNA-seq: when fragments shorter than the read length are sequenced, the reads run through the insert into the adapter, so some adapter contamination is almost certain.
In the alignment step, we add biological context to the raw data. In this step, we align reads to the reference genome to find where the sequenced fragments originate.
Accurate alignment of cDNA fragments (which are derived from RNA) is difficult. Splicing, including alternative splicing, introduces the problem of aligning reads to non-contiguous regions, and traditional genomic alignment algorithms can produce inaccurate or low-quality alignments due to the combination of splicing and genomic variation (substitutions, insertions, and deletions). This has led to the development of splice-aware aligners like STAR, which are designed to overcome these issues. STAR can also be run in two-pass mode for enhanced detection of reads mapping to novel splice junctions.
In the quantification step, the number of reads that map to a particular genomic feature (such as a gene or isoform) is counted. Keep in mind that raw counts are biased by a number of factors such as library size, feature length, and other compositional biases. As such, it is important to normalize your data to remove these biases before summarizing differences between groups of samples.
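One common way to correct raw counts for both library size and feature length is transcripts per million (TPM). You divide each count by its feature length, then rescale so that the values in a sample sum to one million. TPM is one normalization among several; differential expression packages typically apply their own size-factor methods to raw counts instead. The sketch below assumes a hypothetical headerless tab-separated file with feature, length, and raw count columns, all lengths greater than zero. The function name is hypothetical.

```shell
# Compute TPM from a headerless TSV: feature <TAB> length_bp <TAB> raw_count
# TPM_i = (count_i / length_i) / sum_j(count_j / length_j) * 1e6
# The file is read twice: pass 1 sums the rates, pass 2 prints TPM in order.
tpm() {
  awk -F'\t' '
    NR == FNR { rate[$1] = $3 / $2; total += $3 / $2; next }
    { printf "%s\t%.2f\n", $1, rate[$1] / total * 1e6 }
  ' "$1" "$1"
}
```

As a worked example, a 1,000 bp gene with 300 reads and a 2,000 bp gene with 200 reads have rates of 0.3 and 0.1 reads per base. Their TPMs are therefore 750000.00 and 250000.00, even though their raw counts differ by only 1.5-fold.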
\ No newline at end of file
diff --git a/RNA-seq/build/index.html b/RNA-seq/build/index.html
new file mode 100644
index 0000000..84954e0
--- /dev/null
+++ b/RNA-seq/build/index.html
@@ -0,0 +1,64 @@
+ build - RNA-seek Documentation
The rna-seek executable is composed of several inter-related sub commands. Please see rna-seek -h for all available options.
This part of the documentation describes options and concepts for rna-seek build sub command in more detail. With minimal configuration, the build sub command enables you to build new reference files for the rna-seek run pipeline.
Setting up the RNA-seek build pipeline is fast and easy! In its most basic form, rna-seek build only has five required inputs.
The synopsis for each command shows its parameters and their usage. Optional parameters are shown in square brackets.
A user must provide the genomic sequence of the reference assembly in FASTA format via the --ref-fa argument, an alias for the reference genome via the --ref-name argument, a gene annotation for the reference assembly via the --ref-gtf argument, an alias or version for the gene annotation via the --gtf-ver argument, and an output directory to store the built reference files via the --output argument. If you are running the pipeline outside of Biowulf, you will additionally need to provide the following options: --shared-resources and --tmp-dir. More information about each of these options can be found below.
For human and mouse data, we highly recommend downloading the latest available PRI genome assembly and corresponding gene annotation from GENCODE. These reference files contain chromosome and scaffold sequences.
The build pipeline will generate a JSON file containing key-value pairs that point to the reference files required by the rna-seek run pipeline. This file will be located in the path provided to --output. The name of this JSON file depends on the values provided to --ref-name and --gtf-ver and follows this naming convention: {OUTPUT}/{REF_NAME}_{GTF_VER}.json. Once the build pipeline completes, this reference JSON file can be passed to the --genome option of rna-seek run. This is how new references are built for the RNA-seek pipeline.
You can always use the -h option for information on a specific command.
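Because the file name is fully determined by three options, you can derive the path to pass to --genome before the build finishes. Here is a minimal sketch. The helper name is hypothetical, and it assumes the pipeline joins the parts exactly as the naming convention states.

```shell
# Predict the reference JSON written by rna-seek build, following the
# documented convention {OUTPUT}/{REF_NAME}_{GTF_VER}.json (hypothetical helper).
# Usage: ref_json_path OUTPUT REF_NAME GTF_VER
ref_json_path() {
  output="${1%/}"              # drop one trailing slash, if present
  printf '%s/%s_%s.json\n' "$output" "$2" "$3"
}
```

For example, --genome "$(ref_json_path /data/$USER/hg38_36 hg38 36)" expands to /data/$USER/hg38_36/hg38_36.json.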
Each of the following arguments are required. Failure to provide a required argument will result in a non-zero exit-code.
--ref-fa REF_FA
Genomic FASTA file of the reference genome. type: file
This file represents the genome sequence of the reference assembly in FASTA format. If you are downloading this from GENCODE, you should select the PRI genomic FASTA file. This file contains the primary genomic assembly (chromosomes and scaffolds). This input file should not be compressed. Sequence identifiers in this file must match the sequence identifiers in the GTF file provided to --ref-gtf.
--ref-name REF_NAME
Name or alias for the reference genome. This can be the common name for the reference genome. Here is a list of common examples for different model organisms: mm10, hg38, rn6, danRer11, dm6, canFam3, sacCer3, ce11. If the provided value contains one of the following substrings (hg19, hs37d, grch37, hg38, hs38d, grch38, mm10, grcm38), then Arriba will run with its corresponding blacklist.
Example: --ref-name hg38
--ref-gtf REF_GTF
Gene annotation or GTF file for the reference genome. type: file
This file represents the reference genome's gene annotation in GTF format. If you are downloading this from GENCODE, you should select the PRI GTF file. This file contains gene annotations for the primary assembly (chromosomes and scaffolds). This input file should not be compressed. Sequence identifiers (column 1) in this file must match the sequence identifiers in the FASTA file provided to --ref-fa. Example: --ref-gtf gencode.v36.primary_assembly.annotation.gtf
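A mismatch between FASTA sequence identifiers and GTF column 1 is easy to check before starting a build, for example chr1 in one file and 1 in the other. Here is a minimal sketch. It assumes the identifier is the first whitespace-delimited word of each FASTA header, and it lists GTF sequence names that have no matching FASTA record. The function name is hypothetical.

```shell
# List GTF sequence names (column 1) that have no matching FASTA header.
# Usage: check_seqids ref.fa genes.gtf   (exits 1 if any are missing)
check_seqids() {
  awk -F'\t' '
    BEGIN { bad = 0 }
    FNR == NR {                                  # first file: the FASTA
      if (/^>/) {
        id = substr($0, 2)
        sub(/[ \t].*/, "", id)                   # keep the first word only
        fa[id] = 1
      }
      next
    }
    /^#/ { next }                                # skip GTF comment lines
    !($1 in fa) && !($1 in reported) {
      print "not in FASTA: " $1
      reported[$1] = 1
      bad = 1
    }
    END { exit bad }' "$1" "$2"
}
```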
--gtf-ver GTF_VER
Version of the gene annotation or GTF file provided. type: string or int
This is the version of the supplied gene annotation or GTF file. If you are using a GTF file from GENCODE, use the release number or version (e.g. M25 for mouse or 37 for human). Visit gencodegenes.org for more details. Example: --gtf-ver 36
--output OUTPUT
Path to an output directory. type: path
This is the location where the build pipeline will create all of its output files. If the provided output directory does not exist, it will be created automatically. Example: --output /data/$USER/refs/hg38_v36/
Each of the following arguments is optional. If you are running the pipeline outside of Biowulf, the --shared-resources option only needs to be provided once. This will ensure that reference files shared across different genomes are downloaded locally.
--shared-resources SHARED_RESOURCES
Local path to shared resources. type: path
The pipeline uses a set of shared reference files that can be reused across reference genomes. These currently include reference files for Kraken and FastQ Screen. These reference files can be downloaded with the build sub command's --shared-resources option, and they only need to be downloaded once. We recommend storing these files in a shared location on the filesystem that other people can access. If you are running the pipeline on Biowulf, you do NOT need to download these reference files! They already exist on the filesystem in a location that anyone can access. However, if you are running the pipeline on another cluster or target system, you will need to download the shared resources with the build sub command, and you will need to provide this option every time you run the pipeline, using the same path that was provided to the build sub command's --shared-resources option. Again, if you are running the pipeline on Biowulf, you do NOT need to provide this option.
Example: --shared-resources /data/shared/rna-seek
--small-genome
Builds a small genome index. type: boolean
For small genomes, it is recommended to run STAR with a scaled-down --genomeSAindexNbases value. This option runs the build pipeline in a mode where it dynamically finds the optimal value for this option using the following formula: min(14, log2(GenomeSize)/2 - 1). Generally speaking, this option is not applicable to most mammalian reference genomes, such as human and mouse; however, researchers working with very small reference genomes, like S. cerevisiae (~12 Mb), should provide this option.
When in doubt, feel free to provide this option, as the optimal value will be found based on your input. It is also worth noting that if you are working with a prokaryotic genome, like a bacterial genome, you will need to provide the --prokaryote option to the run sub command.
--dry-run
Displays what steps in the build pipeline remain or will be run. Does not execute anything!
Example: --dry-run
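To see what value the formula gives for a particular genome, here is a minimal sketch. It assumes the result is rounded down to an integer; that rounding is an assumption, not something stated above. Genome size is taken as the total number of sequence characters in the FASTA file. Both function names are hypothetical.

```shell
# min(14, log2(GenomeSize)/2 - 1), rounded down (rounding is assumed).
sa_index_nbases() {
  awk -v n="$1" 'BEGIN {
    v = int(log(n) / log(2) / 2 - 1)
    if (v > 14) v = 14
    print v
  }'
}

# Genome size = total number of sequence characters in a FASTA file.
fasta_genome_size() {
  awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$1"
}
```

For a ~12 Mb genome such as S. cerevisiae, log2(12,000,000) is about 23.5, so the formula gives about 10.8, which rounds down to 10. A ~3.1 Gb human genome hits the cap of 14. To compute the value for your own FASTA file, run sa_index_nbases "$(fasta_genome_size genome.fa)".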
--singularity-cache SINGULARITY_CACHE
Overrides the $SINGULARITY_CACHEDIR environment variable. type: path default: --output OUTPUT/.singularity
Singularity will cache image layers pulled from remote registries. This ultimately speeds up pulling an image from DockerHub if an image layer already exists in the Singularity cache directory. By default, the cache is set to a .singularity directory inside the path provided to the --output argument. Please note that this cache cannot be shared across users. Singularity strictly enforces that you own the cache directory and will return a non-zero exit code if you do not! See the --sif-cache option to create a shareable resource.
--sif-cache SIF_CACHE
Path where a local cache of SIFs is stored. type: path
Uses a local cache of SIFs on the filesystem. This SIF cache can be shared across users if permissions are set correctly. If a SIF does not exist in the SIF cache, the image will be pulled from DockerHub and a warning message will be displayed. The rna-seek cache sub command can be used to create a local SIF cache. Please see rna-seek cache for more information. This command is extremely useful for avoiding DockerHub pull rate limits. It also removes any potential errors that could occur due to network issues or DockerHub being temporarily unavailable. We recommend running RNA-seek with this option whenever possible.
Example: --sif-cache /data/$USER/SIFs
--tmp-dir TMP_DIR
Path on the file system for writing temporary files. type: path default: /lscratch/$SLURM_JOBID
This is a path on the filesystem for writing temporary output files. By default, the temporary directory is set to '/lscratch/$SLURM_JOBID' for backwards compatibility with NIH's Biowulf cluster; however, if you are running the pipeline on another cluster, this option will need to be specified. Ideally, this path should point to a dedicated location on the filesystem for writing tmp files. On many systems, this location is somewhere in /scratch. If you need to inject a variable into this string that should NOT be expanded, please quote this option's value in single quotes. Again, if you are running the pipeline on Biowulf, you do NOT need to provide this option.
If you have two GTF files, e.g. for a hybrid genome (host + virus), you need to create one genomic FASTA file and one GTF file for the hybrid genome prior to running the rna-seek build command.
We recommend creating an artificial chromosome for the non-host sequence. The sequence identifier in the FASTA file must match the sequence identifier in the GTF file (column 1). Generally speaking, since the host annotation is usually downloaded from Ensembl or GENCODE, it will be correctly formatted; however, that may not be the case for the non-host sequence!
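The snippet below shows how the shell treats the two quoting styles. Single quotes pass the variable name through unexpanded, so the string reaches rna-seek containing the literal text $SLURM_JOBID, for example --tmp-dir '/lscratch/$SLURM_JOBID'. Double quotes expand it immediately in your current session. The variable names literal and expanded are illustrative only.

```shell
# Single quotes: $SLURM_JOBID is kept as literal text, so the pipeline,
# not your current shell, receives the variable name.
literal='/lscratch/$SLURM_JOBID'

# Double quotes: the variable is expanded right now
# (to an empty string if it is not set in the current shell).
expanded="/lscratch/${SLURM_JOBID:-}"

printf 'single quotes: %s\ndouble quotes: %s\n' "$literal" "$expanded"
```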
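Combining the two references can be as simple as concatenating them, with two small safeguards. Here is a minimal sketch with a hypothetical function name. It assumes the non-host FASTA already holds the artificial chromosome, and that the non-host GTF already uses that chromosome's identifier in column 1. The script re-emits every line through awk, so a file missing its final newline cannot fuse two lines together. It also warns if an identifier appears in both FASTA files.

```shell
# Combine host and non-host references into one FASTA and one GTF
# (hypothetical helper). "awk 1" prints every line with a trailing newline.
# Usage: make_hybrid host.fa virus.fa host.gtf virus.gtf OUT_PREFIX
make_hybrid() {
  awk 1 "$1" "$2" > "$5.fa"
  awk 1 "$3" "$4" > "$5.gtf"
  # Warn about sequence identifiers present in both FASTA files
  grep -h '^>' "$1" "$2" | awk '{ print $1 }' | sort | uniq -d |
    sed 's/^>/duplicate sequence identifier: /'
}
```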
Please ensure the non-host annotation satisfies the following constraints: each gene feature has at least one transcript feature, and each transcript feature has at least one exon feature.
If not, the GTF file may need to be manually curated until these conditions are satisfied.
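Here is a minimal sketch for checking these two constraints before a build. It assumes features are linked through the gene_id and transcript_id attributes in column 9, written as key "value"; the function name is hypothetical.

```shell
# Check that every gene has >= 1 transcript and every transcript has
# >= 1 exon (hypothetical helper). Exits 1 on any violation.
check_gtf_hierarchy() {
  awk -F'\t' '
    # Return the quoted value of attribute `key` in column 9, or ""
    function attr(key,    n) {
      n = length(key) + 2                       # length of: key, space, quote
      if (match($9, key " \"[^\"]*\""))
        return substr($9, RSTART + n, RLENGTH - n - 1)
      return ""
    }
    /^#/ { next }
    $3 == "gene"       { genes[attr("gene_id")] = 1 }
    $3 == "transcript" { has_tx[attr("gene_id")] = 1; txs[attr("transcript_id")] = 1 }
    $3 == "exon"       { has_exon[attr("transcript_id")] = 1 }
    END {
      bad = 0
      for (g in genes) if (!(g in has_tx)) { print "gene without transcript: " g; bad = 1 }
      for (t in txs) if (!(t in has_exon)) { print "transcript without exon: " t; bad = 1 }
      exit bad
    }' "$1"
}
```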
Here is an example feature from a hand-curated Biotyn_probe GTF file:
line 1: the gene feature has 3 required attributes in column 9: gene_id, gene_name, and gene_biotype
line 2: the transcript entry for the above gene repeats the same attributes, plus the following required attributes: transcript_id and transcript_name
Please note: transcript_type is optional
line 3: the exon entry for the above transcript has 3 required attributes: gene_id, transcript_id, and gene_biotype
Please note: transcript_type is optional
For a given gene, the combination of the gene_id AND gene_name should form a unique string. There should be no instances where two different genes share the same gene_id AND gene_name.
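Duplicate gene_id/gene_name pairs can be spotted by counting gene features per pair. The helper below is a hypothetical sketch (find_duplicate_genes is not part of RNA-seek) that assumes the same tab-separated, `name "value";` attribute layout as above and joins the pair with a `|` for display.

```bash
# find_duplicate_genes: hypothetical helper, not part of RNA-seek.
# Prints every gene_id|gene_name pair that appears on more than one
# "gene" feature line; prints nothing when all pairs are unique.
find_duplicate_genes() {
    awk -F'\t' '
        # Return the value of a quoted attribute, e.g. gene_name "G1"; -> G1
        function attr(name) {
            if (match($9, name " \"[^\"]*\""))
                return substr($9, RSTART + length(name) + 2, RLENGTH - length(name) - 3)
            return ""
        }
        /^#/ { next }
        $3 == "gene" { n[attr("gene_id") "|" attr("gene_name")]++ }
        END { for (k in n) if (n[k] > 1) print k }
    ' "$1"
}
```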
While building reference genomes from various sources, you may run into unexpected issues with the GTF file that was provided. The GTF file format has evolved over the years. Each iteration of the format has its own set of features and attributes. And while there is a basic definition for the GTF file format, overall there is a general lack of standardization.
Most of the issues encountered with the build pipeline can be attributed to this lack of standardization. Over the years, several tools have been developed to convert between formats. AGAT is an awesome set of tools that can convert between formats and fix issues as they are encountered.
With that being said, we have provided a universal script to fix malformed GTF files. It also has the extra benefit that it can convert between GFF and GTF formats. As such, we recommend running this script if you run into any issues. This script is also recommended over ./resources/gff3togtf.py, which will be deprecated in the near future.
For more information about the script and its usage, please run:
Running the pipeline outside of Biowulf is easy; however, there are a few extra options you must provide. Please note when running the build sub command for the first time, you will also need to provide the --shared-resources option. This option will download our kraken2 database and bowtie2 indices for FastQ Screen. The same path should then be passed to the --shared-resources option of the run sub command. Next, you will also need to provide a path to write temporary output files via the --tmp-dir option. We also recommend providing a path to a SIF cache. You can cache software containers locally with the cache sub command.
# Step 0.) Grab an interactive node (do not run on head node)
srun -N 1 -n 1 --time=2:00:00 -p interactive --mem=8gb --cpus-per-task=4 --pty bash
# Add snakemake and singularity to $PATH.
# This step may vary across clusters; you
# can reach out to a sys admin if snakemake
# and singularity are not installed.
module purge
module load singularity snakemake

# Step 1.) Dry run the Build pipeline
./rna-seek build --ref-fa GRCm39.primary_assembly.genome.fa \
    --ref-name mm39 \
    --ref-gtf gencode.vM26.annotation.gtf \
    --gtf-ver M26 \
    --output /data/$USER/refs/mm39_M26 \
    --shared-resources /data/shared/rna-seek \
    --tmp-dir /cluster_scratch/$USER/ \
    --sif-cache /data/$USER/cache \
    --dry-run

# Step 2.) Build new RNA-seek reference files
./rna-seek build --ref-fa GRCm39.primary_assembly.genome.fa \
    --ref-name mm39 \
    --ref-gtf gencode.vM26.annotation.gtf \
    --gtf-ver M26 \
    --output /data/$USER/refs/mm39_M26 \
    --shared-resources /data/shared/rna-seek \
    --tmp-dir /cluster_scratch/$USER/ \
    --sif-cache /data/$USER/cache
cache - RNA-seek Documentation
The rna-seek executable is composed of several inter-related sub commands. Please see rna-seek -h for all available options.
This part of the documentation describes options and concepts for the rna-seek cache sub command in more detail. With minimal configuration, the cache sub command enables you to cache remote resources for the RNA-seek pipeline. Caching remote resources allows the pipeline to run in an offline mode.
The cache sub command creates a local cache on the filesystem for resources hosted on DockerHub or AWS S3. These resources are normally pulled onto the filesystem when the pipeline runs; however, due to network issues or DockerHub pull rate limits, it may make sense to pull the resources once so a shared cache can be created and re-used. It is worth noting that a singularity cache cannot normally be shared across users. Singularity strictly enforces that its cache is owned by the user. To get around this issue, the cache sub command can be used to create local SIFs on the filesystem from images on DockerHub.
Caching remote resources for the RNA-seek pipeline is fast and easy! In its most basic form, rna-seek cache only has one required input.
The synopsis for each command shows its parameters and their usage. Optional parameters are shown in square brackets.
A user must provide a directory to cache remote Docker images via the --sif-cache argument. Once the cache pipeline has completed, the local SIF cache can be passed to the --sif-cache option of the rna-seek build and rna-seek run sub commands. This enables the build and run pipelines to run in an offline mode.
You can always use the -h option for information on a specific command.
--sif-cache SIF_CACHE
Path where a local cache of SIFs will be stored. type: path
Any images defined in config/containers/images.json will be pulled into the local filesystem. The path provided to this option can be passed to the --sif-cache option of the rna-seek build and rna-seek run sub commands. This allows for running the build and run pipelines in an offline mode where no requests are made to external sources. This is useful for avoiding network issues or DockerHub pull rate limits. Please see rna-seek build and run for more information.
run - RNA-seek Documentation
This part of the documentation describes options and concepts for the rna-seek run sub command in more detail. With minimal configuration, the run sub command enables you to start running the data processing and quality-control pipeline.
Setting up the RNA-seek pipeline is fast and easy! In its most basic form, rna-seek run only has three required inputs.
The synopsis for each command shows its parameters and their usage. Optional parameters are shown in square brackets.
A user must provide a list of FastQ files (globbing is supported) to analyze via the --input argument, an output directory to store results via the --output argument, and a reference genome for alignment and annotation via the --genome argument. If you are running the pipeline outside of Biowulf, you will additionally need to provide the following options: --shared-resources and --tmp-dir. More information about each of these options can be found below.
You can always use the -h option for information on a specific sub command.
Each of the following arguments is required. Failure to provide a required argument will result in a non-zero exit code.
--input INPUT [INPUT ...]
Input FastQ file(s) to process. type: file
One or more FastQ files can be provided. From the command line, each FastQ file should be separated by a space. Globbing is supported! This makes selecting FastQ files easier. Input FastQ files should be gzipped. The pipeline supports single-end and paired-end RNA-seq data; however, the pipeline will not process a mixture of single-end and paired-end samples together. If you have a mixture of single-end and paired-end samples to process, please process them as two separate instances of the RNA-seek pipeline (with two separate output directories).
Example: --input .tests/*.R?.fastq.gz
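Before launching, a mixed single-end/paired-end input set can be detected from the file names. This is a hypothetical helper (classify_fastqs is not part of RNA-seek) that assumes the `<sample>.R1.fastq.gz` / `<sample>.R2.fastq.gz` naming implied by the `.tests/*.R?.fastq.gz` example; files named differently are simply ignored.

```bash
# classify_fastqs: hypothetical helper, not part of RNA-seek.
# Assumes mates are named <sample>.R1.fastq.gz and <sample>.R2.fastq.gz.
# Prints "paired <sample>" when both mates exist, "single <sample>" when
# only R1 exists. Non-R1 arguments are skipped.
classify_fastqs() {
    local r1 sample
    for r1 in "$@"; do
        case "$r1" in
            *.R1.fastq.gz) ;;
            *) continue ;;
        esac
        sample=${r1%.R1.fastq.gz}
        if [ -e "${sample}.R2.fastq.gz" ]; then
            echo "paired ${sample##*/}"
        else
            echo "single ${sample##*/}"
        fi
    done
}
```

If `classify_fastqs .tests/*.R?.fastq.gz` prints both "paired" and "single" lines, the samples would need to be split across two RNA-seek runs with separate output directories, as described above.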
--output OUTPUT
Path to an output directory. type: path
This location is where the pipeline will create all of its output files, also known as the pipeline's working directory. If the provided output directory does not exist, it will be initialized automatically.
Example: --output /data/$USER/RNA_hg38
--genome {hg38_30,mm10_M21,custom.json}
Reference genome. type: string or file
This option defines the reference genome for your set of samples. On Biowulf, RNA-seek comes bundled with pre-built reference files for human and mouse samples; however, it is worth noting that the pipeline also accepts a custom reference genome built with the build sub command. Building a new reference genome is easy! You can create a custom reference genome with a single command. This is extremely useful when working with non-model organisms. New users can reference the documentation's getting started section to see how a reference genome is built.
Pre-built option: Here is a list of available pre-built genomes on Biowulf: hg38_30 or mm10_M21. Please see the resources page for more information about each pre-built option.
Custom option: A user can also supply a custom reference genome built with the build sub command. Please supply the custom reference JSON file that was generated by the build sub command. The name of this custom reference JSON file depends on the values provided to the following rna-seek build arguments, --ref-name REF_NAME and --gtf-ver GTF_VER; the generated file is named {REF_NAME}_{GTF_VER}.json.
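The naming rule above can be turned into a small helper for scripting the --genome value. This is a sketch under an assumption: genome_json_path is hypothetical, and it assumes the JSON is written to the top level of the build --output directory, which matches the /data/$USER/hg38_36/hg38_36.json example used later on this page but is not stated explicitly.

```bash
# genome_json_path: hypothetical helper, not part of RNA-seek.
# Assumes rna-seek build writes {REF_NAME}_{GTF_VER}.json into the
# top level of its --output directory.
# Args: <build --output dir> <--ref-name value> <--gtf-ver value>
genome_json_path() {
    local output_dir=${1%/} ref_name=$2 gtf_ver=$3   # drop one trailing slash
    printf '%s/%s_%s.json\n' "$output_dir" "$ref_name" "$gtf_ver"
}
```

With the build example above (--ref-name mm39, --gtf-ver M26, --output /data/$USER/refs/mm39_M26), `genome_json_path /data/$USER/refs/mm39_M26 mm39 M26` yields /data/$USER/refs/mm39_M26/mm39_M26.json, which could then be passed to `--genome`.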
--prokaryote
Run with prokaryotic genome alignment options. type: boolean
Prokaryotic genomes, like bacteria, do not contain introns. If provided, this option will use an optimized set of options for aligning against prokaryotic genomes. This option will force STAR to avoid spliced alignments, and it will also run STAR in 2-pass basic mode. By default, the pipeline is set up for alignment against eukaryotic genomes, so this option should be provided if you are working with a prokaryotic genome. This option should not be combined with the small RNA option.
Example: --prokaryote
--small-rna
Run STAR using ENCODE's recommendations for small RNA. type: boolean
This option should only be used with small RNA libraries. These are rRNA-depleted libraries that have been size selected to contain fragments shorter than 200bp. Size selection enriches for small RNA species such as miRNAs, siRNAs, or piRNAs. Also, this option should not be combined with the --star-2-pass-basic option. If the two options are combined, STAR will run in 2-pass basic mode. This means that STAR will not run with ENCODE's recommendations for small RNA alignment. As such, please take care not to combine both options.
Please note: This option is only supported with single-end data.
Example: --small-rna
--star-2-pass-basic
Run STAR in per sample 2-pass mapping mode. type: boolean
We recommend using this option when processing a set of unrelated samples or when processing samples in a clinical setting. It is not advised to use this option for a study with multiple related samples.
By default, the pipeline utilizes a multi-sample 2-pass mapping approach where the set of splice junctions detected across all samples is provided to the second pass of STAR. This option overrides the default behavior so each sample will be processed in a per-sample 2-pass basic mode. This option should not be combined with the small RNA option. If the two options are combined, STAR will run in 2-pass basic mode.
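Because combining --small-rna with --star-2-pass-basic (or --prokaryote) is discouraged above, a wrapper script could refuse such combinations before calling rna-seek. The function below is a hypothetical pre-flight check, not an RNA-seek feature; it only inspects the argument list it is given.

```bash
# check_star_flags: hypothetical pre-flight helper, not part of RNA-seek.
# Fails (exit status 1) if --small-rna is combined with --star-2-pass-basic
# or --prokaryote, which the documentation advises against.
check_star_flags() {
    local arg small=0 conflict=""
    for arg in "$@"; do
        case "$arg" in
            --small-rna) small=1 ;;
            --star-2-pass-basic|--prokaryote) conflict="$arg" ;;
        esac
    done
    if [ "$small" -eq 1 ] && [ -n "$conflict" ]; then
        echo "error: --small-rna should not be combined with $conflict" >&2
        return 1
    fi
    return 0
}
```

Usage sketch: `check_star_flags "$@" && ./rna-seek run "$@"` inside a wrapper script.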
Each of the following arguments are optional and do not need to be provided.
--dry-run
Dry run the pipeline. type: boolean
Displays what steps in the pipeline remain or will be run. Does not execute anything!
Example: --dry-run
--mode {slurm,local}
Execution method. type: string default: slurm
Execution method. Defines the mode or method of execution. Valid mode options include: slurm or local.
local: Local executions will run serially on the compute instance. This is useful for testing, debugging, or when a user does not have access to a high performance computing environment.
slurm: The slurm execution method will submit jobs to a cluster using a slurm + singularity backend. This method will automatically submit the master job to the cluster. We recommend running RNA-seek in this mode, as execution will be significantly faster in a distributed environment.
Example: --mode slurm
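If the same launch script is shared between a Slurm cluster and a workstation, the --mode value could be chosen automatically. The helper below is hypothetical (pick_mode is not part of RNA-seek) and simply assumes that finding Slurm's sbatch command on $PATH means slurm mode is usable.

```bash
# pick_mode: hypothetical convenience helper, not part of RNA-seek.
# Prints "slurm" when the Slurm submission command sbatch is on $PATH,
# otherwise "local", for use as the value of --mode.
pick_mode() {
    if command -v sbatch >/dev/null 2>&1; then
        echo slurm
    else
        echo local
    fi
}
```

For example: `./rna-seek run ... --mode "$(pick_mode)"`. An explicit --mode is still clearer when you know where the pipeline will run.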
--shared-resources SHARED_RESOURCES
Local path to shared resources. type: path
The pipeline uses a set of shared reference files that can be re-used across reference genomes. These currently include reference files for kraken and FQScreen. These reference files can be downloaded with the build sub command's --shared-resources option. With that being said, these files only need to be downloaded once. We recommend storing these files in a shared location on the filesystem that other people can access. If you are running the pipeline on Biowulf, you do NOT need to download these reference files! They already exist on the filesystem in a location that anyone can access; however, if you are running the pipeline on another cluster or target system, you will need to download the shared resources with the build sub command, and you will need to provide this option every time you run the pipeline. Please provide the same path that was provided to the build sub command's --shared-resources option. Again, if you are running the pipeline on Biowulf, you do NOT need to provide this option. For more information about how to download shared resources, please reference the build sub command's --shared-resources option.
Example: --shared-resources /data/shared/rna-seek
--singularity-cache SINGULARITY_CACHE
Overrides the $SINGULARITY_CACHEDIR environment variable. type: path default: --output OUTPUT/.singularity
Singularity will cache image layers pulled from remote registries. This ultimately speeds up the process of pulling an image from DockerHub if an image layer already exists in the singularity cache directory. By default, the cache is set to a .singularity directory inside the path provided to the --output argument. Please note that this cache cannot be shared across users. Singularity strictly enforces that you own the cache directory and will return a non-zero exit code if you do not own the cache directory! See the --sif-cache option to create a shareable resource.
--sif-cache SIF_CACHE
Path where a local cache of SIFs is stored. type: path
Uses a local cache of SIFs on the filesystem. This SIF cache can be shared across users if permissions are set correctly. If a SIF does not exist in the SIF cache, the image will be pulled from DockerHub and a warning message will be displayed. The rna-seek cache sub command can be used to create a local SIF cache. Please see rna-seek cache for more information. This option is extremely useful for avoiding DockerHub pull rate limits. It also removes any potential errors that could occur due to network issues or DockerHub being temporarily unavailable. We recommend running RNA-seek with this option whenever possible.
Example: --sif-cache /data/$USER/SIFs
--tmp-dir TMP_DIR
Path on the file system for writing temporary files. type: path default: /lscratch/$SLURM_JOBID
This is a path on the file system for writing temporary output files. By default, the temporary directory is set to '/lscratch/$SLURM_JOBID' for backwards compatibility with the NIH's Biowulf cluster; however, if you are running the pipeline on another cluster, this option will need to be specified. Ideally, this path should point to a dedicated location on the filesystem for writing tmp files. On many systems, this location is somewhere in /scratch. If you need to inject a variable into this string that should NOT be expanded, please wrap this option's value in single quotes. Again, if you are running the pipeline on Biowulf, you do NOT need to provide this option.
Example: --tmp-dir /cluster_scratch/$USER/
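The effect of quoting the --tmp-dir value can be seen with plain shell variable expansion. In this sketch (bash assumed), double quotes let the launching shell expand $SLURM_JOBID immediately, while single quotes pass the literal text through unchanged; that the pipeline later expands it inside each job is an assumption based on the default value above.

```bash
# Simulate a launching shell where SLURM_JOBID is empty
# (e.g. outside of a Slurm allocation).
SLURM_JOBID=""

# Double quotes: the current shell expands the variable right now.
expanded="/lscratch/$SLURM_JOBID"
# Single quotes: the literal string is kept for later expansion.
literal='/lscratch/$SLURM_JOBID'

printf '%s\n' "$expanded"   # prints: /lscratch/
printf '%s\n' "$literal"    # prints: /lscratch/$SLURM_JOBID

# So, on the command line:
#   ./rna-seek run ... --tmp-dir '/lscratch/$SLURM_JOBID'
```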
--threads THREADS
Max number of threads for each process. type: int default: 2
Max number of threads for each process. This option is more applicable when running the pipeline with --mode local. We recommend setting this value to the maximum number of CPUs available on the host machine.
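On a machine where you plan to use --mode local, one way to follow this recommendation is to query the CPU count from the shell. This sketch assumes GNU coreutils' nproc is available, with POSIX getconf as a fallback; the variable name threads is illustrative.

```bash
# Number of CPUs available to this shell. nproc (GNU coreutils) honours
# the process's CPU affinity; getconf is used when nproc is missing.
threads=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "Using --threads ${threads}"

# ./rna-seek run ... --mode local --threads "$threads"
```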
On Biowulf getting started with the pipeline is fast and easy! The pipeline comes bundled with pre-built human and mouse reference genomes. In the example below, we will use the pre-built human reference genome.
# Step 0.) Grab an interactive node (do not run on head node)
srun -N 1 -n 1 --time=12:00:00 -p interactive --mem=8gb --cpus-per-task=4 --pty bash
module purge
module load singularity snakemake

# Step 1.) Dry run pipeline with provided test data
./rna-seek run --input .tests/*.R?.fastq.gz \
    --output /data/$USER/RNA_hg38 \
    --genome hg38_30 \
    --mode slurm \
    --star-2-pass-basic \
    --sif-cache /data/OpenOmics/SIFs/ \
    --dry-run

# Step 2.) Run RNA-seek pipeline
# The slurm mode will submit jobs to the cluster.
# It is recommended running rna-seek in this mode.
./rna-seek run --input .tests/*.R?.fastq.gz \
    --output /data/$USER/RNA_hg38 \
    --genome hg38_30 \
    --mode slurm \
    --sif-cache /data/OpenOmics/SIFs/ \
    --star-2-pass-basic
Running the pipeline outside of Biowulf is easy; however, there are a few extra steps you must first take. Before getting started, you will need to build reference files for the pipeline. Please note when running the build sub command for the first time, you will also need to provide the --shared-resources option. This option will download our kraken2 database and bowtie2 indices for FastQ Screen. The same path should then be passed to the --shared-resources option of the run sub command. Next, you will also need to provide a path to write temporary output files via the --tmp-dir option. We also recommend providing a path to a SIF cache. You can cache software containers locally with the cache sub command.
# Step 0.) Grab an interactive node (do not run on head node)
srun -N 1 -n 1 --time=2:00:00 -p interactive --mem=8gb --cpus-per-task=4 --pty bash
# Add snakemake and singularity to $PATH.
# This step may vary across clusters; you
# can reach out to a sys admin if snakemake
# and singularity are not installed.
module purge
module load singularity snakemake

# Step 1.) Dry run pipeline with provided test data
./rna-seek run --input .tests/*.R?.fastq.gz \
    --output /data/$USER/RNA_hg38 \
    --genome /data/$USER/hg38_36/hg38_36.json \
    --mode slurm \
    --sif-cache /data/$USER/cache \
    --star-2-pass-basic \
    --shared-resources /data/shared/rna-seek \
    --tmp-dir /cluster_scratch/$USER/ \
    --dry-run

# Step 2.) Run RNA-seek pipeline
# The slurm mode will submit jobs to the cluster.
# It is recommended running rna-seek in this mode.
./rna-seek run --input .tests/*.R?.fastq.gz \
    --output /data/$USER/RNA_hg38 \
    --genome /data/$USER/hg38_36/hg38_36.json \
    --mode slurm \
    --sif-cache /data/$USER/cache \
    --star-2-pass-basic \
    --shared-resources /data/shared/rna-seek \
    --tmp-dir /cluster_scratch/$USER/
unlock - RNA-seek Documentation
This part of the documentation describes options and concepts for the rna-seek unlock sub command in more detail. With minimal configuration, the unlock sub command enables you to unlock a pipeline output directory.
If the pipeline fails ungracefully, it may be required to unlock the working directory before proceeding again. Snakemake will inform a user when it may be necessary to unlock a working directory with an error message stating: Error: Directory cannot be locked.
Please verify that the pipeline is not running before running this command. If the pipeline is currently running, the workflow manager will report the working directory is locked. This is the default behavior of snakemake, and it is normal. Do NOT run this command if the pipeline is still running! Please kill the master job and its child jobs prior to running this command.
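A quick way to see whether a working directory is locked is to look for Snakemake's lock files. This sketch assumes Snakemake keeps them under .snakemake/locks/ inside its working directory, which for RNA-seek is the --output directory; has_snakemake_lock is a hypothetical name. A lock alone does not tell you whether jobs are still running, so also check the scheduler (for example with `squeue -u "$USER"` on Slurm) before unlocking.

```bash
# has_snakemake_lock: hypothetical helper, not part of RNA-seek.
# Assumes Snakemake stores lock files in <working dir>/.snakemake/locks/.
# Succeeds (exit status 0) if that directory exists and is non-empty.
has_snakemake_lock() {
    local lockdir="${1%/}/.snakemake/locks"
    [ -d "$lockdir" ] && [ -n "$(ls -A "$lockdir")" ]
}

# Usage sketch:
#   if has_snakemake_lock /data/$USER/RNA_hg38; then
#       squeue -u "$USER"   # confirm no pipeline jobs are still running
#   fi
```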
Unlocking an RNA-seek pipeline output directory is fast and easy! In its most basic form, rna-seek unlock only has one required input.
The synopsis for this command shows its parameters and their usage. Optional parameters are shown in square brackets.
A user must provide an output directory to unlock via --output argument. After running the unlock sub command, you can resume the build or run pipeline from where it left off by re-running it.
You can always use the -h option for information on a specific command.
--output OUTPUT
Path to a previous run's output directory to unlock. This will remove a lock on the working directory. Please verify that the pipeline is not running before running this command. Example: --output /data/$USER/RNA_hg38
----------------------------------------------------------------------------\n * Functions\n * ------------------------------------------------------------------------- */\n\n/**\n * Mount content\n *\n * This function mounts all components that are found in the content of the\n * actual article, including code blocks, data tables and details.\n *\n * @param el - Content element\n * @param options - Options\n *\n * @returns Content component observable\n */\nexport function mountContent(\n el: HTMLElement, { target$, viewport$, print$ }: MountOptions\n): Observable> {\n return merge(\n\n /* Code blocks */\n ...getElements(\"pre > code\", el)\n .map(child => mountCodeBlock(child, { viewport$ })),\n\n /* Data tables */\n ...getElements(\"table:not([class])\", el)\n .map(child => mountDataTable(child)),\n\n /* Details */\n ...getElements(\"details\", el)\n .map(child => mountDetails(child, { target$, print$ }))\n )\n}\n", "/*\n * Copyright (c) 2016-2021 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. 
IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport {\n Observable,\n Subject,\n animationFrameScheduler,\n merge,\n of\n} from \"rxjs\"\nimport {\n delay,\n map,\n observeOn,\n switchMap,\n tap\n} from \"rxjs/operators\"\n\nimport {\n resetDialogState,\n setDialogMessage,\n setDialogState\n} from \"~/actions\"\n\nimport { Component } from \"../_\"\n\n/* ----------------------------------------------------------------------------\n * Types\n * ------------------------------------------------------------------------- */\n\n/**\n * Dialog\n */\nexport interface Dialog {\n message: string /* Dialog message */\n open: boolean /* Dialog is visible */\n}\n\n/* ----------------------------------------------------------------------------\n * Helper types\n * ------------------------------------------------------------------------- */\n\n/**\n * Watch options\n */\ninterface WatchOptions {\n alert$: Subject /* Alert subject */\n}\n\n/**\n * Mount options\n */\ninterface MountOptions {\n alert$: Subject /* Alert subject */\n}\n\n/* ----------------------------------------------------------------------------\n * Functions\n * ------------------------------------------------------------------------- */\n\n/**\n * Watch dialog\n *\n * @param _el - Dialog element\n * @param options - Options\n *\n * @returns Dialog observable\n */\nexport function watchDialog(\n _el: HTMLElement, { alert$ }: WatchOptions\n): Observable