scRNA-seq technologies have revolutionized transcriptomics, but the variety of available protocols and their distinct features can be confusing. This guide presents essential information on 30 protocols to help you select the right method for your needs and process the data correctly.
The protocols differ in several key aspects, including cell isolation techniques, transcript coverage, throughput, strand specificity, multiplexing capability, cost, and technical complexity.
Protocol | STRT-seq | Smart-seq/C1 | CEL-seq | Quartz-seq | Smart-seq2 | SCRB-seq | MARS-seq | CEL-seq2 | CEL-seq2/C1 | MATQ-seq | Quartz-seq2 | Smart-seq3 | Smart-seq3xpress | FLASH-seq | VASA-plate | Drop-Seq | InDrop V1 | InDrop V2 | InDrop V3 | 10X Chromium V1 | 10X Chromium V2 | 10X Chromium V3 | VASA-drop | CytoSeq | Seq-well | Microwell-seq | sci-RNA-seq | sci-RNA-seq3 | Split-seq |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Release year | 2011 | 2012 | 2012 | 2013 | 2014 | 2014 | 2014 | 2016 | 2016 | 2017 | 2018 | 2020 | 2022 | 2022 | 2022 | 2015 | 2015 | 2015 | 2015 | 2017 | 2017 | 2017 | 2022 | 2015 | 2017 | 2018 | 2017 | 2019-2021 | 2018 |
Method-based | Plate-based | Plate-based | Plate-based/microfluidics | Plate-based | Plate-based | Plate-based | Plate-based | Plate-based/microfluidics | Fluidigm C1 | Plate-based | Plate-based | Plate-based | Plate-based | plate-based | Plate-based | Droplet-based | Droplet-based | Droplet-based | Droplet-based | Droplet-based | Droplet-based | Droplet-based | Droplet-based | Nanowell array | Nanowell array | Nanowell array | Combinatorial indexing-based (plate-based) | Combinatorial indexing-based (plate-based) | Combinatorial indexing-based |
Throughput | low-throughput | low-throughput | low-throughput | low-throughput | low-throughput | low-throughput | Automatic liquid handling high-throughput | medium throughput | medium throughput | medium throughput | medium throughput | low-throughput | low-throughput | low-throughput | low-throughput | high-throughput | high-throughput | high-throughput | high-throughput | high-throughput | high-throughput | high-throughput | high throughput | high-throughput | high-throughput | high-throughput | high-throughput | high-throughput | high-throughput |
Number of cells processed | 1-100 (96 cells) | 1-100 | 10-500 | 1-100 | <1,000 | <1,000 | 384-1,535 | 100-1,000 | 100-1,000 | 100-1,000 | up to 1,536 | < 1000 (384-well plates) | < 1,000 | < 1000 | <1,000 | 1,000-10,000 | 1,000-10,000 | 1,000-10,000 | 1000-10,000 | > 10,000 | > 10,000 | > 10,000 | > 10,000 | 10,000-100,000 | 10,000-100,000 | 5,000-10,000 | 1,000-10,000 | > 10,000 | > 10,000 |
Cost per cell for sequencing-ready libraries (USD) | $2 | NA | NA | $23 | $1.50 - $2.50 | $1.70 | $1.3 | $0.30-$0.50* | $0.70-$1.20* | $0.40 - $0.60 | $0.40 - $1.08 | $0.57 - $1.14 | $0.30 | $0.99 - $4.21 with UMI | $0.98 | $0.10 - $0.20 | $0.10 - $0.50 | $0.10 - $0.50 | $0.10 - $0.50 | $0.5 | $0.5 | $0.5 | $0.11 | < $1 | $0.15 | $0.02 | $0.03 - $0.20 | $0.01 | $0.01 |
Target RNA type | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyA+ and polyA- | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyA+ and polyA- | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyA+ and polyA- | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyadenylated RNA | polyA+ and polyA- |
Transcript coverage | 5' | Full-length | 3' | Full-length | Full-length | 3' | 3' | 3' | 3' | Full-length | 3' | Full-length | Full-length | Full-length | Full-length | 3' | 3' | 3' | 3' | 3' | 3' | 3' | Full-length | 3' | 3' | 3' | 3' | 3' | 3' |
UMI | no | no | no | no | no | yes (10bp) | yes (10bp) | yes (6bp) | yes (6bp) | yes | yes (8bp) | yes (8bp) | yes | optional | UFI (6bp) | yes (8bp) | yes (6bp) | yes (6bp) | yes (6bp) | yes (10bp) | yes (10bp) | yes (12bp) | UFI (6bp) | yes (8bp) | yes (8bp) | yes (6bp) | yes (8bp) | yes | yes (10bp) |
Barcode | yes (19bp) | no | yes (8bp) | no | no | yes (6bp) | yes (6bp) | yes (6bp) | yes (6bp) | no | yes (15bp for 1,536 wells or 14bp for 384 wells) | no | no | no | yes (8bp) | yes (12bp) | yes (19bp) | yes (19bp) | yes (16bp) | yes (14bp) | yes (16bp) | yes (16bp) | yes (2 x 8bp) | yes (8bp) | yes (12bp) | yes (18bp) | yes (10bp) | yes | yes (18bp) |
Strand specific | yes | no | yes | no | no | yes | yes | yes | yes | yes | yes | 5'UMI fragments stranded, internal fragments not stranded | 5'UMI fragments stranded, internal fragments not stranded | 5'UMI fragments stranded, internal fragments not stranded | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
Library preparation time | 2 days | NA | 2-3 days (~30h) | NA | 10h | NA | NA | NA | NA | 10h | NA | 10.5 h | 5-6h | ~4.5 h (low amplification) - 7.2 h | NA | 12h | > 24h | > 24h | > 24h | < 24h | < 24h | < 24h | NA | NA | NA | NA | 2 days | 3 days (16h) | 2-3 days* |
Avg. number of genes detected per cell (at sequencing saturation) | 1,000-8,000 | 6,000-8,000 | 4,000-6,000 | 3,000-7,000 | 6,500-10,000 | 5,000-9,000 | 500-5,000 | 5,000-7,000 | 6,000-9,000 | 8,000-14,000 | 5,500-8,000 | 9,000-12,000 | 9,000-14,000 | 9,000-12,000 | 9,000-15,000 | 2,000-6,000 | 2,000-5,000 | 2,000-5,000 | 2,000-5,000 | 4,000-7,000 (before 500-1,500) | 4,000-7,000 (before 500-1,500) | 4,000-7,000 (before 500-1,500) | 9,000-15,000 | - | 6,000-10,000 | 6,500 | 3,000-7,000* | 3,000-7,000 | |
Conventional cell isolation/capture | Mouth pipette or FACS | Fluidigm C1 / FACS | Mouth pipette, FACS, microfluidics | Mouth pipette or FACS | FACS | FACS | FACS with automatic liquid handling | Mouth pipette, FACS, microfluidics | Fluidigm C1 | FACS | flow cytometry | FACS | FACS | FACS | FACS | Droplet | Droplet | Droplet | Droplet | Droplet | Droplet | Droplet | Microdroplets | not needed (dilution) | not needed (dilution) | not needed (dilution) | not needed (dilution) | not needed (dilution) | not needed (dilution) |
mRNA priming (1st strand synt.) | poly(T) | poly(T) | poly(T) | poly(T) | poly(T) | poly(T) | poly(T) | poly(T) | poly(T) | random primers (GATdT/MALBAC primers) | poly(T) | poly(T) | poly(T) | Poly(T) | Poly(T)* | poly(T) | poly(T) | poly(T) | poly(T) | poly(T) | poly(T) | poly(T) | Poly(T)* | poly(T) | poly(T) | poly(T) | poly(T) | poly(T) | poly(T) + random hexamer primers |
2nd strand synthesis | TSO | TSO | RNase H and DNA pol 1 (IVT) | 5' poly(A) tagging method: | TSO | TSO | RNase H and DNA pol 1 | RNase H and DNA pol 1 (IVT) | RNase H and DNA pol 1 (IVT) | ten cycles of annealing | PolyA tailing and primer ligation | TSO | TSO | TSO | RNase H and DNA pol 1 (IVT) | TSO | RNase H and DNA pol 1 | RNase H and DNA pol 1 | RNase H and DNA pol 1 | TSO | TSO | TSO | RNase H and DNA pol 1 (IVT) | NA | TSO | TSO | RNase H and DNA pol 1 | TSO | |
Full-length cDNA synthesis | no | yes | no | Yes | yes | yes | no | no | no | yes (but by pieces, as random priming) | yes in principle | yes | yes | yes | yes in pieces | yes | no | no | no | yes | yes | yes | yes in pieces | NA | yes | yes | no | no | yes |
Amplification method | PCR | PCR | IVT | PCR | PCR | PCR | IVT | IVT | IVT | PCR, Multiple annealing | PCR | PCR | PCR | semi-suppressive PCR | IVT | PCR | IVT | IVT | IVT | PCR | PCR | PCR | IVT | PCR (Pre-defined genes only) | PCR | PCR | PCR | PCR | PCR |
Pooling before library prep | yes | no | yes | no | no | yes | yes | yes | yes | no | yes | no | no | no | no (pooling just before IVT) | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
Fragmentation/tagmentation | fragmentation | tagmentation | fragmentation | fragmentation by Covaris (Ultrasound) | tagmentation | Tagmentation + 3' enrichment | RNA fragmentation | RNA fragmentation | RNA fragmentation | fragmentation by sonication | fragmentation by Ultrasound | tn5 tagmentation | tagmentation | tagmentation | RNA fragmentation | cDNA fragmentation | RNA fragmentation | RNA fragmentation | RNA fragmentation | cDNA fragmentation | Fragmentation + 3' enrichment | fragmentation + 3' enrichment | RNA fragmentation | NA | tagmentation + 3' enrichment | fragmentation + 3' enrichment | tagmentation + 3' enrichment | tagmentation | tagmentation |
In Kallisto | no | no | yes | no | yes | yes | no | yes | no | no | no | yes | yes | yes | yes | yes | yes | yes | yes | no | no | no | no | no | no | yes |
Protocols to add: SUPeR-seq, SORT-seq, STORM-seq, STRT-seq-C1, STRT-seq-2i, DNBelab C4, DroNC-seq
Cell isolation techniques form the basis of all scRNA-seq protocols and largely dictate a procedure's scalability and applicability. Plate-based methods, which isolate cells by manual picking or fluorescence-activated cell sorting (FACS), offer precise control over cell selection (e.g., they are suitable for targeting rare cell types) but generally have lower throughput. Microfluidic approaches, including droplet-based and microwell-based methods, can process a larger number of cells simultaneously and often incorporate barcoding strategies for sample multiplexing.
Transcript coverage refers to the portion of each RNA molecule that is sequenced. Some protocols, like Smart-seq2 and Smart-seq3, capture full-length transcripts, providing comprehensive information about alternative splicing and isoform usage. In contrast, high-throughput methods such as 10X Genomics Chromium, Drop-seq, and inDrop sequence only the 3' or 5' ends of transcripts, trading transcript-level detail for increased cell throughput and lower cost.
Strand specificity refers to whether the protocol retains information about which DNA strand the RNA transcript was derived from. This information is essential for distinguishing between overlapping genes on opposite strands, identifying splicing events, detecting non-coding RNA transcripts, and investigating antisense transcription. Protocols that sequence only the 3' ends of RNA molecules tend to be stranded, while full-length protocols often do not preserve strand information.
Amplification is an essential step in scRNA-seq protocols, increasing the limited cDNA from each cell to levels appropriate for sequencing. Two primary methods are used: polymerase chain reaction (PCR), an exponential amplification method, and in vitro transcription (IVT), a linear amplification method. While PCR is faster, it can introduce biases due to uneven amplification efficiency, although these can be mitigated with unique molecular identifiers (UMIs). Because IVT amplifies linearly, it typically introduces fewer biases and provides a more accurate representation of the original transcript abundance. However, IVT is more time-consuming than PCR.
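To make the difference between exponential and linear amplification concrete, here is a minimal numerical sketch. It is a toy model, not a description of any specific protocol: the starting copy number, the per-step efficiencies (0.9 and 0.8), and the number of steps are all hypothetical values chosen for illustration.

```python
def pcr_copies(molecules, efficiency, cycles):
    """Exponential growth: each cycle multiplies the copies by (1 + efficiency)."""
    return molecules * (1 + efficiency) ** cycles


def ivt_copies(molecules, efficiency, rounds):
    """Linear growth: each round adds copies made from the original templates only."""
    return molecules * efficiency * rounds


# Hypothetical scenario: two transcripts with the same starting abundance
# but slightly different per-step amplification efficiencies.
start = 10
eff_a, eff_b = 0.9, 0.8
steps = 20

pcr_ratio = pcr_copies(start, eff_a, steps) / pcr_copies(start, eff_b, steps)
ivt_ratio = ivt_copies(start, eff_a, steps) / ivt_copies(start, eff_b, steps)

print(f"PCR: transcript A over-represented {pcr_ratio:.2f}-fold")
print(f"IVT: transcript A over-represented {ivt_ratio:.2f}-fold")
```

Under these assumptions, the small efficiency gap compounds to roughly a threefold distortion after 20 PCR cycles, while in the linear model the distortion stays at the efficiency ratio itself (about 1.1-fold) no matter how many rounds are run.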
The type of RNA targeted for sequencing is another crucial factor that distinguishes scRNA-seq protocols. Most currently available methods focus on polyadenylated mRNA due to its ease of isolation and compatibility with multiplexing strategies. This is achieved by using poly(T) primers during reverse transcription, which selectively target the poly(A) tail of mRNA. If a broader investigation of RNA species, including non-coding RNAs, is desired, different approaches can be employed. One option is to use random primers. Another is to fragment the RNA in the first step, followed by end repair and poly(A) tailing, enabling cDNA synthesis from barcoded oligo-dT probes.
The cost of scRNA-seq varies significantly based on the chosen protocol. High-throughput methods, such as droplet-based techniques, can process many cells simultaneously, significantly reducing the cost per cell despite the initial expense of specialized equipment and reagents. Plate-based methods, in which cells are isolated by manual picking or fluorescence-activated cell sorting (FACS), have lower equipment costs but are more labor-intensive and often have higher costs per cell due to the time and resources needed to process individual cells. Methods targeting total RNA or aiming for full-length transcript coverage can also be more expensive due to the additional reagents and steps required. Furthermore, downstream data analysis costs need to be considered, as high-throughput methods typically generate large amounts of data that require substantial computational resources to process.
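A low cost per cell does not necessarily mean a low cost per experiment, as a quick back-of-the-envelope calculation shows. The sketch below uses the per-cell library costs listed in the table above for 10X Chromium V3 and Smart-seq2. The cell numbers are hypothetical, and equipment, sequencing, and analysis costs are deliberately left out.

```python
def library_cost(n_cells, cost_per_cell):
    """Library preparation cost only (no equipment, sequencing, or analysis)."""
    return n_cells * cost_per_cell


# Per-cell costs are taken from the table; cell numbers are assumptions.
droplet_total = library_cost(10_000, 0.50)   # 10X Chromium V3, one 10,000-cell run
plate_total_low = library_cost(384, 1.50)    # Smart-seq2, one 384-well plate (lower bound)
plate_total_high = library_cost(384, 2.50)   # Smart-seq2, one 384-well plate (upper bound)

print(f"Droplet run: ${droplet_total:,.0f}")
print(f"Plate run:   ${plate_total_low:,.0f} to ${plate_total_high:,.0f}")
```

In this scenario the droplet experiment is several times cheaper per cell but more expensive in total, simply because it profiles many more cells.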
Multiplexing, a process that allows for the simultaneous preparation of multiple samples, has become an essential feature of many high-throughput scRNA-seq protocols. Multiplexing is achieved through the use of barcodes and unique molecular identifiers (UMIs). Barcodes are sequences unique to each cell, while UMIs are unique tags added to each transcript, allowing for the differentiation and quantification of individual mRNA molecules.
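As an illustration of how barcodes and UMIs are read out in practice, the sketch below splits the start of a sequencing read into a cell barcode and a UMI. The lengths come from the table above. The layout (barcode first, immediately followed by the UMI), the `ReadLayout` type, and the `split_read` helper are assumptions made for this example, so check the actual read structure of your protocol before reusing it.

```python
from typing import NamedTuple, Tuple


class ReadLayout(NamedTuple):
    barcode_len: int
    umi_len: int


# Lengths taken from the table; the barcode-then-UMI order is an assumption.
LAYOUTS = {
    "10X Chromium V3": ReadLayout(barcode_len=16, umi_len=12),
    "Drop-Seq": ReadLayout(barcode_len=12, umi_len=8),
}


def split_read(seq: str, layout: ReadLayout) -> Tuple[str, str]:
    """Return (cell_barcode, umi) extracted from the start of a read."""
    needed = layout.barcode_len + layout.umi_len
    if len(seq) < needed:
        raise ValueError(f"read has {len(seq)} bases, need at least {needed}")
    barcode = seq[:layout.barcode_len]
    umi = seq[layout.barcode_len:needed]
    return barcode, umi


# Made-up read: 16 bases of barcode, 12 bases of UMI, then extra bases.
read = "ACGTACGTACGTACGT" + "TTGCATTGCATG" + "TTTT"
barcode, umi = split_read(read, LAYOUTS["10X Chromium V3"])
print(barcode, umi)
```

Keeping the layout in a small lookup table makes the protocol-specific part explicit, which is the piece that changes from one method to the next.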
While multiplexing greatly enhances the throughput and efficiency of scRNA-seq, it also introduces additional data processing steps. Correcting barcodes becomes necessary to account for potential sequencing errors, ensuring accurate cell assignment. Demultiplexing, the process of assigning reads back to their respective cells based on their barcodes, requires protocol-specific handling of the barcode structures. Additionally, UMI deduplication is performed to account for PCR amplification bias and also requires protocol-specific handling of the UMI structures.
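The three steps described above (barcode correction, demultiplexing, and UMI deduplication) can be sketched in a few lines. This is a simplified illustration, not the algorithm of any particular pipeline. It assumes a known list of valid barcodes (a whitelist, which not every protocol provides), corrects only single-base substitutions, and treats UMIs as duplicates only when they match exactly. All function names and the toy reads are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(barcode, whitelist):
    """Return the barcode if valid, a unique 1-mismatch neighbour, or None."""
    if barcode in whitelist:
        return barcode
    candidates = [w for w in whitelist
                  if len(w) == len(barcode) and hamming(w, barcode) == 1]
    # Ambiguous or unmatched barcodes are discarded rather than guessed.
    return candidates[0] if len(candidates) == 1 else None


def count_molecules(reads, whitelist):
    """reads: iterable of (barcode, umi, gene). Returns {(cell, gene): unique UMIs}."""
    seen = defaultdict(set)
    for barcode, umi, gene in reads:
        cell = correct_barcode(barcode, whitelist)  # correction + demultiplexing
        if cell is None:
            continue
        seen[(cell, gene)].add(umi)                 # deduplication via set
    return {key: len(umis) for key, umis in seen.items()}


whitelist = {"AAAA", "CCCC"}
reads = [
    ("AAAA", "GT", "GeneA"),
    ("AAAT", "GT", "GeneA"),  # one sequencing error; same molecule as above
    ("AAAA", "CA", "GeneA"),  # different UMI, so a second molecule
    ("CCCC", "GT", "GeneB"),
    ("GGGG", "GT", "GeneB"),  # too far from any valid barcode; dropped
]
molecule_counts = count_molecules(reads, whitelist)
print(molecule_counts)
# {('AAAA', 'GeneA'): 2, ('CCCC', 'GeneB'): 1}
```

Production tools typically go further, for example by also collapsing UMIs that differ by a sequencing error, but the overall flow is the same.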
Furthermore, in droplet-based methods, it's crucial to filter out doublets (droplets containing more than one cell), multiplets (more than two cells), empty droplets and damaged cells to ensure accurate downstream analysis. Each of these steps increases the complexity of the data processing pipeline and should be carefully considered when planning a scRNA-seq experiment.
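A very crude first pass at this filtering is to threshold the total UMI count per barcode, as sketched below. This is only a heuristic under stated assumptions: the `min_umis` and `max_umis` cutoffs are hypothetical and dataset-dependent, a high count does not prove a multiplet, and damaged cells are not addressed here because detecting them needs additional quality metrics. Dedicated statistical tools are generally preferable for real analyses.

```python
def classify_barcodes(umi_totals, min_umis, max_umis):
    """Label each barcode by its total UMI count using simple thresholds."""
    labels = {}
    for barcode, total in umi_totals.items():
        if total < min_umis:
            labels[barcode] = "likely_empty"        # mostly ambient RNA
        elif total > max_umis:
            labels[barcode] = "possible_multiplet"  # unusually high content
        else:
            labels[barcode] = "keep"
    return labels


# Made-up totals and thresholds, for illustration only.
umi_totals = {"BC1": 50, "BC2": 5_000, "BC3": 60_000}
labels = classify_barcodes(umi_totals, min_umis=500, max_umis=30_000)
kept = [bc for bc, label in labels.items() if label == "keep"]
print(labels)
print(kept)
```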
Normalization is a crucial step in scRNA-seq data analysis that aims to remove technical biases and enable meaningful comparisons of gene expression across cells. The most suitable normalization method often depends on the scRNA-seq protocol employed. For protocols capturing the entire transcript, normalization approaches based on total read counts, such as library size normalization or transcripts per million (TPM), are commonly used. These methods account for differences in sequencing depth, and TPM additionally accounts for transcript length. In 3' end sequencing protocols, expression is usually quantified as unique molecular identifier (UMI) counts, which correct for amplification biases, and normalization is then applied to these counts. Additionally, normalization methods that consider variations in capture efficiency can be applied, such as spike-in normalization using synthetic RNA molecules or statistical models incorporating factors like GC content and transcript length. Choosing an appropriate normalization method ensures accurate quantification and reliable interpretation of gene expression patterns in scRNA-seq data.
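The two depth-based approaches mentioned above can be written out for a single cell as follows. This is a minimal sketch with made-up counts and transcript lengths. The function names are illustrative, and real analyses normally rely on an established package rather than hand-written code.

```python
def counts_per_million(counts):
    """Library size normalization: scale counts so they sum to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cell has no counts")
    return {gene: c / total * 1e6 for gene, c in counts.items()}


def transcripts_per_million(counts, lengths_kb):
    """TPM: divide by transcript length first, then scale to one million."""
    rates = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("cell has no counts")
    return {gene: r / total * 1e6 for gene, r in rates.items()}


# Hypothetical cell: GeneB has three times the reads of GeneA,
# but its transcript is also three times longer.
counts = {"GeneA": 100, "GeneB": 300}
lengths_kb = {"GeneA": 1.0, "GeneB": 3.0}

cpm = counts_per_million(counts)
tpm = transcripts_per_million(counts, lengths_kb)
print(cpm)
print(tpm)
```

Library size normalization reports GeneB as three times more abundant than GeneA, whereas TPM reports the two as equal because the extra reads are explained by transcript length. For UMI-based 3' protocols each molecule is counted once regardless of length, so the length correction is generally not applied.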
Still need to talk about dropout and sensitivity