diff --git a/data-representation.html b/data-representation.html
index 098d060..3939943 100644
--- a/data-representation.html
+++ b/data-representation.html
@@ -112,7 +112,7 @@
 }
 }
-
+
-
+

2  Hi-C data structures in R

@@ -362,11 +362,13 @@

Jump directly to the last section of this chapter for a visual representation of these data structures.

-

+

2.1 GRanges class

GRanges (short for genomic ranges) is a core Bioconductor class, defined in the GenomicRanges package. It is used to describe genomic ranges of any nature, e.g. sets of promoters, SNPs, chromatin loop anchors, …
The data structure was described in the seminal 2015 publication by the Bioconductor team (Huber et al. (2015)).

-

+
+Huber, W., Carey, V. J., Gentleman, R., Anders, S., Carlson, M., Carvalho, B. S., Bravo, H. C., Davis, S., Gatto, L., Girke, T., Gottardo, R., Hahne, F., Hansen, K. D., Irizarry, R. A., Lawrence, M., Love, M. I., MacDonald, J., Obenchain, V., Oleś, A. K., … Morgan, M. (2015). Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods, 12(2), 115–121. https://doi.org/10.1038/nmeth.3252
+

2.1.1 GRanges fundamentals

The easiest way to generate a GRanges object is to coerce it from a vector of genomic coordinates in the UCSC format (e.g. "chr2:2004-4853"):
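A minimal sketch of that coercion, assuming the GenomicRanges package is installed (the three coordinates are illustrative and not taken from the book's data):

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Illustrative UCSC-style coordinates ("seqname:start-end")
coords <- c("chr2:2004-4853", "chr4:4482-9873", "chr5:1943-4203")

# Coerce the character vector: one range is created per string
gr <- as(coords, "GRanges")

seqnames(gr)  # chromosome of each range
start(gr)     # 1-based, inclusive start positions
width(gr)     # end - start + 1
```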

@@ -1084,10 +1086,12 @@

peaks$distance_to_nearest_TSS <- mcols(distanceToNearest(peaks, TSSs))$distance

Note how close to a TSS the 8th peak is. It could be worth counting it as an overlap!
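If near-misses should count as overlaps, the `maxgap` argument of `overlapsAny()` (also accepted by `findOverlaps()`) relaxes the overlap criterion. A minimal sketch on toy data, assuming GenomicRanges is installed; `toy_peaks`, `toy_TSSs` and the 50 bp tolerance are hypothetical choices, not the book's objects:

```r
suppressPackageStartupMessages(library(GenomicRanges))

# Toy peaks and single-base TSSs (hypothetical coordinates)
toy_peaks <- as(c("chr1:100-200", "chr1:1000-1100", "chr1:5000-5100"), "GRanges")
toy_TSSs  <- as(c("chr1:150-150", "chr1:1120-1120", "chr1:5200-5200"), "GRanges")

# Distance to the nearest TSS: 0 for overlapping ranges, otherwise the
# number of bases separating the two ranges
toy_peaks$distance_to_nearest_TSS <- mcols(distanceToNearest(toy_peaks, toy_TSSs))$distance

# Treat any peak lying within 50 bp of a TSS as overlapping it
toy_peaks$near_TSS <- overlapsAny(toy_peaks, toy_TSSs, maxgap = 50L)
toy_peaks
```

Here the second peak is 19 bp away from a TSS and is flagged, while the third one (99 bp away) is not.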

-

+

2.2 GInteractions class

GRanges describe genomic ranges and are therefore of general use to study 1D genome organization. To study chromatin interactions, we need a way to link pairs of GRanges: this is exactly what the GInteractions class does. This data structure is defined in the InteractionSet package and was described by Lun et al. (2016).

-

+
+Lun, A. T. L., Perry, M., & Ing-Simmons, E. (2016). Infrastructure for genomic interactions: Bioconductor classes for Hi-C, ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2
+

2.2.1 Building a GInteractions object from scratch

Let’s first define two parallel GRanges objects (i.e. two GRanges of the same length), each containing 5 ranges.
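The pairing logic can be sketched in a few lines, assuming the InteractionSet package is installed. This toy version uses 3 hypothetical ranges per anchor rather than the 5 used in the text:

```r
suppressPackageStartupMessages({
  library(GenomicRanges)
  library(InteractionSet)
})

# Two parallel GRanges: element i of `anchor_one` pairs with element i of `anchor_two`
anchor_one <- as(c("chr1:1-100", "chr1:1001-1100", "chr1:5001-5100"), "GRanges")
anchor_two <- as(c("chr1:1001-1100", "chr1:5001-5100", "chr2:1-100"), "GRanges")

gi <- GInteractions(anchor_one, anchor_two)
gi

anchors(gi, type = "first")  # the first anchor of each interaction
pairdist(gi)                 # distance between anchor midpoints; NA across chromosomes
```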

@@ -1559,7 +1563,7 @@

coolf
 ##                                                   EH7702 
-##  "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752"
+## "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752"

Similarly, example files are available for other file formats:

@@ -1599,7 +1603,7 @@

 # ----- This creates a connection to a `.(m)cool` file (path stored in `coolf`)
 CoolFile(coolf)
 ## CoolFile object
-## .mcool file: /github/home/.cache/R/ExperimentHub/1a92248c093f_7752
+## .mcool file: /github/home/.cache/R/ExperimentHub/1a9466054db5_7752
 ## resolution: 1000
 ## pairs file:
 ## metadata(0):
@@ -1607,7 +1611,7 @@

 # ----- This creates a connection to a `.hic` file (path stored in `hicf`)
 HicFile(hicf)
 ## HicFile object
-## .hic file: /github/home/.cache/R/ExperimentHub/1a92259b7f1f_7836
+## .hic file: /github/home/.cache/R/ExperimentHub/1a94322aa2b7_7836
 ## resolution: 1000
 ## pairs file:
 ## metadata(0):
@@ -1616,8 +1620,8 @@

 HicproFile(hicpromatrixf, hicproregionsf)
 ## HicproFile object
 ## HiC-Pro files:
-## $ matrix: /github/home/.cache/R/ExperimentHub/1a925372027_7837
-## $ regions: /github/home/.cache/R/ExperimentHub/1a92600d50bf_7838
+## $ matrix: /github/home/.cache/R/ExperimentHub/1a941672e405_7837
+## $ regions: /github/home/.cache/R/ExperimentHub/1a9453166d4f_7838
 ## resolution: 1000
 ## pairs file:
 ## metadata(0):
@@ -1625,7 +1629,7 @@

 # ----- This creates a connection to a pairs file
 PairsFile(pairsf)
 ## PairsFile object
-## resource: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753

+## resource: /github/home/.cache/R/ExperimentHub/1a9456d59216_7753

2.3.3 ContactFile slots

@@ -1641,7 +1645,7 @@

cf <- CoolFile(coolf)
 cf
 ##  CoolFile object
-##  .mcool file: /github/home/.cache/R/ExperimentHub/1a92248c093f_7752 
+##  .mcool file: /github/home/.cache/R/ExperimentHub/1a9466054db5_7752 
 ##  resolution: 1000 
 ##  pairs file: 
 ##  metadata(0):
@@ -1739,7 +1743,7 @@ 

 hic
 ## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions
 ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752"
 ## focus: "whole genome"
 ## resolutions(5): 1000 2000 4000 8000 16000
 ## active resolution: 1000
@@ -1771,7 +1775,7 @@

These pieces of information are called slots. They can be accessed directly with getter functions bearing the same name as the slot.

fileName(hic)
-##  [1] "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752"
+##  [1] "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752"
 
 focus(hic)
 ##  NULL
@@ -1825,7 +1829,7 @@ 

 hic
 ## `HiCExperiment` object with 13,681,280 contacts over 12,165 regions
 ## -------
-## fileName: "/github/home/.cache/R/ExperimentHub/1a92259b7f1f_7836"
+## fileName: "/github/home/.cache/R/ExperimentHub/1a94322aa2b7_7836"
 ## focus: "whole genome"
 ## resolutions(5): 1000 2000 4000 8000 16000
 ## active resolution: 1000
@@ -2146,14 +2150,14 @@

yeast_hic
 ##  `HiCExperiment` object with 8,757,906 contacts over 763 regions 
 ##  -------
-##  fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" 
+##  fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" 
 ##  focus: "whole genome" 
 ##  resolutions(5): 1000 2000 4000 8000 16000
 ##  active resolution: 16000 
 ##  interactions: 267709 
 ##  scores(2): count balanced 
 ##  topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) 
-##  pairsFile: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 
+##  pairsFile: /github/home/.cache/R/ExperimentHub/1a9456d59216_7753 
 ##  metadata(3): ID org date

@@ -2380,8 +2384,8 @@

pairsFile(yeast_hic) <- pairsf
 
 pairsFile(yeast_hic)
-##                                                  EH7703 
-##  "/github/home/.cache/R/ExperimentHub/1a92835ced9_7753"
+##                                                   EH7703 
+##  "/github/home/.cache/R/ExperimentHub/1a9456d59216_7753"
 
 readLines(pairsFile(yeast_hic), 25)
 ##   [1] "## pairs format v1.0"                                                              "#sorted: chr1-pos1-chr2-pos2"                                                      "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2"                  "#chromsize: I 230218"                                                              "#chromsize: II 813184"                                                             "#chromsize: III 316620"                                                            "#chromsize: IV 1531933"                                                            "#chromsize: V 576874"                                                              "#chromsize: VI 270161"                                                             "#chromsize: VII 1090940"                                                           "#chromsize: VIII 562643"                                                           "#chromsize: IX 439888"                                                             "#chromsize: X 745751"                                                              "#chromsize: XI 666816"                                                             "#chromsize: XII 1078177"                                                           "#chromsize: XIII 924431"                                                           "#chromsize: XIV 784333"                                                            "#chromsize: XV 1091291"                                                            "#chromsize: XVI 948066"                                                            "#chromsize: Mito 85779"                                                            "NS500150:527:HHGYNBGXF:3:21611:19085:3986\tII\t105\tII\t48548\t+\t-\t1358\t1681"   "NS500150:527:HHGYNBGXF:4:13604:19734:2406\tII\t113\tII\t45003\t-\t+\t1358\t1658"   "NS500150:527:HHGYNBGXF:2:11108:25178:11036\tII\t119\tII\t687251\t-\t+\t1358\t5550" 
"NS500150:527:HHGYNBGXF:1:22301:8468:1586\tII\t160\tII\t26124\t+\t-\t1358\t1510"    "NS500150:527:HHGYNBGXF:4:23606:24037:2076\tII\t169\tII\t39052\t+\t+\t1358\t1613"
@@ -2418,14 +2422,7 @@
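The pairs file printed above follows the 4DN pairs layout: `#`-prefixed header lines, a `#columns:` line naming the fields, then one tab-separated line per read pair. As a minimal base-R sketch of that layout (not how the HiCExperiment ecosystem imports pairs files), a handful of those lines can be parsed into a data frame:

```r
# Lines copied from the pairs file printed above (header subset + 2 read pairs)
pairs_lines <- c(
  "## pairs format v1.0",
  "#sorted: chr1-pos1-chr2-pos2",
  "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2",
  "#chromsize: I 230218",
  "NS500150:527:HHGYNBGXF:3:21611:19085:3986\tII\t105\tII\t48548\t+\t-\t1358\t1681",
  "NS500150:527:HHGYNBGXF:4:13604:19734:2406\tII\t113\tII\t45003\t-\t+\t1358\t1658"
)

# Header lines start with "#"; the "#columns:" line names the body fields
is_header <- startsWith(pairs_lines, "#")
col_line  <- grep("^#columns:", pairs_lines, value = TRUE)
col_names <- strsplit(sub("^#columns:\\s*", "", col_line), " ")[[1]]

# Body lines are tab-separated, one read pair per line
pairs_df <- read.table(
  text = pairs_lines[!is_header], sep = "\t",
  col.names = col_names, stringsAsFactors = FALSE
)

# Genomic distance between the two ends of intra-chromosomal pairs
pairs_df$distance <- ifelse(pairs_df$chr1 == pairs_df$chr2,
                            abs(pairs_df$pos2 - pairs_df$pos1), NA)
pairs_df[, c("chr1", "pos1", "chr2", "pos2", "distance")]
```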

References

- +

- +
-
+

8  Data gateways: accessing public Hi-C data portals

@@ -314,7 +314,7 @@

The Hi-C experimental approach has gained significant traction across multiple fields related to genome biology, and several consortia developed large-scale programs based on this technique. The fourDNData and DNAZooData R packages were designed to accelerate the investigation of chromatin structure using these public resources.

-

+

8.1 4DN data portal

The 4D Nucleome Data Coordination and Integration Center (DCIC) has developed, and actively maintains, a data portal providing public access to a wealth of resources to investigate 3D chromatin architecture. Notably, 3D chromatin conformation libraries relying on different technologies (“in situ” or “dilution” Hi-C, Capture Hi-C, Micro-C, DNase Hi-C, …), generated by 50+ collaborating labs, were homogeneously processed, yielding more than 350 sets of processed files.

fourDNData (read 4DN-Data) is a package giving programmatic access to these uniformly processed Hi-C contact files.

@@ -330,7 +330,7 @@

## 4DNES18BMU79 insulation 7.18 mouse in situ Hi-C DpnII Hi-C on Mouse Olfactory System cells Mature olfactory sensory neurons with conditional Ldb1 knockout olfactory receptor cell primary cell Monahan K et al. (2019) https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/d1f4beb9-701f-4188-abe2-6271fe658770/4DNFIXKKNMS7.bw ## 4DNES18BMU79 compartments 0.18 mouse in situ Hi-C DpnII Hi-C on Mouse Olfactory System cells Mature olfactory sensory neurons with conditional Ldb1 knockout olfactory receptor cell primary cell Monahan K et al. (2019) https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/3d429647-51c8-4e3a-a18b-eec0b1480905/4DNFIN13N8C1.bw -

+

8.1.1 Querying individual files

The fourDNData() function can be used to directly fetch specific files from the 4DN data portal:

@@ -413,7 +413,9 @@

  • type = 'insulation' will fetch a .bigwig track file precomputed by the 4DN consortium. This track corresponds to the genome-wide insulation score computed by cooltools as described in Crane et al. (2015). To learn more, read the excerpt from the 4DN data portal. Once fetched from the 4DN data portal, the local file can be imported in R using the import function, which generates an RleList object.
  • -
    +
    +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450
    +
    fourDNData(experimentSetAccession = '4DNES25ABNZ1', type = 'insulation') |> 
         import(as = 'Rle')
     ##  |===================================|  100%
    @@ -603,11 +605,7 @@ 

    References

    - +

    - +
    -
    +

    7  Finding topological features in Hi-C

    @@ -313,7 +312,8 @@

    -
    +

    reference-section-title: References

    +
    @@ -331,13 +331,15 @@

    -

    +

    7.1 Chromosome compartments

    Chromosome compartments refer to the segregation of chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartments).
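Compartments are classically called from the first eigenvector of the correlation matrix of a distance-normalized (observed/expected) contact matrix, with the eigenvector sign oriented using a covariate such as gene density. A minimal base-R sketch of that idea on a toy matrix (12 hypothetical bins and a hypothetical gene-density track); because the toy has no distance decay, the observed/expected step is skipped:

```r
# Hypothetical compartment labels for 12 bins: +1 = A, -1 = B
truth <- rep(c(1, -1, 1), times = c(4, 5, 3))

# Toy contact matrix: bins of the same type contact each other more often.
# There is no distance decay, so no observed/expected normalization is applied.
mat <- 10 * (1 + 0.5 * outer(truth, truth))

# Correlation between the contact profiles of every pair of bins
cor_mat <- cor(mat)

# First eigenvector of the correlation matrix
ev <- eigen(cor_mat, symmetric = TRUE)$vectors[, 1]

# The eigenvector sign is arbitrary: orient it so that it correlates positively
# with a (hypothetical) gene-density track, A compartments being gene-rich
gene_density <- ifelse(truth > 0, 0.8, 0.2)
if (cor(ev, gene_density) < 0) ev <- -ev

compartment <- ifelse(ev > 0, "A", "B")
compartment
```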

    -

    +

    7.1.1 Importing Hi-C data

    To investigate chromosome compartments, we will fetch a contact matrix generated from a Micro-C experiment (Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp.

    -
    +
    +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 +
    library(HiCExperiment)
     library(OHCA)
     cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool')
    @@ -487,7 +489,7 @@ 

    Here, the small top-left corner represents average O/E scores between strong B compartments, and the larger bottom-right corner represents average O/E scores between strong A compartments. Note that this dataset only contains chr17 interactions, which explains the grainy aspect of the saddle plot.
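A saddle plot summarizes how O/E scores vary with compartment strength: bins are ranked by their compartment eigenvector value, split into equal-size groups, and O/E scores are averaged for every pair of groups. A minimal base-R sketch of that aggregation, assuming a symmetric O/E matrix and a matching eigenvector are already at hand (both are synthetic below, and 4 groups is an arbitrary choice):

```r
# Average O/E scores between groups of bins ranked by eigenvector value.
# Group 1 holds the most B-like bins (lowest values), the last group the most A-like.
saddle_matrix <- function(oe, ev, n_groups = 4) {
  grp <- cut(rank(ev, ties.method = "first"), breaks = n_groups, labels = FALSE)
  sapply(seq_len(n_groups), function(g2) {
    sapply(seq_len(n_groups), function(g1) mean(oe[grp == g1, grp == g2]))
  })
}

# Synthetic example: 40 bins with a continuous compartment signal in [-1, 1]
toy_ev <- seq(-1, 1, length.out = 40)
toy_oe <- 1 + 0.5 * outer(toy_ev, toy_ev)  # like-with-like contacts are enriched (O/E > 1)

saddle <- saddle_matrix(toy_oe, toy_ev)
round(saddle, 2)
```

As in the plot discussed above, the top-left (B-B) and bottom-right (A-A) corners of `saddle` hold the highest averages, while the off-diagonal corners (A-B) fall below 1.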

    -

    +

    7.2 Topological domains

    Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1 Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries.

    @@ -495,10 +497,20 @@

    They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)).

    -

    +
    +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 +
    +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 +
    +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 +

    7.2.1 Computing diamond insulation score

    Several approaches exist to annotate topological domains (Sefer (2022)), and some of them are implemented in R packages, e.g. spectralTAD or TADcompare.

    -

    HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution.

    +
    +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2
    +
    +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450
    +
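The principle of the diamond insulation score can be sketched in base R on a dense matrix. This is one common variant (for each bin, the mean contact frequency between the `w` bins upstream and the `w` bins downstream, log2-normalized to its mean), not necessarily the exact computation performed by getDiamondInsulation(); the toy matrix contains two hypothetical 10-bin domains:

```r
# Diamond insulation score for each bin i: mean contact frequency between the
# w bins upstream of i and the w bins downstream of i. Bins whose window falls
# off the matrix edge get NA.
insulation_score <- function(mat, w) {
  n <- nrow(mat)
  score <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    if (i - w < 1 || i + w > n) next
    score[i] <- mean(mat[(i - w):(i - 1), (i + 1):(i + w)])
  }
  log2(score / mean(score, na.rm = TRUE))  # < 0 means more insulated than average
}

# Toy matrix with two self-interacting domains (bins 1-10 and 11-20)
domain <- rep(1:2, each = 10)
toy_mat <- ifelse(outer(domain, domain, "=="), 10, 1)

ins <- insulation_score(toy_mat, w = 3)
round(ins, 2)
which.min(ins)  # the lowest score sits at the domain boundary
```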


    # - Compute insulation score
     bpparam <- SerialParam(progressbar = FALSE)
    @@ -617,13 +629,15 @@ 

    Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds.
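Border calling from an insulation track can be sketched as a search for local minima below a threshold. In this base-R sketch the threshold is supplied by hand (unlike getBorders(), which estimates one automatically), and the score vector is made up:

```r
# Return the indices of local minima of an insulation track that fall below
# `threshold`. NA values (e.g. at matrix edges) are never called as borders.
find_borders <- function(score, threshold) {
  n <- length(score)
  if (n < 3) return(integer(0))
  is_border <- logical(n)
  for (i in 2:(n - 1)) {
    window <- score[(i - 1):(i + 1)]
    if (anyNA(window)) next
    # `<=` on the left lets the last bin of a flat minimum be called
    is_border[i] <- window[2] <= window[1] && window[2] < window[3] &&
      window[2] < threshold
  }
  which(is_border)
}

toy_score <- c(NA, 0.5, -1, 0.3, 0.2, -0.1, 0.4, NA)
find_borders(toy_score, threshold = -0.5)  # only the deep minimum at index 3
find_borders(toy_score, threshold = 0)     # the shallower minimum at index 6 too
```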

    -

    +

    7.3 Chromatin loops

    -

    +

    7.3.1 chromosight

    Chromatin loops (also called dots or contacts) refer to a strong increase in interaction frequency between a pair of genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone Python package and is made available in R through the HiCool-managed conda environment with the getLoops() function.

    -

    +
    +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 +
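chromosight scans Hi-C maps with pattern templates; as a much simpler stand-in that only illustrates what a focal "dot" is, the base-R sketch below flags pixels that are at least `fold` times higher than the mean of their 8 neighbours. This naive local-enrichment rule is not chromosight's algorithm, and the matrix, dot position, `fold = 2` and `min_dist = 3` are hypothetical choices:

```r
# Flag upper-triangle pixels (i, j) that are at least `fold` times higher than
# the mean of their 8 neighbours, skipping matrix edges and the first
# `min_dist` diagonals, where contacts are dominated by genomic proximity.
find_dots <- function(mat, fold = 2, min_dist = 3) {
  n <- nrow(mat)
  hits <- NULL
  for (i in 2:(n - 1)) {
    for (j in 2:(n - 1)) {
      if (j - i < min_dist) next
      block <- mat[(i - 1):(i + 1), (j - 1):(j + 1)]
      neighbour_mean <- (sum(block) - mat[i, j]) / 8
      if (mat[i, j] >= fold * neighbour_mean) hits <- rbind(hits, c(i = i, j = j))
    }
  }
  hits
}

# Toy symmetric map: contacts decay with distance, plus one loop between bins 5 and 15
n <- 20
toy_map <- 100 / (abs(outer(seq_len(n), seq_len(n), "-")) + 1)
toy_map[5, 15] <- toy_map[15, 5] <- 5 * toy_map[5, 15]

find_dots(toy_map)
```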

    7.3.1.1 Identifying loops

    hic <- HiCool::getLoops(microC, resolution = 5000)
    @@ -773,45 +787,19 @@ 

    )

    -

    +

    7.3.2 Other R packages

    A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each serves a slightly different purpose, and we encourage readers to consult the companion publications.

    -

    References

    -

    - +
    -
    +

    4  Hi-C data visualization

    @@ -356,7 +356,7 @@

    hic
     ##  `HiCExperiment` object with 303,545 contacts over 289 regions 
     ##  -------
    -##  fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" 
    +##  fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" 
     ##  focus: "V" 
     ##  resolutions(5): 1000 2000 4000 8000 16000
     ##  active resolution: 2000 
    @@ -518,13 +518,15 @@ 

    -

    +

    4.3 Advanced visualization

    -

    +

    4.3.1 Overlaying topological features

    Topological features (e.g. chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap.

    To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions.

    -
    +
    +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 +
    library(rtracklayer)
     library(InteractionSet)
     loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> 
    @@ -596,7 +598,7 @@ 

     aggr_loops
     ## `AggrHiCExperiment` object over 148 targets
     ## -------
    -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752"
    +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752"
     ## focus: 148 targets
     ## resolutions(5): 1000 2000 4000 8000 16000
     ## active resolution: 1000
    @@ -655,11 +657,7 @@

    References

    - +
    - +