diff --git a/data-representation.html b/data-representation.html index 098d060..3939943 100644 --- a/data-representation.html +++ b/data-representation.html @@ -112,7 +112,7 @@ } } -
Directly jump to the last section of this chapter to get a visual representation of these data structures.
GRanges
GRanges is shorthand for GenomicRanges, a core class in Bioconductor. This class is primarily used to describe genomic ranges of any nature, e.g. sets of promoters, SNPs, chromatin loop anchors, etc. The data structure was described in the seminal 2015 publication by the Bioconductor team (Huber et al. (2015)).
The easiest way to generate a GRanges object is to coerce it from a vector of genomic coordinates in the UCSC format (e.g. "chr2:2004-4853"):
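As a minimal sketch of this coercion, the snippet below passes a character vector of UCSC-style coordinates to the `GRanges()` constructor. Only the first coordinate comes from the text; the two others are hypothetical values added for illustration.

```r
library(GenomicRanges)

# UCSC-style coordinates: "<seqname>:<start>-<end>"
# (the last two entries are made-up examples)
coords <- c("chr2:2004-4853", "chr4:500-1000", "chr4:2000-2500")

# The constructor parses each string into seqnames and ranges
gr <- GRanges(coords)
gr

# Individual components can then be retrieved with accessor functions
seqnames(gr)
start(gr)
end(gr)
width(gr)
```

The same result should be obtained with `as(coords, "GRanges")`, which makes the coercion explicit.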
"chr2:2004-4853"
Note how close to a TSS the 8th peak was. It could be worth considering this as an overlap!
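One way to treat such near misses as overlaps is the `maxgap` argument of the overlap functions. The sketch below uses a hypothetical peak and a hypothetical TSS (neither is taken from the data above) separated by 49 bp: a strict test reports no overlap, while allowing a gap of up to 50 bp does.

```r
library(GenomicRanges)

# Hypothetical coordinates, for illustration only
peak <- GRanges("chr1:100-200")
tss <- GRanges("chr1:250-250")

# Strict overlap: the two ranges share no position
strict <- overlapsAny(peak, tss)

# Tolerant overlap: ranges separated by at most 50 bp are counted as overlapping
tolerant <- overlapsAny(peak, tss, maxgap = 50)

c(strict = strict, tolerant = tolerant)
```

The appropriate tolerance depends on the biological question; 50 bp here is an arbitrary choice.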
GInteractions
GRanges describe genomic ranges and hence are of general use to study 1D genome organization. To study chromatin interactions, we need a way to link pairs of GRanges. This is exactly what the GInteractions class does. This data structure is defined in the InteractionSet package and was described in Lun et al. (2016).
Let’s first define two parallel GRanges objects (i.e. two GRanges of the same length). Each GRanges will contain 5 ranges.
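A minimal sketch of this step is given below. The coordinates are hypothetical and only serve to show how two parallel GRanges of length 5 are paired, element by element, with the `GInteractions()` constructor from InteractionSet.

```r
library(GenomicRanges)
library(InteractionSet)

# Two parallel GRanges of 5 ranges each (made-up coordinates)
gr_first <- GRanges(c(
    "chr1:1-100", "chr1:1001-2000", "chr1:5001-6000",
    "chr1:8001-9000", "chr1:7001-8000"
))
gr_second <- GRanges(c(
    "chr1:1-100", "chr1:3001-4000", "chr1:8001-9000",
    "chr1:7001-8000", "chr2:13000-14000"
))

# The i-th range of gr_first is linked to the i-th range of gr_second
gi <- GInteractions(gr_first, gr_second)
gi

# Each side of the interactions can be recovered as a GRanges
anchors(gi, type = "first")
anchors(gi, type = "second")
```

Because the pairing is positional, both GRanges must have the same length.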
coolf ## EH7702 -## "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752"
Similarly, example files are available for other file formats:
ContactFile
cf <- CoolFile(coolf) cf ## CoolFile object -## .mcool file: /github/home/.cache/R/ExperimentHub/1a92248c093f_7752 +## .mcool file: /github/home/.cache/R/ExperimentHub/1a9466054db5_7752 ## resolution: 1000 ## pairs file: ## metadata(0): @@ -1739,7 +1743,7 @@ hic ## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -1771,7 +1775,7 @@ These pieces of information are called slots. They can be directly accessed using getter functions, bearing the same name than the slot. fileName(hic) -## [1] "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## [1] "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" focus(hic) ## NULL @@ -1825,7 +1829,7 @@ hic ## `HiCExperiment` object with 13,681,280 contacts over 12,165 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92259b7f1f_7836" +## fileName: "/github/home/.cache/R/ExperimentHub/1a94322aa2b7_7836" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -2146,14 +2150,14 @@ yeast_hic ## `HiCExperiment` object with 8,757,906 contacts over 763 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 16000 ## interactions: 267709 ## scores(2): count balanced ## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) -## pairsFile: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 +## pairsFile: /github/home/.cache/R/ExperimentHub/1a9456d59216_7753 ## metadata(3): ID org date @@ -2380,8 +2384,8 @@ pairsFile(yeast_hic) <- pairsf pairsFile(yeast_hic) -## EH7703 -## 
"/github/home/.cache/R/ExperimentHub/1a92835ced9_7753" +## EH7703 +## "/github/home/.cache/R/ExperimentHub/1a9456d59216_7753" readLines(pairsFile(yeast_hic), 25) ## [1] "## pairs format v1.0" "#sorted: chr1-pos1-chr2-pos2" "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2" "#chromsize: I 230218" "#chromsize: II 813184" "#chromsize: III 316620" "#chromsize: IV 1531933" "#chromsize: V 576874" "#chromsize: VI 270161" "#chromsize: VII 1090940" "#chromsize: VIII 562643" "#chromsize: IX 439888" "#chromsize: X 745751" "#chromsize: XI 666816" "#chromsize: XII 1078177" "#chromsize: XIII 924431" "#chromsize: XIV 784333" "#chromsize: XV 1091291" "#chromsize: XVI 948066" "#chromsize: Mito 85779" "NS500150:527:HHGYNBGXF:3:21611:19085:3986\tII\t105\tII\t48548\t+\t-\t1358\t1681" "NS500150:527:HHGYNBGXF:4:13604:19734:2406\tII\t113\tII\t45003\t-\t+\t1358\t1658" "NS500150:527:HHGYNBGXF:2:11108:25178:11036\tII\t119\tII\t687251\t-\t+\t1358\t5550" "NS500150:527:HHGYNBGXF:1:22301:8468:1586\tII\t160\tII\t26124\t+\t-\t1358\t1510" "NS500150:527:HHGYNBGXF:4:23606:24037:2076\tII\t169\tII\t39052\t+\t+\t1358\t1613" @@ -2418,14 +2422,7 @@ References - - -Huber, W., Carey, V. J., Gentleman, R., Anders, S., Carlson, M., Carvalho, B. S., Bravo, H. C., Davis, S., Gatto, L., Girke, T., Gottardo, R., Hahne, F., Hansen, K. D., Irizarry, R. A., Lawrence, M., Love, M. I., MacDonald, J., Obenchain, V., Oleś, A. K., … Morgan, M. (2015). Orchestrating high-throughput genomic analysis with bioconductor. Nature Methods, 12(2), 115–121. https://doi.org/10.1038/nmeth.3252 - - -Lun, A. T. L., Perry, M., & Ing-Simmons, E. (2016). Infrastructure for genomic interactions: Bioconductor classes for hi-c, ChIA-PET and related experiments. F1000Research, 5, 950. 
https://doi.org/10.12688/f1000research.8759.2 - - + - + @@ -280,7 +280,7 @@ Edit this pageReport an issue - + 8 Data gateways: accessing public Hi-C data portals @@ -314,7 +314,7 @@ The Hi-C experimental approach has gained significant traction across multiple fields related to genome biology, and several consortia developed large-scale programs based on this technique. The fourDNData and DNAZooData R packages were designed to accelerate the investigation of chromatin structure using these public resources. - + 8.1 4DN data portal The 4D Nucleome Data Coordination and Integration Center (DCIC) has developed and actively maintains a data portal providing public access to a wealth of resources to investigate 3D chromatin architecture. Notably, 3D chromatin conformation libraries relying on different technologies (“in situ” or “dilution” Hi-C, Capture Hi-C, Micro-C, DNase Hi-C, …), generated by 50+ collaborating labs, were homogenously processed, yielding more than 350 sets of processed files. fourDNData (read 4DN-Data) is a package giving programmatic access to these uniformly processed Hi-C contact files. @@ -330,7 +330,7 @@ ## 4DNES18BMU79 insulation 7.18 mouse in situ Hi-C DpnII Hi-C on Mouse Olfactory System cells Mature olfactory sensory neurons with conditional Ldb1 knockout olfactory receptor cell primary cell Monahan K et al. (2019) https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/d1f4beb9-701f-4188-abe2-6271fe658770/4DNFIXKKNMS7.bw ## 4DNES18BMU79 compartments 0.18 mouse in situ Hi-C DpnII Hi-C on Mouse Olfactory System cells Mature olfactory sensory neurons with conditional Ldb1 knockout olfactory receptor cell primary cell Monahan K et al. (2019) https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/3d429647-51c8-4e3a-a18b-eec0b1480905/4DNFIN13N8C1.bw
These pieces of information are called slots. They can be accessed directly using getter functions, which bear the same name as the slot.
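To illustrate the slot/getter convention in isolation, here is a toy S4 class. `ToyExperiment` and its two slots are hypothetical and are not the actual definition of `HiCExperiment`; the sketch only shows how a getter named after a slot returns its content.

```r
library(methods)

# A toy class with two slots (hypothetical, for illustration only)
setClass("ToyExperiment", representation(
    fileName = "character",
    resolutions = "numeric"
))

# One getter per slot, each bearing the same name as the slot it returns
setGeneric("fileName", function(x) standardGeneric("fileName"))
setMethod("fileName", "ToyExperiment", function(x) x@fileName)
setGeneric("resolutions", function(x) standardGeneric("resolutions"))
setMethod("resolutions", "ToyExperiment", function(x) x@resolutions)

toy <- new("ToyExperiment", fileName = "example.mcool", resolutions = c(1000, 2000))

# Getters are preferred over direct `@` access
fileName(toy)
resolutions(toy)
slotNames(toy)
```

Using getters rather than `@` keeps user code independent of how the class stores its data internally.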
fileName(hic) -## [1] "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## [1] "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" focus(hic) ## NULL @@ -1825,7 +1829,7 @@ hic ## `HiCExperiment` object with 13,681,280 contacts over 12,165 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92259b7f1f_7836" +## fileName: "/github/home/.cache/R/ExperimentHub/1a94322aa2b7_7836" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -2146,14 +2150,14 @@ yeast_hic ## `HiCExperiment` object with 8,757,906 contacts over 763 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 16000 ## interactions: 267709 ## scores(2): count balanced ## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) -## pairsFile: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 +## pairsFile: /github/home/.cache/R/ExperimentHub/1a9456d59216_7753 ## metadata(3): ID org date
yeast_hic ## `HiCExperiment` object with 8,757,906 contacts over 763 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 16000 ## interactions: 267709 ## scores(2): count balanced ## topologicalFeatures: compartments(0) borders(0) loops(0) viewpoints(0) centromeres(16) -## pairsFile: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 +## pairsFile: /github/home/.cache/R/ExperimentHub/1a9456d59216_7753 ## metadata(3): ID org date
pairsFile(yeast_hic) <- pairsf pairsFile(yeast_hic) -## EH7703 -## "/github/home/.cache/R/ExperimentHub/1a92835ced9_7753" +## EH7703 +## "/github/home/.cache/R/ExperimentHub/1a9456d59216_7753" readLines(pairsFile(yeast_hic), 25) ## [1] "## pairs format v1.0" "#sorted: chr1-pos1-chr2-pos2" "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2" "#chromsize: I 230218" "#chromsize: II 813184" "#chromsize: III 316620" "#chromsize: IV 1531933" "#chromsize: V 576874" "#chromsize: VI 270161" "#chromsize: VII 1090940" "#chromsize: VIII 562643" "#chromsize: IX 439888" "#chromsize: X 745751" "#chromsize: XI 666816" "#chromsize: XII 1078177" "#chromsize: XIII 924431" "#chromsize: XIV 784333" "#chromsize: XV 1091291" "#chromsize: XVI 948066" "#chromsize: Mito 85779" "NS500150:527:HHGYNBGXF:3:21611:19085:3986\tII\t105\tII\t48548\t+\t-\t1358\t1681" "NS500150:527:HHGYNBGXF:4:13604:19734:2406\tII\t113\tII\t45003\t-\t+\t1358\t1658" "NS500150:527:HHGYNBGXF:2:11108:25178:11036\tII\t119\tII\t687251\t-\t+\t1358\t5550" "NS500150:527:HHGYNBGXF:1:22301:8468:1586\tII\t160\tII\t26124\t+\t-\t1358\t1510" "NS500150:527:HHGYNBGXF:4:23606:24037:2076\tII\t169\tII\t39052\t+\t+\t1358\t1613"
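The pairs-format lines printed above can also be parsed by hand. The sketch below is base R only and assumes the `#columns:` header line names the tab-separated fields, as it does in the output shown; the two records are copied from that output. In practice, importing the `PairsFile` with the package's own tools is likely preferable.

```r
# Header and records copied from the `readLines()` output above
header <- c(
    "## pairs format v1.0",
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2 frag1 frag2"
)
records <- c(
    "NS500150:527:HHGYNBGXF:3:21611:19085:3986\tII\t105\tII\t48548\t+\t-\t1358\t1681",
    "NS500150:527:HHGYNBGXF:4:13604:19734:2406\tII\t113\tII\t45003\t-\t+\t1358\t1658"
)

# Extract field names from the "#columns:" header line
columns_line <- grep("^#columns:", header, value = TRUE)
col_names <- strsplit(sub("^#columns:\\s*", "", columns_line), "\\s+")[[1]]

# Read the tab-separated records with explicit column types
pairs_df <- read.table(
    text = records, sep = "\t", col.names = col_names,
    colClasses = c(
        "character", "character", "integer", "character", "integer",
        "character", "character", "integer", "integer"
    ),
    quote = "", comment.char = ""
)

# Genomic distance between the two mates of each (cis) pair
pairs_df$distance <- abs(pairs_df$pos2 - pairs_df$pos1)
pairs_df
```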
The Hi-C experimental approach has gained significant traction across multiple fields related to genome biology, and several consortia have developed large-scale programs based on this technique. The fourDNData and DNAZooData R packages were designed to accelerate the investigation of chromatin structure using these public resources.
The 4D Nucleome Data Coordination and Integration Center (DCIC) has developed and actively maintains a data portal providing public access to a wealth of resources to investigate 3D chromatin architecture. Notably, 3D chromatin conformation libraries relying on different technologies (“in situ” or “dilution” Hi-C, Capture Hi-C, Micro-C, DNase Hi-C, …), generated by 50+ collaborating labs, were homogeneously processed, yielding more than 350 sets of processed files.
fourDNData (read 4DN-Data) is a package giving programmatic access to these uniformly processed Hi-C contact files.
The fourDNData() function can be used to directly fetch specific files from the 4DN data portal:
type = 'insulation'
.bigwig
cooltools
R
import
RleList
fourDNData(experimentSetAccession = '4DNES25ABNZ1', type = 'insulation') |> import(as = 'Rle') ## |===================================| 100% @@ -603,11 +605,7 @@ References - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - + - + @@ -372,7 +372,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,7 +400,7 @@ pf ## PairsFile object -## resource: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 +## resource: /github/home/.cache/R/ExperimentHub/1a9456d59216_7753 If needed, PairsFile connections can be imported directly into a GInteractions object with import(). @@ -427,7 +427,7 @@ library(HiContacts) ps <- distanceLaw(pf, by_chr = TRUE) -## Importing pairs file /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 in memory. This may take a while... +## Importing pairs file /github/home/.cache/R/ExperimentHub/1a9456d59216_7753 in memory. This may take a while... ps ## # A tibble: 115 × 6 ## chr binned_distance p norm_p norm_p_unity slope @@ -469,7 +469,7 @@ eco1_ps <- distanceLaw(eco1_pf, by_chr = TRUE) -## Importing pairs file /github/home/.cache/R/ExperimentHub/21f275852cbd_7755 in memory. This may take a while... +## Importing pairs file /github/home/.cache/R/ExperimentHub/21f428f3345_7755 in memory. This may take a while... eco1_ps ## # A tibble: 115 × 6 ## chr binned_distance p norm_p norm_p_unity slope @@ -614,10 +614,12 @@ 6.4 Scalograms Scalograms were introduced in Lioy et al. 
(2018) to investigate distance-dependent contact frequencies for individual genomic bins along chromosomes. To generate a scalogram, one needs to provide a HiCExperiment object with a valid associated pairsFile. - + +Lioy, V. S., Cournac, A., Marbouty, M., Duigou, S., Mozziconacci, J., Espéli, O., Boccard, F., & Koszul, R. (2018). Multiscale structuring of the e. Coli chromosome by nucleoid-associated and condensin proteins. Cell, 172(4), 771–783.e18. https://doi.org/10.1016/j.cell.2017.12.027 + pairsFile(hic) <- pairsf scalo <- scalogram(hic) -## Importing pairs file /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 in memory. This may take a while... +## Importing pairs file /github/home/.cache/R/ExperimentHub/1a9456d59216_7753 in memory. This may take a while... plotScalogram(scalo |> filter(chr == 'II'), ylim = c(1e3, 1e5)) @@ -641,7 +643,7 @@ ## loading from cache pairsFile(eco1_hic) <- eco1_pairsf eco1_scalo <- scalogram(eco1_hic) -## Importing pairs file /github/home/.cache/R/ExperimentHub/21f275852cbd_7755 in memory. This may take a while... +## Importing pairs file /github/home/.cache/R/ExperimentHub/21f428f3345_7755 in memory. This may take a while... merged_scalo <- rbind( scalo |> mutate(sample = 'WT'), eco1_scalo |> mutate(sample = 'eco1') @@ -660,11 +662,7 @@ References - - -Lioy, V. S., Cournac, A., Marbouty, M., Duigou, S., Mozziconacci, J., Espéli, O., Boccard, F., & Koszul, R. (2018). Multiscale structuring of the e. Coli chromosome by nucleoid-associated and condensin proteins. Cell, 172(4), 771–783.e18. 
https://doi.org/10.1016/j.cell.2017.12.027 - - + - + @@ -268,17 +268,18 @@ Table of contents -9.1 HiCrep - 9.2 multiHiCcompare - 9.3 TopDom - 9.4 GOTHiC +9.1 diffHic + 9.2 HiCrep + 9.3 multiHiCcompare + 9.4 TopDom + 9.5 GOTHiC References Session info References Edit this pageReport an issue - + 9 Interoperability: using HiCExperiment with other R packages @@ -304,8 +305,9 @@ -This notebook illustrates how to use a range of popular Hi-C—related R packages with HiCExperiment objects. Conversion to the following packages is illustrated here: +This notebook illustrates how to use a range of popular Hi-C—related R packages with HiCExperiment objects. Conversion to the data structures supported by the following packages is illustrated here: +diffHic hicrep multiHiCcompare TopDom @@ -315,10 +317,146 @@ - -9.1 HiCrep + +9.1 diffHic +diffHic is the first R package dedicated to Hi-C processing and analysis (Lun & Smyth (2015)). It is packed with useful functions to generate a contact matrix from read pairs and to perform downstream investigation, including normalization, 2D “peak” (i.e. loops) finding and aggregation, differential interaction between samples, etc. It works seamlessly with the InteractionSet class of object, which can be easily obtained from a HiCExperiment object. + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 +To do so, we first need to extract GInteractions from one or several HiCExperiment objects and create a single InteractionSet object. 
+ +library(InteractionSet) +library(GenomicRanges) +library(HiCExperiment) +library(HiContactsData) + +# ---- This downloads an example `.mcool` file and caches it locally +coolf <- HiContactsData('yeast_wt', 'mcool') +## see ?HiContactsData and browseVignettes('HiContactsData') for documentation +## loading from cache +cool <- import(coolf, format = 'cool') +gi <- cool |> + interactions() |> + as("ReverseStrictGInteractions") +iset <- InteractionSet( + assays = list( + counts = matrix(gi$count, ncol = 1), + balanced = matrix(gi$balanced, ncol = 1) + ), + interactions = gi, + colData = data.frame(lib = c("WT"), totals = sum(gi$count)) +) + +From there, we can filter interactions to only retain those with significant enrichment over background. + +library(diffHic) +set.seed(1234) + +# --- Filter to find aggregated interactions +enrichments <- enrichedPairs(iset) +filter <- filterPeaks(enrichments, min.enrich = log2(1.2), min.diag = 5) +filtered_iset <- iset[filter] +filtered_iset +## class: InteractionSet +## dim: 41872 1 +## metadata(0): +## assays(2): counts balanced +## rownames: NULL +## rowData names(4): bin_id1 bin_id2 count balanced +## colnames: NULL +## colData names(2): lib totals +## type: ReverseStrictGInteractions +## regions: 12079 + +# --- Visualize filtered interactions +library(plyinteractions) +library(HiContacts) +## Registered S3 methods overwritten by 'readr': +## method from +## as.data.frame.spec_tbl_df vroom +## as_tibble.spec_tbl_df vroom +## format.col_spec vroom +## print.col_spec vroom +## print.collector vroom +## print.date_names vroom +## print.locale vroom +## str.col_spec vroom +interactions(filtered_iset) |> + filter(seqnames2 == 'II', seqnames1 == seqnames2) |> + plotMatrix(use.scores = 'count') + + + + + + + +Next, we can cluster filtered interactions that are next to each other. 
+ +# --- Cluster interactions to find loops +clustered_iset <- clusterPairs(filtered_iset, tol = 5000) +clustered_iset$interactions +## ReverseStrictGInteractions object with 1644 interactions and 0 metadata columns: +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> +## [1] I 15001-149000 * --- I 1-122000 * +## [2] I 133001-148000 * --- I 127001-139000 * +## [3] I 154001-160000 * --- I 128001-149000 * +## [4] I 168001-173000 * --- I 138001-148000 * +## [5] I 184001-196000 * --- I 15001-23000 * +## ... ... ... ... ... ... ... ... +## [1640] XVI 897001-898000 * --- XVI 831001-832000 * +## [1641] XVI 907001-910000 * --- XVI 840001-843000 * +## [1642] XVI 926001-934000 * --- XVI 872001-878000 * +## [1643] XVI 933001-934000 * --- XVI 858001-859000 * +## [1644] XVI 933001-942000 * --- XVI 928001-934000 * +## ------- +## regions: 2822 ranges and 0 metadata columns +## seqinfo: 16 sequences from an unspecified genome + +# --- Visualize clustered interactions +interactions(filtered_iset) |> + mutate(cluster = clustered_iset$indices[[1]]) |> + filter(seqnames2 == 'II', seqnames1 == seqnames2) |> + plotMatrix(use.scores = 'cluster') + + + + + + + +Finally, we can visualize identified individual interaction clusters identified with diffHic using HiContacts. 
+ +# --- Plot matrix at a clustered loops +cgi <- clustered_iset$interactions[554] +seqn <- seqnames(anchors(cgi, type="second")) +start <- start(anchors(cgi, type="second")) - 50000 +end <- end(anchors(cgi, type="first")) + 50000 +interactions_peak <- GRanges(seqn, IRanges(start, end)) +p <- plotMatrix(cool[interactions_peak]) + +library(ggplot2) +an <- anchors(cgi) +p + geom_rect( + data = data.frame(xmin = start(an[[2]]), xmax = end(an[[2]]), ymin = start(an[[1]]), ymax = end(an[[1]])), + aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), + inherit.aes = FALSE, + fill = NA, + colour = 'cyan' +) + + + + + + + + +9.2 HiCrep hicrep is a popular package to compute stratum-adjusted correlations between Hi-C datasets (Yang et al. (2017)). “Stratum” refers to the distance from the main diagonal: with increase distance from the main diagonal, interactions of the DNA polymer are bound to decrease. hicrep computes a “per-stratum” correlation score and computes a weighted average correlation for entire chromosomes. - + +Yang, T., Zhang, F., Yardımcı, G. G., Song, F., Hardison, R. C., Noble, W. S., Yue, F., & Li, Q. (2017). HiCRep: Assessing the reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. https://doi.org/10.1101/gr.220640.117 + @@ -329,31 +467,26 @@ hicrep package has been available from Bioconductor for many years but has been withdrawn from their repositories at some point. You can always install hicrep directly from its GitHub repository as follows: - -remotes::install_github('TaoYang-dev/hicrep') + +remotes::install_github('TaoYang-dev/hicrep') In order to use hicrep, we first need to create two HiCExperiment objects. 
- -library(InteractionSet) -library(HiCExperiment) -library(HiContactsData) - -# ---- This downloads example `.mcool` and `.pairs` files and caches them locally -coolf_wt <- HiContactsData('yeast_wt', 'mcool') + +# ---- This downloads example `.mcool` files and caches them locally coolf_eco1 <- HiContactsData('yeast_eco1', 'mcool') - -hic_wt <- import(coolf_wt, format = 'cool') + +hic_wt <- import(coolf_wt, format = 'cool') hic_eco1 <- import(coolf_eco1, format = 'cool') We can now run the main get.scc function from hicrep. The documentation for this function is available from the console by typing ?hicrep::get.scc. More information is also available from the GitHub page. It informs the end user that the input for this function should be two intra-chromosomal Hi-C raw count matrices in square (optionally sparse) format. - -hic_wt + +hic_wt ## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -401,12 +534,14 @@ ## [,1] ## [1,] 0.9334303 - -9.2 multiHiCcompare + +9.3 multiHiCcompare The multiHiCcompare package provides functions for joint normalization and difference detection in multiple Hi-C datasets (Stansfield et al. (2019)). According to its excerpt, to perform differential interaction analysis, it requires a list of raw counts for different samples/replicates, stored in data frames with four columns (chr, start1, start2, count). Manipulate a HiCExperiment object to coerce it into such structure is straightforward. - -library(dplyr) + +Stansfield, J. C., Cresswell, K. G., & Dozmorov, M. G. (2019). multiHiCcompare: Joint normalization and comparative analysis of complex hi-c experiments. Bioinformatics, 35(17), 2916–2923. 
https://doi.org/10.1093/bioinformatics/btz048 + +library(dplyr) library(tidyr) library(purrr) hics <- list( @@ -414,7 +549,7 @@ "eco1" = import(coolf_eco1, format = 'cool') ) hics_list <- map(hics, ~ .x['XI'] |> - as.data.frame() |> + as.data.frame() |> mutate(chr = 1) |> relocate(chr) |> select(chr, start1, start2, count) @@ -429,8 +564,8 @@ ## 6 1 1 5001 13 Once this list is generated, the classical multiHiCcompare workflow can be applied: first run make_hicexp(), followed by cyclic_loess(), then hic_exactTest() and finally results(): - -DI <- hics_list |> + +DI <- hics_list |> make_hicexp( data_list = hics_list, groups = factor(c(1, 2)) @@ -452,12 +587,16 @@ ## 22640: 1 665001 665001 0 -0.3110054 10.013750 0.60075706 1.0000000 ## 22641: 1 665001 666001 1 -0.4989794 7.750157 0.41481212 1.0000000 - -9.3 TopDom -The TopDom method is widely used to annotate topological domains in genomes from Hi-C data ((Shin_2016?)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). -Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. - -library(TopDom) + +9.4 TopDom +The TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, X. J. (2015). TopDom: An efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + +Bengtsson, H., Shin, H., Lazaris, H., Hu, G., & Zhou, X. (2020). R package TopDom: An efficient and deterministic method for identifying topological domains in genomes. 
https://github.com/HenrikBengtsson/TopDom +Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. + +library(TopDom) hic <- import(coolf_wt, format = 'cool') HiCExperiment2TopDom <- function(hic, chr) { data <- list() @@ -465,7 +604,7 @@ data$counts <- as.matrix(cm) |> base::as.matrix() data$counts[is.na(data$counts)] <- 0 data$bins <- regions(cm) |> - as.data.frame() |> + as.data.frame() |> select(seqnames, start, end) |> mutate(seqnames = as.character(seqnames)) |> mutate(id = 1:n(), start = start - 1) |> @@ -487,8 +626,8 @@ ## num [1:813, 1:813] 0 0 0 0 0 0 0 0 0 0 ... Now that we have coerced a HiCExperiment object into a TopDom-compatible object, we can use the main TopDom function to annotate topological domains. - -domains <- TopDom::TopDom(hic_topdom, window.size = 5) + +domains <- TopDom::TopDom(hic_topdom, window.size = 5) domains ## TopDom: ## Parameters: @@ -520,8 +659,8 @@ ## $ name : chr "gap" "domain" "gap" "domain" ... The resulting domains object can be used to extract annotated domains, store them in topologicalFeatures of the original HiCExperiment, and optionally write a bed file to export them in text. - -topologicalFeatures(hic, 'domain') <- domains$bed |> + +topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed') - -9.4 GOTHiC + +9.5 GOTHiC GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)). - + +Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). 
GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 + @@ -571,20 +712,20 @@ Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly used binned interactions imported as a HiCExperiment object in R. - -Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> 
<numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * 
--- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... +## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 
2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN 
(R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 
0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc.. Overall, it generally ends up not homogenous throughout the entire genome and this leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element in scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by nature paired-end libraries. For this reason, the end result of the experimental side of Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality, encoded as ASCII characters (Phred quality scores). - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, along with the bin containing each extremity of each pair. 
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
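The binning of pairs into an "i-j-x" COO table described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the `pairs` table, the 1 kb `bin_size`, the 10-bin chromosome and the 1-based bin ids are all hypothetical toy values, not the contents of the files shown in this chapter.

```r
# Hypothetical toy data: 5 pairs on a single 10 kb chromosome (positions in bp).
pairs <- data.frame(
  chr1 = "chr1", pos1 = c(1200, 1800, 2500, 5100, 5900),
  chr2 = "chr1", pos2 = c(3400, 3900, 2700, 9200, 9800)
)
bin_size <- 1000  # assumed resolution
n_bins <- 10      # assumed number of bins on this toy chromosome

# "regions.bed"-like dictionary: one row per genomic bin, with a 1-based id
# (the 1-based convention is an assumption made for this sketch).
regions <- data.frame(
  chr = "chr1",
  start = (seq_len(n_bins) - 1) * bin_size,
  end = seq_len(n_bins) * bin_size,
  id = seq_len(n_bins)
)

# Assign each pair extremity to the bin containing it.
# This is the lossy step: exact positions are discarded.
pairs$i <- as.integer(pairs$pos1 %/% bin_size) + 1L
pairs$j <- as.integer(pairs$pos2 %/% bin_size) + 1L

# "count.matrix"-like COO table: one row per non-empty (i, j) bin pair,
# with x = number of pairs falling into that bin pair.
pairs$x <- 1L
coo <- aggregate(x ~ i + j, data = pairs, FUN = sum)
coo
```

Only non-empty bin pairs are stored in `coo`, which is what makes the COO representation sparse; the `regions` table is still needed to translate the `i` and `j` indices back into genomic coordinates.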
- + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information as the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as parsing .cool files, as the format does not rely on a generic file standard. Still, the straw library has been implemented in several programming languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. 
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural &Amp\(\mathsemicolon\) Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. 
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). 
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). In R, they can be obtained through the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). 
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,7 +400,7 @@ pf ## PairsFile object -## resource: /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 +## resource: /github/home/.cache/R/ExperimentHub/1a9456d59216_7753
If needed, PairsFile connections can be imported directly into a GInteractions object with import().
library(HiContacts) ps <- distanceLaw(pf, by_chr = TRUE) -## Importing pairs file /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 in memory. This may take a while... +## Importing pairs file /github/home/.cache/R/ExperimentHub/1a9456d59216_7753 in memory. This may take a while... ps ## # A tibble: 115 × 6 ## chr binned_distance p norm_p norm_p_unity slope @@ -469,7 +469,7 @@
eco1_ps <- distanceLaw(eco1_pf, by_chr = TRUE) -## Importing pairs file /github/home/.cache/R/ExperimentHub/21f275852cbd_7755 in memory. This may take a while... +## Importing pairs file /github/home/.cache/R/ExperimentHub/21f428f3345_7755 in memory. This may take a while... eco1_ps ## # A tibble: 115 × 6 ## chr binned_distance p norm_p norm_p_unity slope @@ -614,10 +614,12 @@ 6.4 Scalograms Scalograms were introduced in Lioy et al. (2018) to investigate distance-dependent contact frequencies for individual genomic bins along chromosomes. To generate a scalogram, one needs to provide a HiCExperiment object with a valid associated pairsFile. - + +Lioy, V. S., Cournac, A., Marbouty, M., Duigou, S., Mozziconacci, J., Espéli, O., Boccard, F., & Koszul, R. (2018). Multiscale structuring of the e. Coli chromosome by nucleoid-associated and condensin proteins. Cell, 172(4), 771–783.e18. https://doi.org/10.1016/j.cell.2017.12.027 + pairsFile(hic) <- pairsf scalo <- scalogram(hic) -## Importing pairs file /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 in memory. This may take a while... +## Importing pairs file /github/home/.cache/R/ExperimentHub/1a9456d59216_7753 in memory. This may take a while... plotScalogram(scalo |> filter(chr == 'II'), ylim = c(1e3, 1e5)) @@ -641,7 +643,7 @@ ## loading from cache pairsFile(eco1_hic) <- eco1_pairsf eco1_scalo <- scalogram(eco1_hic) -## Importing pairs file /github/home/.cache/R/ExperimentHub/21f275852cbd_7755 in memory. This may take a while... +## Importing pairs file /github/home/.cache/R/ExperimentHub/21f428f3345_7755 in memory. This may take a while... merged_scalo <- rbind( scalo |> mutate(sample = 'WT'), eco1_scalo |> mutate(sample = 'eco1') @@ -660,11 +662,7 @@ References - - -Lioy, V. S., Cournac, A., Marbouty, M., Duigou, S., Mozziconacci, J., Espéli, O., Boccard, F., & Koszul, R. (2018). Multiscale structuring of the e. Coli chromosome by nucleoid-associated and condensin proteins. Cell, 172(4), 771–783.e18. 
https://doi.org/10.1016/j.cell.2017.12.027 - - + - + @@ -268,17 +268,18 @@ Table of contents -9.1 HiCrep - 9.2 multiHiCcompare - 9.3 TopDom - 9.4 GOTHiC +9.1 diffHic + 9.2 HiCrep + 9.3 multiHiCcompare + 9.4 TopDom + 9.5 GOTHiC References Session info References Edit this pageReport an issue - + 9 Interoperability: using HiCExperiment with other R packages @@ -304,8 +305,9 @@ -This notebook illustrates how to use a range of popular Hi-C—related R packages with HiCExperiment objects. Conversion to the following packages is illustrated here: +This notebook illustrates how to use a range of popular Hi-C—related R packages with HiCExperiment objects. Conversion to the data structures supported by the following packages is illustrated here: +diffHic hicrep multiHiCcompare TopDom @@ -315,10 +317,146 @@ - -9.1 HiCrep + +9.1 diffHic +diffHic is the first R package dedicated to Hi-C processing and analysis (Lun & Smyth (2015)). It is packed with useful functions to generate a contact matrix from read pairs and to perform downstream investigation, including normalization, 2D “peak” (i.e. loops) finding and aggregation, differential interaction between samples, etc. It works seamlessly with the InteractionSet class of object, which can be easily obtained from a HiCExperiment object. + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 +To do so, we first need to extract GInteractions from one or several HiCExperiment objects and create a single InteractionSet object. 
+ +library(InteractionSet) +library(GenomicRanges) +library(HiCExperiment) +library(HiContactsData) + +# ---- This downloads an example `.mcool` file and caches it locally +coolf <- HiContactsData('yeast_wt', 'mcool') +## see ?HiContactsData and browseVignettes('HiContactsData') for documentation +## loading from cache +cool <- import(coolf, format = 'cool') +gi <- cool |> + interactions() |> + as("ReverseStrictGInteractions") +iset <- InteractionSet( + assays = list( + counts = matrix(gi$count, ncol = 1), + balanced = matrix(gi$balanced, ncol = 1) + ), + interactions = gi, + colData = data.frame(lib = c("WT"), totals = sum(gi$count)) +) + +From there, we can filter interactions to only retain those with significant enrichment over background. + +library(diffHic) +set.seed(1234) + +# --- Filter to find aggregated interactions +enrichments <- enrichedPairs(iset) +filter <- filterPeaks(enrichments, min.enrich = log2(1.2), min.diag = 5) +filtered_iset <- iset[filter] +filtered_iset +## class: InteractionSet +## dim: 41872 1 +## metadata(0): +## assays(2): counts balanced +## rownames: NULL +## rowData names(4): bin_id1 bin_id2 count balanced +## colnames: NULL +## colData names(2): lib totals +## type: ReverseStrictGInteractions +## regions: 12079 + +# --- Visualize filtered interactions +library(plyinteractions) +library(HiContacts) +## Registered S3 methods overwritten by 'readr': +## method from +## as.data.frame.spec_tbl_df vroom +## as_tibble.spec_tbl_df vroom +## format.col_spec vroom +## print.col_spec vroom +## print.collector vroom +## print.date_names vroom +## print.locale vroom +## str.col_spec vroom +interactions(filtered_iset) |> + filter(seqnames2 == 'II', seqnames1 == seqnames2) |> + plotMatrix(use.scores = 'count') + + + + + + + +Next, we can cluster filtered interactions that are next to each other. 
+ +# --- Cluster interactions to find loops +clustered_iset <- clusterPairs(filtered_iset, tol = 5000) +clustered_iset$interactions +## ReverseStrictGInteractions object with 1644 interactions and 0 metadata columns: +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> +## [1] I 15001-149000 * --- I 1-122000 * +## [2] I 133001-148000 * --- I 127001-139000 * +## [3] I 154001-160000 * --- I 128001-149000 * +## [4] I 168001-173000 * --- I 138001-148000 * +## [5] I 184001-196000 * --- I 15001-23000 * +## ... ... ... ... ... ... ... ... +## [1640] XVI 897001-898000 * --- XVI 831001-832000 * +## [1641] XVI 907001-910000 * --- XVI 840001-843000 * +## [1642] XVI 926001-934000 * --- XVI 872001-878000 * +## [1643] XVI 933001-934000 * --- XVI 858001-859000 * +## [1644] XVI 933001-942000 * --- XVI 928001-934000 * +## ------- +## regions: 2822 ranges and 0 metadata columns +## seqinfo: 16 sequences from an unspecified genome + +# --- Visualize clustered interactions +interactions(filtered_iset) |> + mutate(cluster = clustered_iset$indices[[1]]) |> + filter(seqnames2 == 'II', seqnames1 == seqnames2) |> + plotMatrix(use.scores = 'cluster') + + + + + + + +Finally, we can visualize identified individual interaction clusters identified with diffHic using HiContacts. 
+ +# --- Plot matrix at a clustered loops +cgi <- clustered_iset$interactions[554] +seqn <- seqnames(anchors(cgi, type="second")) +start <- start(anchors(cgi, type="second")) - 50000 +end <- end(anchors(cgi, type="first")) + 50000 +interactions_peak <- GRanges(seqn, IRanges(start, end)) +p <- plotMatrix(cool[interactions_peak]) + +library(ggplot2) +an <- anchors(cgi) +p + geom_rect( + data = data.frame(xmin = start(an[[2]]), xmax = end(an[[2]]), ymin = start(an[[1]]), ymax = end(an[[1]])), + aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), + inherit.aes = FALSE, + fill = NA, + colour = 'cyan' +) + + + + + + + + +9.2 HiCrep hicrep is a popular package to compute stratum-adjusted correlations between Hi-C datasets (Yang et al. (2017)). “Stratum” refers to the distance from the main diagonal: with increase distance from the main diagonal, interactions of the DNA polymer are bound to decrease. hicrep computes a “per-stratum” correlation score and computes a weighted average correlation for entire chromosomes. - + +Yang, T., Zhang, F., Yardımcı, G. G., Song, F., Hardison, R. C., Noble, W. S., Yue, F., & Li, Q. (2017). HiCRep: Assessing the reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. https://doi.org/10.1101/gr.220640.117 + @@ -329,31 +467,26 @@ hicrep package has been available from Bioconductor for many years but has been withdrawn from their repositories at some point. You can always install hicrep directly from its GitHub repository as follows: - -remotes::install_github('TaoYang-dev/hicrep') + +remotes::install_github('TaoYang-dev/hicrep') In order to use hicrep, we first need to create two HiCExperiment objects. 
- -library(InteractionSet) -library(HiCExperiment) -library(HiContactsData) - -# ---- This downloads example `.mcool` and `.pairs` files and caches them locally -coolf_wt <- HiContactsData('yeast_wt', 'mcool') + +# ---- This downloads example `.mcool` files and caches them locally coolf_eco1 <- HiContactsData('yeast_eco1', 'mcool') - -hic_wt <- import(coolf_wt, format = 'cool') + +hic_wt <- import(coolf_wt, format = 'cool') hic_eco1 <- import(coolf_eco1, format = 'cool') We can now run the main get.scc function from hicrep. The documentation for this function is available from the console by typing ?hicrep::get.scc. More information is also available from the GitHub page. It informs the end user that the input for this function should be two intra-chromosomal Hi-C raw count matrices in square (optionally sparse) format. - -hic_wt + +hic_wt ## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -401,12 +534,14 @@ ## [,1] ## [1,] 0.9334303 - -9.2 multiHiCcompare + +9.3 multiHiCcompare The multiHiCcompare package provides functions for joint normalization and difference detection in multiple Hi-C datasets (Stansfield et al. (2019)). According to its excerpt, to perform differential interaction analysis, it requires a list of raw counts for different samples/replicates, stored in data frames with four columns (chr, start1, start2, count). Manipulate a HiCExperiment object to coerce it into such structure is straightforward. - -library(dplyr) + +Stansfield, J. C., Cresswell, K. G., & Dozmorov, M. G. (2019). multiHiCcompare: Joint normalization and comparative analysis of complex hi-c experiments. Bioinformatics, 35(17), 2916–2923. 
https://doi.org/10.1093/bioinformatics/btz048 + +library(dplyr) library(tidyr) library(purrr) hics <- list( @@ -414,7 +549,7 @@ "eco1" = import(coolf_eco1, format = 'cool') ) hics_list <- map(hics, ~ .x['XI'] |> - as.data.frame() |> + as.data.frame() |> mutate(chr = 1) |> relocate(chr) |> select(chr, start1, start2, count) @@ -429,8 +564,8 @@ ## 6 1 1 5001 13 Once this list is generated, the classical multiHiCcompare workflow can be applied: first run make_hicexp(), followed by cyclic_loess(), then hic_exactTest() and finally results(): - -DI <- hics_list |> + +DI <- hics_list |> make_hicexp( data_list = hics_list, groups = factor(c(1, 2)) @@ -452,12 +587,16 @@ ## 22640: 1 665001 665001 0 -0.3110054 10.013750 0.60075706 1.0000000 ## 22641: 1 665001 666001 1 -0.4989794 7.750157 0.41481212 1.0000000 - -9.3 TopDom -The TopDom method is widely used to annotate topological domains in genomes from Hi-C data ((Shin_2016?)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). -Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. - -library(TopDom) + +9.4 TopDom +The TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, X. J. (2015). TopDom: An efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + +Bengtsson, H., Shin, H., Lazaris, H., Hu, G., & Zhou, X. (2020). R package TopDom: An efficient and deterministic method for identifying topological domains in genomes. 
https://github.com/HenrikBengtsson/TopDom +Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. + +library(TopDom) hic <- import(coolf_wt, format = 'cool') HiCExperiment2TopDom <- function(hic, chr) { data <- list() @@ -465,7 +604,7 @@ data$counts <- as.matrix(cm) |> base::as.matrix() data$counts[is.na(data$counts)] <- 0 data$bins <- regions(cm) |> - as.data.frame() |> + as.data.frame() |> select(seqnames, start, end) |> mutate(seqnames = as.character(seqnames)) |> mutate(id = 1:n(), start = start - 1) |> @@ -487,8 +626,8 @@ ## num [1:813, 1:813] 0 0 0 0 0 0 0 0 0 0 ... Now that we have coerced a HiCExperiment object into a TopDom-compatible object, we can use the main TopDom function to annotate topological domains. - -domains <- TopDom::TopDom(hic_topdom, window.size = 5) + +domains <- TopDom::TopDom(hic_topdom, window.size = 5) domains ## TopDom: ## Parameters: @@ -520,8 +659,8 @@ ## $ name : chr "gap" "domain" "gap" "domain" ... The resulting domains object can be used to extract annotated domains, store them in topologicalFeatures of the original HiCExperiment, and optionally write a bed file to export them in text. - -topologicalFeatures(hic, 'domain') <- domains$bed |> + +topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed') - -9.4 GOTHiC + +9.5 GOTHiC GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)). - + +Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). 
GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 + @@ -571,20 +712,20 @@ Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly use binned interactions imported as a HiCExperiment object in R. - -Show the code for GOTHiC_binomial function GOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial function GOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> 
<numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * 
--- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... +## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 
2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN 
(R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 
0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. As a result, coverage is generally not homogeneous throughout the entire genome, which leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element in the scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher. - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, along with the bin containing each extremity of each pair. 
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
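The binning process described above (pairs assigned to genomic bins, then counted per pair of bins) can be sketched in a few lines of base R. This is only an illustrative sketch under stated assumptions: the `pairs` table, `chrom_sizes`, `bin_size` and the 0-based genome-wide bin numbering are hypothetical choices made for the example, and this is not the code used by HiC-Pro or any other pipeline.

```r
# Toy pairs table (hypothetical values): one row per contact,
# with the chromosome and position of each extremity.
pairs <- data.frame(
    chr1 = c("chr1", "chr1", "chr1", "chr1", "chr2"),
    pos1 = c(120, 450, 1800, 2100, 300),
    chr2 = c("chr1", "chr1", "chr1", "chr2", "chr2"),
    pos2 = c(980, 600, 2600, 700, 1400)
)
chrom_sizes <- c(chr1 = 3000, chr2 = 2000)  # hypothetical genome
bin_size <- 1000

# "regions" table: fixed-size bins, numbered genome-wide
# (0-based here, an arbitrary choice for this sketch).
regions <- do.call(rbind, lapply(names(chrom_sizes), function(chr) {
    starts <- seq(0, chrom_sizes[[chr]] - 1, by = bin_size)
    data.frame(
        chr = chr,
        start = starts,
        end = pmin(starts + bin_size, chrom_sizes[[chr]])
    )
}))
regions$bin_id <- seq_len(nrow(regions)) - 1

# Index of the first bin of each chromosome.
offsets <- tapply(regions$bin_id, regions$chr, min)

# Map each extremity to the bin containing it. This is the lossy step:
# the exact position within the bin is discarded.
i <- as.integer(offsets[pairs$chr1] + pairs$pos1 %/% bin_size)
j <- as.integer(offsets[pairs$chr2] + pairs$pos2 %/% bin_size)

# Count pairs per (i, j) combination: the "i-j-x" COO triplets.
coo <- aggregate(x ~ i + j, data = data.frame(i = i, j = j, x = 1L), FUN = sum)
coo <- coo[order(coo$i, coo$j), ]
```

Here `regions` plays the role of the `regions.bed` file and `coo` the role of the `count.matrix` file: only bin pairs with at least one contact are stored, which is what makes the format sparse.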
- + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information than the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information than a .cool file. However, parsing .hic files is not as straightforward as .cool files, as it does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. 
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural &Amp\(\mathsemicolon\) Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. 
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)).
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, e.g. …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix which we imported interactions from. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). in R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). 
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
Scalograms were introduced in Lioy et al. (2018) to investigate distance-dependent contact frequencies for individual genomic bins along chromosomes. To generate a scalogram, one needs to provide a HiCExperiment object with a valid associated pairsFile.
pairsFile(hic) <- pairsf scalo <- scalogram(hic) -## Importing pairs file /github/home/.cache/R/ExperimentHub/1a92835ced9_7753 in memory. This may take a while... +## Importing pairs file /github/home/.cache/R/ExperimentHub/1a9456d59216_7753 in memory. This may take a while... plotScalogram(scalo |> filter(chr == 'II'), ylim = c(1e3, 1e5))
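To give an intuition of what a scalogram summarises, here is a minimal base R sketch. It assumes that, for each genomic bin, a scalogram reports the distances below which given fractions of that bin's contacts fall. This is an illustrative reading of the concept, not the implementation behind `scalogram()`; the simulated `pairs` table, `bin_size` and `probs` are hypothetical.

```r
set.seed(1)

# Simulated cis pairs on a single chromosome (hypothetical data):
# pos1 is one extremity, dist the genomic distance to the other one.
n_pairs <- 5000
pos1 <- sample.int(1e5, n_pairs, replace = TRUE)
dist <- round(rexp(n_pairs, rate = 1 / 5000)) + 1
pairs <- data.frame(pos1 = pos1, pos2 = pos1 + dist, dist = dist)

# Assign each pair to the bin containing its first extremity
# (a simplification: a full treatment would consider both extremities).
bin_size <- 10000
pairs$bin <- (pairs$pos1 %/% bin_size) * bin_size

# For each bin, compute the distances within which 25%, 50% and 75%
# of its contacts are found.
probs <- c(0.25, 0.5, 0.75)
scalo_sketch <- do.call(rbind, lapply(split(pairs, pairs$bin), function(df) {
    data.frame(
        bin = df$bin[1],
        prob = probs,
        dist_quantile = unname(quantile(df$dist, probs))
    )
}))
head(scalo_sketch)
```

Plotting `dist_quantile` against `bin`, with one line per value of `prob`, would show how the range of contacts made by each bin varies along the chromosome.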
This notebook illustrates how to use a range of popular Hi-C-related R packages with HiCExperiment objects. Conversion to the data structures supported by the following packages is illustrated here:
diffHic
hicrep
multiHiCcompare
TopDom
diffHic is the first R package dedicated to Hi-C processing and analysis (Lun & Smyth (2015)). It is packed with useful functions to generate a contact matrix from read pairs and to perform downstream investigation, including normalization, 2D “peak” (i.e. loop) finding and aggregation, detection of differential interactions between samples, etc. It works seamlessly with objects of the InteractionSet class, which can be easily obtained from a HiCExperiment object.
To do so, we first need to extract GInteractions from one or several HiCExperiment objects and create a single InteractionSet object.
library(InteractionSet)
+library(GenomicRanges)
+library(HiCExperiment)
+library(HiContactsData)
+
+# ---- This downloads an example `.mcool` file and caches it locally
+coolf <- HiContactsData('yeast_wt', 'mcool')
+## see ?HiContactsData and browseVignettes('HiContactsData') for documentation
+## loading from cache
+cool <- import(coolf, format = 'cool')
+gi <- cool |>
+    interactions() |>
+    as("ReverseStrictGInteractions")
+iset <- InteractionSet(
+    assays = list(
+        counts = matrix(gi$count, ncol = 1),
+        balanced = matrix(gi$balanced, ncol = 1)
+    ),
+    interactions = gi,
+    colData = data.frame(lib = c("WT"), totals = sum(gi$count))
+)
From there, we can filter interactions to only retain those with significant enrichment over background.
library(diffHic)
+set.seed(1234)
+
+# --- Filter to find aggregated interactions
+enrichments <- enrichedPairs(iset)
+filter <- filterPeaks(enrichments, min.enrich = log2(1.2), min.diag = 5)
+filtered_iset <- iset[filter]
+filtered_iset
+## class: InteractionSet
+## dim: 41872 1
+## metadata(0):
+## assays(2): counts balanced
+## rownames: NULL
+## rowData names(4): bin_id1 bin_id2 count balanced
+## colnames: NULL
+## colData names(2): lib totals
+## type: ReverseStrictGInteractions
+## regions: 12079
+
+# --- Visualize filtered interactions
+library(plyinteractions)
+library(HiContacts)
+## Registered S3 methods overwritten by 'readr':
+##   method                    from
+##   as.data.frame.spec_tbl_df vroom
+##   as_tibble.spec_tbl_df     vroom
+##   format.col_spec           vroom
+##   print.col_spec            vroom
+##   print.collector           vroom
+##   print.date_names          vroom
+##   print.locale              vroom
+##   str.col_spec              vroom
+interactions(filtered_iset) |>
+    filter(seqnames2 == 'II', seqnames1 == seqnames2) |>
+    plotMatrix(use.scores = 'count')
Next, we can cluster filtered interactions that are next to each other.
# --- Cluster interactions to find loops
+clustered_iset <- clusterPairs(filtered_iset, tol = 5000)
+clustered_iset$interactions
+## ReverseStrictGInteractions object with 1644 interactions and 0 metadata columns:
+##          seqnames1       ranges1 strand1     seqnames2       ranges2 strand2
+##              <Rle>     <IRanges>   <Rle>         <Rle>     <IRanges>   <Rle>
+##      [1]         I  15001-149000       * ---         I      1-122000       *
+##      [2]         I 133001-148000       * ---         I 127001-139000       *
+##      [3]         I 154001-160000       * ---         I 128001-149000       *
+##      [4]         I 168001-173000       * ---         I 138001-148000       *
+##      [5]         I 184001-196000       * ---         I   15001-23000       *
+##      ...       ...           ...     ... ...       ...           ...     ...
+##   [1640]       XVI 897001-898000       * ---       XVI 831001-832000       *
+##   [1641]       XVI 907001-910000       * ---       XVI 840001-843000       *
+##   [1642]       XVI 926001-934000       * ---       XVI 872001-878000       *
+##   [1643]       XVI 933001-934000       * ---       XVI 858001-859000       *
+##   [1644]       XVI 933001-942000       * ---       XVI 928001-934000       *
+##   -------
+##   regions: 2822 ranges and 0 metadata columns
+##   seqinfo: 16 sequences from an unspecified genome
+
+# --- Visualize clustered interactions
+interactions(filtered_iset) |>
+    mutate(cluster = clustered_iset$indices[[1]]) |>
+    filter(seqnames2 == 'II', seqnames1 == seqnames2) |>
+    plotMatrix(use.scores = 'cluster')
Finally, we can use HiContacts to visualize individual interaction clusters identified with diffHic.
# --- Plot matrix at a clustered loop
+cgi <- clustered_iset$interactions[554]
+seqn <- seqnames(anchors(cgi, type="second"))
+start <- start(anchors(cgi, type="second")) - 50000
+end <- end(anchors(cgi, type="first")) + 50000
+interactions_peak <- GRanges(seqn, IRanges(start, end))
+p <- plotMatrix(cool[interactions_peak])
+
+library(ggplot2)
+an <- anchors(cgi)
+p + geom_rect(
+    data = data.frame(xmin = start(an[[2]]), xmax = end(an[[2]]), ymin = start(an[[1]]), ymax = end(an[[1]])),
+    aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax),
+    inherit.aes = FALSE,
+    fill = NA,
+    colour = 'cyan'
+)
hicrep is a popular package to compute stratum-adjusted correlations between Hi-C datasets (Yang et al. (2017)). “Stratum” refers to the distance from the main diagonal: with increasing distance from the main diagonal, interaction frequencies of the DNA polymer are bound to decrease. hicrep computes a “per-stratum” correlation score and then a weighted average of these scores for entire chromosomes.
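The idea of a per-stratum correlation can be illustrated with a short base R sketch. This is a deliberately simplified illustration and not the hicrep algorithm: the matrices are simulated, the helper names (`simulate_map`, `stratum`, `stratum_cor`) are hypothetical, and weighting each stratum by its number of bin pairs is an assumption made for this example.

```r
set.seed(42)

# Two toy symmetric "contact matrices" (simulated, hypothetical data)
# whose expected counts decay with the distance to the diagonal.
n <- 50
simulate_map <- function(n) {
    d <- abs(outer(seq_len(n), seq_len(n), "-"))  # distance to the diagonal
    m <- matrix(rpois(n * n, lambda = 100 / (d + 1)), n, n)
    m[lower.tri(m)] <- t(m)[lower.tri(m)]         # make it symmetric
    m
}
m1 <- simulate_map(n)
m2 <- simulate_map(n)

# Values of the k-th upper off-diagonal, i.e. the stratum at distance k.
stratum <- function(m, k) m[col(m) - row(m) == k]

# Pearson correlation within each stratum, then a weighted average.
stratum_cor <- function(m1, m2, max_k = 10) {
    res <- do.call(rbind, lapply(0:max_k, function(k) {
        x <- stratum(m1, k)
        y <- stratum(m2, k)
        data.frame(k = k, n = length(x), r = cor(x, y))
    }))
    # Assumption of this sketch: weight each stratum by its number of bin pairs.
    list(per_stratum = res, weighted = weighted.mean(res$r, res$n))
}

scc_like <- stratum_cor(m1, m2)
scc_like$weighted

# For comparison: correlation computed over all matrix entries at once.
cor(as.vector(m1), as.vector(m2))
```

The two simulated maps are statistically independent apart from their shared distance decay. The correlation computed over all entries is therefore driven by that decay alone, whereas the per-stratum scores are not, which is the motivation for stratifying by distance.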
The hicrep package was available from Bioconductor for many years but has since been withdrawn from its repositories. It can still be installed directly from its GitHub repository as follows:
remotes::install_github('TaoYang-dev/hicrep')
In order to use hicrep, we first need to create two HiCExperiment objects.
library(InteractionSet) -library(HiCExperiment) -library(HiContactsData) - -# ---- This downloads example `.mcool` and `.pairs` files and caches them locally -coolf_wt <- HiContactsData('yeast_wt', 'mcool') + +# ---- This downloads example `.mcool` files and caches them locally coolf_eco1 <- HiContactsData('yeast_eco1', 'mcool') - -hic_wt <- import(coolf_wt, format = 'cool') + +hic_wt <- import(coolf_wt, format = 'cool') hic_eco1 <- import(coolf_eco1, format = 'cool') We can now run the main get.scc function from hicrep. The documentation for this function is available from the console by typing ?hicrep::get.scc. More information is also available from the GitHub page. It informs the end user that the input for this function should be two intra-chromosomal Hi-C raw count matrices in square (optionally sparse) format. - -hic_wt + +hic_wt ## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -401,12 +534,14 @@ ## [,1] ## [1,] 0.9334303 - -9.2 multiHiCcompare + +9.3 multiHiCcompare The multiHiCcompare package provides functions for joint normalization and difference detection in multiple Hi-C datasets (Stansfield et al. (2019)). According to its excerpt, to perform differential interaction analysis, it requires a list of raw counts for different samples/replicates, stored in data frames with four columns (chr, start1, start2, count). Manipulate a HiCExperiment object to coerce it into such structure is straightforward. - -library(dplyr) + +Stansfield, J. C., Cresswell, K. G., & Dozmorov, M. G. (2019). multiHiCcompare: Joint normalization and comparative analysis of complex hi-c experiments. Bioinformatics, 35(17), 2916–2923. 
https://doi.org/10.1093/bioinformatics/btz048 + +library(dplyr) library(tidyr) library(purrr) hics <- list( @@ -414,7 +549,7 @@ "eco1" = import(coolf_eco1, format = 'cool') ) hics_list <- map(hics, ~ .x['XI'] |> - as.data.frame() |> + as.data.frame() |> mutate(chr = 1) |> relocate(chr) |> select(chr, start1, start2, count) @@ -429,8 +564,8 @@ ## 6 1 1 5001 13 Once this list is generated, the classical multiHiCcompare workflow can be applied: first run make_hicexp(), followed by cyclic_loess(), then hic_exactTest() and finally results(): - -DI <- hics_list |> + +DI <- hics_list |> make_hicexp( data_list = hics_list, groups = factor(c(1, 2)) @@ -452,12 +587,16 @@ ## 22640: 1 665001 665001 0 -0.3110054 10.013750 0.60075706 1.0000000 ## 22641: 1 665001 666001 1 -0.4989794 7.750157 0.41481212 1.0000000 - -9.3 TopDom -The TopDom method is widely used to annotate topological domains in genomes from Hi-C data ((Shin_2016?)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). -Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. - -library(TopDom) + +9.4 TopDom +The TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, X. J. (2015). TopDom: An efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + +Bengtsson, H., Shin, H., Lazaris, H., Hu, G., & Zhou, X. (2020). R package TopDom: An efficient and deterministic method for identifying topological domains in genomes. 
https://github.com/HenrikBengtsson/TopDom +Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. + +library(TopDom) hic <- import(coolf_wt, format = 'cool') HiCExperiment2TopDom <- function(hic, chr) { data <- list() @@ -465,7 +604,7 @@ data$counts <- as.matrix(cm) |> base::as.matrix() data$counts[is.na(data$counts)] <- 0 data$bins <- regions(cm) |> - as.data.frame() |> + as.data.frame() |> select(seqnames, start, end) |> mutate(seqnames = as.character(seqnames)) |> mutate(id = 1:n(), start = start - 1) |> @@ -487,8 +626,8 @@ ## num [1:813, 1:813] 0 0 0 0 0 0 0 0 0 0 ... Now that we have coerced a HiCExperiment object into a TopDom-compatible object, we can use the main TopDom function to annotate topological domains. - -domains <- TopDom::TopDom(hic_topdom, window.size = 5) + +domains <- TopDom::TopDom(hic_topdom, window.size = 5) domains ## TopDom: ## Parameters: @@ -520,8 +659,8 @@ ## $ name : chr "gap" "domain" "gap" "domain" ... The resulting domains object can be used to extract annotated domains, store them in topologicalFeatures of the original HiCExperiment, and optionally write a bed file to export them in text. - -topologicalFeatures(hic, 'domain') <- domains$bed |> + +topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed') - -9.4 GOTHiC + +9.5 GOTHiC GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)). - + +Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). 
GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 + @@ -571,20 +712,20 @@ Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly use binned interactions imported as a HiCExperiment object in R. - -Show the code for GOTHiC_binomial function GOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial function GOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> 
<numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * 
--- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... +## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 
2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN 
(R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 
0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. As a result, coverage is generally not homogeneous throughout the genome, and this leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element to the scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality, encoded as ASCII characters (Phred scores). - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, along with the bin containing each extremity of each pair.
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
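To make the relationship between the two files concrete, here is a minimal sketch in base R. The bin coordinates and counts are toy values chosen for illustration (they are not taken from an actual HiC-Pro run), and the object names `regions`, `counts`, `mat` and `annotated` are hypothetical. It shows how "i-j-x" triplets can be expanded into a dense symmetric matrix, and how the bin table acts as a dictionary to recover genomic coordinates.

```r
# Toy genomic bins, mimicking the content of a regions.bed file
# (assumed columns: chromosome, start, end, bin index)
regions <- data.frame(
    chr   = c("chr1", "chr1", "chr1", "chr2", "chr2"),
    start = c(0, 10000, 20000, 0, 10000),
    end   = c(10000, 20000, 30000, 10000, 20000),
    bin   = 1:5
)

# Toy "i-j-x" triplets, mimicking the content of a count.matrix file
# (i and j are bin indices, x is the number of pairs in that bin pair)
counts <- data.frame(
    i = c(1, 1, 2, 4),
    j = c(1, 3, 2, 5),
    x = c(2, 1, 1, 1)
)

# Expand the sparse triplets into a dense, symmetric contact matrix
n <- nrow(regions)
mat <- matrix(0, nrow = n, ncol = n)
mat[cbind(counts$i, counts$j)] <- counts$x
mat[cbind(counts$j, counts$i)] <- counts$x

# Use the bin table as a dictionary to translate indices into coordinates
idx_i <- match(counts$i, regions$bin)
idx_j <- match(counts$j, regions$bin)
annotated <- data.frame(
    chr1   = regions$chr[idx_i],
    start1 = regions$start[idx_i],
    end1   = regions$end[idx_i],
    chr2   = regions$chr[idx_j],
    start2 = regions$start[idx_j],
    end2   = regions$end[idx_j],
    count  = counts$x
)
print(annotated)
```

Only non-zero bin pairs are stored in the triplet table, which is what makes the representation sparse. The dense matrix built here is convenient for a toy example but would not scale to genome-wide data at high resolution.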
 - + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information as the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as parsing .cool files, as the format does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98.
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartments). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments.
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)).
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). In R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376).
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
# ---- This downloads example `.mcool` files and caches them locally
coolf_eco1 <- HiContactsData('yeast_eco1', 'mcool')
hic_wt <- import(coolf_wt, format = 'cool') + +hic_wt <- import(coolf_wt, format = 'cool') hic_eco1 <- import(coolf_eco1, format = 'cool') We can now run the main get.scc function from hicrep. The documentation for this function is available from the console by typing ?hicrep::get.scc. More information is also available from the GitHub page. It informs the end user that the input for this function should be two intra-chromosomal Hi-C raw count matrices in square (optionally sparse) format. - -hic_wt + +hic_wt ## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -401,12 +534,14 @@ ## [,1] ## [1,] 0.9334303 - -9.2 multiHiCcompare + +9.3 multiHiCcompare The multiHiCcompare package provides functions for joint normalization and difference detection in multiple Hi-C datasets (Stansfield et al. (2019)). According to its excerpt, to perform differential interaction analysis, it requires a list of raw counts for different samples/replicates, stored in data frames with four columns (chr, start1, start2, count). Manipulating a HiCExperiment object to coerce it into such a structure is straightforward. - -library(dplyr) + +Stansfield, J. C., Cresswell, K. G., & Dozmorov, M. G. (2019). multiHiCcompare: Joint normalization and comparative analysis of complex hi-c experiments. Bioinformatics, 35(17), 2916–2923.
https://doi.org/10.1093/bioinformatics/btz048 + +library(dplyr) library(tidyr) library(purrr) hics <- list( @@ -414,7 +549,7 @@ "eco1" = import(coolf_eco1, format = 'cool') ) hics_list <- map(hics, ~ .x['XI'] |> - as.data.frame() |> + as.data.frame() |> mutate(chr = 1) |> relocate(chr) |> select(chr, start1, start2, count) @@ -429,8 +564,8 @@ ## 6 1 1 5001 13 Once this list is generated, the classical multiHiCcompare workflow can be applied: first run make_hicexp(), followed by cyclic_loess(), then hic_exactTest() and finally results(): - -DI <- hics_list |> + +DI <- hics_list |> make_hicexp( data_list = hics_list, groups = factor(c(1, 2)) @@ -452,12 +587,16 @@ ## 22640: 1 665001 665001 0 -0.3110054 10.013750 0.60075706 1.0000000 ## 22641: 1 665001 666001 1 -0.4989794 7.750157 0.41481212 1.0000000 - -9.3 TopDom -The TopDom method is widely used to annotate topological domains in genomes from Hi-C data ((Shin_2016?)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). -Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. - -library(TopDom) + +9.4 TopDom +The TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, X. J. (2015). TopDom: An efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + +Bengtsson, H., Shin, H., Lazaris, H., Hu, G., & Zhou, X. (2020). R package TopDom: An efficient and deterministic method for identifying topological domains in genomes. 
https://github.com/HenrikBengtsson/TopDom +Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. + +library(TopDom) hic <- import(coolf_wt, format = 'cool') HiCExperiment2TopDom <- function(hic, chr) { data <- list() @@ -465,7 +604,7 @@ data$counts <- as.matrix(cm) |> base::as.matrix() data$counts[is.na(data$counts)] <- 0 data$bins <- regions(cm) |> - as.data.frame() |> + as.data.frame() |> select(seqnames, start, end) |> mutate(seqnames = as.character(seqnames)) |> mutate(id = 1:n(), start = start - 1) |> @@ -487,8 +626,8 @@ ## num [1:813, 1:813] 0 0 0 0 0 0 0 0 0 0 ... Now that we have coerced a HiCExperiment object into a TopDom-compatible object, we can use the main TopDom function to annotate topological domains. - -domains <- TopDom::TopDom(hic_topdom, window.size = 5) + +domains <- TopDom::TopDom(hic_topdom, window.size = 5) domains ## TopDom: ## Parameters: @@ -520,8 +659,8 @@ ## $ name : chr "gap" "domain" "gap" "domain" ... The resulting domains object can be used to extract annotated domains, store them in topologicalFeatures of the original HiCExperiment, and optionally write a bed file to export them in text. - -topologicalFeatures(hic, 'domain') <- domains$bed |> + +topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed') - -9.4 GOTHiC + +9.5 GOTHiC GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)). - + +Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). 
GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 + @@ -571,20 +712,20 @@ Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly use binned interactions imported as a HiCExperiment object in R. - -Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric>
<numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * 
--- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... +## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ──────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 
2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN 
(R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 
0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. As a result, coverage is generally not homogeneous across the genome, which leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element to the scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by nature paired-end libraries. For this reason, the end result of the experimental side of Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality, encoded as ASCII characters (Phred scores). - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, along with the bin containing each extremity of each pair. 
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i and j are a pair of “coordinate” indices (or a pair of genomic bin indices) in a matrix, and x is a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies is stored in separate files. Moreover, because they are non-binarized, these files often end up using a lot of disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies is stored in separate files. Moreover, because they are non-binarized, these files often end up using a lot of disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
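To make the “i-j-x” triplets and the regions “dictionary” more concrete, here is a minimal base R sketch. It assumes toy, made-up values: the `regions` and `counts` data frames, their column names and every number in them are hypothetical stand-ins for a regions.bed and a count.matrix file, not actual HiC-Pro output.

```r
# Hypothetical toy data mimicking a regions.bed / count.matrix pair
# (all values are made up for illustration).
# regions: one row per genomic bin, with a 1-based bin index.
regions <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(0, 10000, 20000, 0),
  end   = c(10000, 20000, 30000, 10000),
  id    = 1:4
)

# counts: "i-j-x" COO triplets (bin index i, bin index j, interaction count x).
counts <- data.frame(
  i = c(1, 1, 2, 3),
  j = c(1, 3, 4, 3),
  x = c(5, 2, 1, 7)
)

# Use the regions table as a dictionary to recover the genomic
# coordinates of both bins of each triplet.
idx_i <- match(counts$i, regions$id)
idx_j <- match(counts$j, regions$id)
annotated <- data.frame(
  chr1 = regions$chr[idx_i], start1 = regions$start[idx_i],
  chr2 = regions$chr[idx_j], start2 = regions$start[idx_j],
  count = counts$x
)

# Expand the upper-triangle triplets into a dense, symmetric matrix.
n <- nrow(regions)
dense <- matrix(0, nrow = n, ncol = n)
dense[cbind(counts$i, counts$j)] <- counts$x
dense[cbind(counts$j, counts$i)] <- counts$x
```

Neither file is interpretable on its own, which is why the lookup step is needed: `counts` only makes sense once its indices are resolved against `regions`.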
- + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information as the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as parsing .cool files, as the format does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C data are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. 
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural &Amp\(\mathsemicolon\) Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. 
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). 
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). In R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). 
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
hic_wt <- import(coolf_wt, format = 'cool')
hic_eco1 <- import(coolf_eco1, format = 'cool')
We can now run the main get.scc function from hicrep. The documentation for this function is available from the console by typing ?hicrep::get.scc, and more information is available from the package's GitHub page. The documentation states that the input for this function should be two intra-chromosomal Hi-C raw count matrices in square (optionally sparse) format.
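To make the expected input shape concrete, here is a minimal base R sketch that turns a sparse i-j-x count table for a single chromosome into a square, symmetric matrix. The toy counts, the number of bins and the to_square() helper are hypothetical and only serve the illustration; they are not part of hicrep or HiCExperiment.

```r
# Hypothetical sparse raw counts for one chromosome split into 4 bins
# (upper triangle only, as in an i-j-x table)
sparse_counts <- data.frame(
    i = c(1, 1, 2, 3, 4),
    j = c(1, 3, 2, 4, 4),
    x = c(10, 2, 8, 5, 7)
)
n_bins <- 4

# Illustrative helper: fill a dense n x n matrix from i-j-x triplets
to_square <- function(df, n) {
    m <- matrix(0, nrow = n, ncol = n)
    m[cbind(df$i, df$j)] <- df$x   # fill the upper triangle
    m[cbind(df$j, df$i)] <- df$x   # mirror into the lower triangle
    m
}

mat <- to_square(sparse_counts, n_bins)
mat
```

Two such matrices, one per sample and built on the same set of bins for the same chromosome, have the square intra-chromosomal raw count layout that the paragraph above describes.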
hic_wt + +hic_wt ## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -401,12 +534,14 @@ ## [,1] ## [1,] 0.9334303 - -9.2 multiHiCcompare + +9.3 multiHiCcompare The multiHiCcompare package provides functions for joint normalization and difference detection in multiple Hi-C datasets (Stansfield et al. (2019)). According to its documentation, to perform differential interaction analysis, it requires a list of raw counts for different samples/replicates, stored in data frames with four columns (chr, start1, start2, count). Coercing a HiCExperiment object into such a structure is straightforward. - -library(dplyr) + +Stansfield, J. C., Cresswell, K. G., & Dozmorov, M. G. (2019). multiHiCcompare: Joint normalization and comparative analysis of complex hi-c experiments. Bioinformatics, 35(17), 2916–2923. 
https://doi.org/10.1093/bioinformatics/btz048 + +library(dplyr) library(tidyr) library(purrr) hics <- list( @@ -414,7 +549,7 @@ "eco1" = import(coolf_eco1, format = 'cool') ) hics_list <- map(hics, ~ .x['XI'] |> - as.data.frame() |> + as.data.frame() |> mutate(chr = 1) |> relocate(chr) |> select(chr, start1, start2, count) @@ -429,8 +564,8 @@ ## 6 1 1 5001 13 Once this list is generated, the classical multiHiCcompare workflow can be applied: first run make_hicexp(), followed by cyclic_loess(), then hic_exactTest() and finally results(): - -DI <- hics_list |> + +DI <- hics_list |> make_hicexp( data_list = hics_list, groups = factor(c(1, 2)) @@ -452,12 +587,16 @@ ## 22640: 1 665001 665001 0 -0.3110054 10.013750 0.60075706 1.0000000 ## 22641: 1 665001 666001 1 -0.4989794 7.750157 0.41481212 1.0000000 - -9.3 TopDom -The TopDom method is widely used to annotate topological domains in genomes from Hi-C data ((Shin_2016?)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). -Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. - -library(TopDom) + +9.4 TopDom +The TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, X. J. (2015). TopDom: An efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + +Bengtsson, H., Shin, H., Lazaris, H., Hu, G., & Zhou, X. (2020). R package TopDom: An efficient and deterministic method for identifying topological domains in genomes. 
https://github.com/HenrikBengtsson/TopDom +Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. + +library(TopDom) hic <- import(coolf_wt, format = 'cool') HiCExperiment2TopDom <- function(hic, chr) { data <- list() @@ -465,7 +604,7 @@ data$counts <- as.matrix(cm) |> base::as.matrix() data$counts[is.na(data$counts)] <- 0 data$bins <- regions(cm) |> - as.data.frame() |> + as.data.frame() |> select(seqnames, start, end) |> mutate(seqnames = as.character(seqnames)) |> mutate(id = 1:n(), start = start - 1) |> @@ -487,8 +626,8 @@ ## num [1:813, 1:813] 0 0 0 0 0 0 0 0 0 0 ... Now that we have coerced a HiCExperiment object into a TopDom-compatible object, we can use the main TopDom function to annotate topological domains. - -domains <- TopDom::TopDom(hic_topdom, window.size = 5) + +domains <- TopDom::TopDom(hic_topdom, window.size = 5) domains ## TopDom: ## Parameters: @@ -520,8 +659,8 @@ ## $ name : chr "gap" "domain" "gap" "domain" ... The resulting domains object can be used to extract annotated domains, store them in topologicalFeatures of the original HiCExperiment, and optionally write a bed file to export them in text. - -topologicalFeatures(hic, 'domain') <- domains$bed |> + +topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed') - -9.4 GOTHiC + +9.5 GOTHiC GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)). - + +Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). 
GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 + @@ -571,20 +712,20 @@ Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly used binned interactions imported as a HiCExperiment object in R. - -Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> 
<numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * 
--- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... +## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 
2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN 
(R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 
0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc.. Overall, it generally ends up not homogenous throughout the entire genome and this leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element in scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by nature paired-end libraries. For this reason, the end result of the experimental side of Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality, encoded as one character per base. - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, along with the bin containing each extremity of each pair.
Thus, a count matrix is a lossy file format, as it “rounds” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i and j are a pair of “coordinate” indices (i.e. a pair of genomic bin indices) in a matrix, and x is the score associated with that pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed file acts as a secondary “dictionary” describing what the i and j indices refer to, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, these files are relatively hard to interpret compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies is stored in separate files. Moreover, because they are non-binarized, these files often take up a large amount of disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, these files are relatively hard to interpret compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies is stored in separate files. Moreover, because they are non-binarized, these files often take up a large amount of disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. The .(m)cool and .hic file formats are two standards addressing these limitations.
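As a minimal sketch of the “i-j-x” layout described above, the base R snippet below expands a few COO triplets into a dense symmetric matrix. The bin table and the counts are made-up toy values, not taken from any actual HiC-Pro output, and the names regions, counts and coo_to_dense are assumptions introduced here for illustration only.

```r
# Toy stand-in for a regions.bed file: four 1-kb bins (hypothetical values).
regions <- data.frame(
    chr   = rep("chr1", 4),
    start = c(0, 1000, 2000, 3000),
    end   = c(1000, 2000, 3000, 4000),
    id    = 1:4
)

# Toy stand-in for a count.matrix file: one "i-j-x" triplet per row,
# listing only the upper triangle (i <= j). Hypothetical values.
counts <- data.frame(
    i = c(1, 1, 2, 3),
    j = c(1, 2, 4, 3),
    x = c(2, 1, 1, 1)
)

# Hypothetical helper: expand COO triplets into a dense symmetric matrix.
coo_to_dense <- function(counts, n_bins) {
    m <- matrix(0, nrow = n_bins, ncol = n_bins)
    m[cbind(counts$i, counts$j)] <- counts$x  # fill the upper triangle
    m[cbind(counts$j, counts$i)] <- counts$x  # mirror into the lower triangle
    m
}

m <- coo_to_dense(counts, n_bins = nrow(regions))
print(m)
```

Only non-zero bin pairs appear in the triplet table, which is why this layout stays compact for sparse contact maps, while the bin table is needed to translate each index back to genomic coordinates.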
- + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information as the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as parsing .cool files, as the format does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98.
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural &Amp\(\mathsemicolon\) Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. 
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.b. roughly ≤ 1Mb in mammal genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). 
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, e.g. …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix which we imported interactions from. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). in R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). 
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
hic_wt ## `HiCExperiment` object with 8,757,906 contacts over 12,079 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "whole genome" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -401,12 +534,14 @@ ## [,1] ## [1,] 0.9334303
The multiHiCcompare package provides functions for joint normalization and difference detection in multiple Hi-C datasets (Stansfield et al. (2019)). According to its documentation, differential interaction analysis requires a list of raw counts for the different samples/replicates, each stored as a data frame with four columns (chr, start1, start2, count). Coercing a HiCExperiment object into such a structure is straightforward.
library(dplyr) + +Stansfield, J. C., Cresswell, K. G., & Dozmorov, M. G. (2019). multiHiCcompare: Joint normalization and comparative analysis of complex hi-c experiments. Bioinformatics, 35(17), 2916–2923. https://doi.org/10.1093/bioinformatics/btz048 + +library(dplyr) library(tidyr) library(purrr) hics <- list( @@ -414,7 +549,7 @@ "eco1" = import(coolf_eco1, format = 'cool') ) hics_list <- map(hics, ~ .x['XI'] |> - as.data.frame() |> + as.data.frame() |> mutate(chr = 1) |> relocate(chr) |> select(chr, start1, start2, count) @@ -429,8 +564,8 @@ ## 6 1 1 5001 13 Once this list is generated, the classical multiHiCcompare workflow can be applied: first run make_hicexp(), followed by cyclic_loess(), then hic_exactTest() and finally results(): - -DI <- hics_list |> + +DI <- hics_list |> make_hicexp( data_list = hics_list, groups = factor(c(1, 2)) @@ -452,12 +587,16 @@ ## 22640: 1 665001 665001 0 -0.3110054 10.013750 0.60075706 1.0000000 ## 22641: 1 665001 666001 1 -0.4989794 7.750157 0.41481212 1.0000000 - -9.3 TopDom -The TopDom method is widely used to annotate topological domains in genomes from Hi-C data ((Shin_2016?)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). -Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. - -library(TopDom) + +9.4 TopDom +The TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, X. J. (2015). TopDom: An efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + +Bengtsson, H., Shin, H., Lazaris, H., Hu, G., & Zhou, X. (2020). 
R package TopDom: An efficient and deterministic method for identifying topological domains in genomes. https://github.com/HenrikBengtsson/TopDom +Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. + +library(TopDom) hic <- import(coolf_wt, format = 'cool') HiCExperiment2TopDom <- function(hic, chr) { data <- list() @@ -465,7 +604,7 @@ data$counts <- as.matrix(cm) |> base::as.matrix() data$counts[is.na(data$counts)] <- 0 data$bins <- regions(cm) |> - as.data.frame() |> + as.data.frame() |> select(seqnames, start, end) |> mutate(seqnames = as.character(seqnames)) |> mutate(id = 1:n(), start = start - 1) |> @@ -487,8 +626,8 @@ ## num [1:813, 1:813] 0 0 0 0 0 0 0 0 0 0 ... Now that we have coerced a HiCExperiment object into a TopDom-compatible object, we can use the main TopDom function to annotate topological domains. - -domains <- TopDom::TopDom(hic_topdom, window.size = 5) + +domains <- TopDom::TopDom(hic_topdom, window.size = 5) domains ## TopDom: ## Parameters: @@ -520,8 +659,8 @@ ## $ name : chr "gap" "domain" "gap" "domain" ... The resulting domains object can be used to extract annotated domains, store them in topologicalFeatures of the original HiCExperiment, and optionally write a bed file to export them in text. - -topologicalFeatures(hic, 'domain') <- domains$bed |> + +topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed') - -9.4 GOTHiC + +9.5 GOTHiC GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)). 
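The cumulative binomial test mentioned above can be illustrated with a minimal base R sketch. All numbers here (`n_pairs`, `p_expect`, `observed`) are hypothetical toy values, not taken from GOTHiC or from the dataset used in this chapter; in practice the expected probability would be derived from the coverage of the two bins.

```r
# Toy values, all hypothetical (not from GOTHiC or the yeast dataset):
n_pairs  <- 100000  # total number of read pairs in the experiment
p_expect <- 2e-5    # assumed probability that a read pair links the two bins
observed <- 8       # number of read pairs actually observed between the bins

# Upper-tail cumulative binomial probability, P(X >= observed)
pval_cum <- pbinom(observed - 1, size = n_pairs, prob = p_expect,
                   lower.tail = FALSE)

# The same one-sided test, run through stats::binom.test()
pval_bt <- binom.test(observed, n_pairs, p = p_expect,
                      alternative = "greater")$p.value

# Expected count under the null, and log2 fold-change of observed over expected
expected <- n_pairs * p_expect
log2fc   <- log2(observed / expected)
```

When many bin pairs are tested at once, the resulting p-values would typically be adjusted for multiple testing, for instance with `p.adjust(..., method = "BH")`.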
- + +Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 + @@ -571,20 +712,20 @@ Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly used binned interactions imported as a HiCExperiment object in R. - -Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted 
pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 
1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * --- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... +## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info 
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value 
## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 
[1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 
[1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. 
https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. Overall, coverage is generally not homogeneous throughout the genome, which leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>).
However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element in scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. 
https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. 
This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. 
https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by nature paired-end libraries. For this reason, the end result of the experimental side of Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair.
The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher. - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file are provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, and in which bin each extremity of each pair is contained. Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. 
This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. - + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information as the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as parsing .cool files, as the format does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al.
(2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend relying on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and light footprint. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard pipeline command as follows:
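A minimal sketch of such a call is given here for orientation. The file names (`genome.fa`, `sample_R1.fq.gz`, `sample_R2.fq.gz`), the output directory, the enzyme and the thread count are all hypothetical placeholders chosen for illustration, not values taken from this chapter. The snippet only runs hicstuff when the tool and the input files are actually present; otherwise it prints the command it would have run.

```shell
# Hypothetical inputs: replace with your own reference genome and fastq files
GENOME="genome.fa"
R1="sample_R1.fq.gz"
R2="sample_R2.fq.gz"
OUTDIR="hicstuff_out"

# Assemble the call once so it can be either printed or executed.
# --enzyme and --threads values are illustrative assumptions.
HICSTUFF_CMD="hicstuff pipeline --genome $GENOME --enzyme DpnII --threads 4 --outdir $OUTDIR $R1 $R2"

if command -v hicstuff >/dev/null 2>&1 && [ -f "$GENOME" ] && [ -f "$R1" ] && [ -f "$R2" ]; then
    # Word splitting is intended: the placeholder names contain no spaces
    $HICSTUFF_CMD
else
    # Dry run when hicstuff or the input files are missing
    echo "dry run: $HICSTUFF_CMD"
fi
```

With real inputs, the generated files would be expected under the directory passed to `--outdir`; the exact set of outputs depends on the hicstuff version and the options used.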
A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... ## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! 
## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. 
S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114.
https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.b. roughly ≤ 1Mb in mammal genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. 
(2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. 
https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. 
-References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). 
Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. chromatin loops, domain borders, A/B compartments, e.g. …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix which we imported interactions from. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. 
Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). in R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). 
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
library(dplyr) library(tidyr) library(purrr) hics <- list( @@ -414,7 +549,7 @@ "eco1" = import(coolf_eco1, format = 'cool') ) hics_list <- map(hics, ~ .x['XI'] |> - as.data.frame() |> + as.data.frame() |> mutate(chr = 1) |> relocate(chr) |> select(chr, start1, start2, count) @@ -429,8 +564,8 @@ ## 6 1 1 5001 13
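Before moving on, it can help to check that each table in the list has the layout printed above: four columns (chr, start1, start2, count) and one row per non-empty bin pair. The sketch below illustrates such a check in base R on made-up values. `toy` and `check_sparse()` are hypothetical names introduced here, not part of multiHiCcompare or HiCExperiment, and the upper-triangle condition (start1 <= start2) is an assumption about the expected sparse format rather than something stated in this chapter.

```r
# Toy sparse contact table mimicking the four columns selected above.
# All values are invented for illustration only.
toy <- data.frame(
    chr    = 1,
    start1 = c(1, 1, 5001, 5001),
    start2 = c(1, 5001, 5001, 10001),
    count  = c(12, 13, 20, 4)
)

# Hypothetical helper: stops with an error if the table does not follow
# the assumed layout, and invisibly returns TRUE otherwise.
check_sparse <- function(df) {
    stopifnot(
        # Exactly these four columns, in this order
        identical(colnames(df), c("chr", "start1", "start2", "count")),
        # Assumption: only the upper triangle of the matrix is stored
        all(df$start1 <= df$start2),
        # Contact counts cannot be negative
        all(df$count >= 0)
    )
    invisible(TRUE)
}

check_sparse(toy)
```

On the real data, something like `purrr::walk(hics_list, check_sparse)` would apply the same check to every sample.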
Once this list is generated, the classical multiHiCcompare workflow can be applied: first run make_hicexp(), followed by cyclic_loess(), then hic_exactTest() and finally results():
DI <- hics_list |> + +DI <- hics_list |> make_hicexp( data_list = hics_list, groups = factor(c(1, 2)) @@ -452,12 +587,16 @@ ## 22640: 1 665001 665001 0 -0.3110054 10.013750 0.60075706 1.0000000 ## 22641: 1 665001 666001 1 -0.4989794 7.750157 0.41481212 1.0000000 - -9.3 TopDom -The TopDom method is widely used to annotate topological domains in genomes from Hi-C data ((Shin_2016?)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). -Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. - -library(TopDom) + +9.4 TopDom +The TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, X. J. (2015). TopDom: An efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + +Bengtsson, H., Shin, H., Lazaris, H., Hu, G., & Zhou, X. (2020). R package TopDom: An efficient and deterministic method for identifying topological domains in genomes. https://github.com/HenrikBengtsson/TopDom +Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. 
+ +library(TopDom) hic <- import(coolf_wt, format = 'cool') HiCExperiment2TopDom <- function(hic, chr) { data <- list() @@ -465,7 +604,7 @@ data$counts <- as.matrix(cm) |> base::as.matrix() data$counts[is.na(data$counts)] <- 0 data$bins <- regions(cm) |> - as.data.frame() |> + as.data.frame() |> select(seqnames, start, end) |> mutate(seqnames = as.character(seqnames)) |> mutate(id = 1:n(), start = start - 1) |> @@ -487,8 +626,8 @@ ## num [1:813, 1:813] 0 0 0 0 0 0 0 0 0 0 ... Now that we have coerced a HiCExperiment object into a TopDom-compatible object, we can use the main TopDom function to annotate topological domains. - -domains <- TopDom::TopDom(hic_topdom, window.size = 5) + +domains <- TopDom::TopDom(hic_topdom, window.size = 5) domains ## TopDom: ## Parameters: @@ -520,8 +659,8 @@ ## $ name : chr "gap" "domain" "gap" "domain" ... The resulting domains object can be used to extract annotated domains, store them in topologicalFeatures of the original HiCExperiment, and optionally write a bed file to export them in text. - -topologicalFeatures(hic, 'domain') <- domains$bed |> + +topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed') - -9.4 GOTHiC + +9.5 GOTHiC GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)). - + +Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. 
https://doi.org/10.1371/journal.pone.0174744 + @@ -571,20 +712,20 @@ Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly used binned interactions imported as a HiCExperiment object in R. - -Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 
0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * --- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... 
. ... ... ... ... ... ... ... ... ... +## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## 
dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 
2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] 
CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc.. Overall, it generally ends up not homogenous throughout the entire genome and this leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element in scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by nature paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher. - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, along with the bin in which each extremity of each pair is contained. 
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
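As a minimal sketch of how these two files fit together, the R snippet below rebuilds a sparse contact matrix from i-j-x triplets and uses the bin table as a "dictionary" to recover genomic coordinates. The `regions` and `counts` tables are made-up toy values (not taken from a real HiC-Pro run), and the snippet assumes 1-based bin indices and relies on the Matrix package.

```r
library(Matrix)

# Hypothetical stand-in for regions.bed: one row per genomic bin (toy values)
regions <- data.frame(
    chr   = c("chr1", "chr1", "chr1", "chr2"),
    start = c(0, 10000, 20000, 0),
    end   = c(10000, 20000, 30000, 10000),
    bin   = 1:4
)

# Hypothetical stand-in for count.matrix: i-j-x (COO) triplets (toy values)
counts <- data.frame(
    i = c(1, 1, 2, 3),
    j = c(1, 3, 4, 4),
    x = c(5, 2, 1, 7)
)

# Rebuild the (upper-triangular) sparse contact matrix from the triplets
m <- sparseMatrix(
    i = counts$i, j = counts$j, x = counts$x,
    dims = c(nrow(regions), nrow(regions))
)

# Use the bin table as a dictionary to translate indices into coordinates
annotated <- data.frame(
    chr1   = regions$chr[counts$i],
    start1 = regions$start[counts$i],
    chr2   = regions$chr[counts$j],
    start2 = regions$start[counts$j],
    count  = counts$x
)
```

Here `m` holds the binned interaction counts and `annotated` lists, for each triplet, the coordinates of the two bins involved. Neither table can be interpreted without the other, which is why the two plain-text files always have to be kept together.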
 - + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information as the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as parsing .cool files, as the format does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. 
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural &Amp\(\mathsemicolon\) Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. 
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). 
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). In R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376).
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
DI <- hics_list |> make_hicexp( data_list = hics_list, groups = factor(c(1, 2)) @@ -452,12 +587,16 @@ ## 22640: 1 665001 665001 0 -0.3110054 10.013750 0.60075706 1.0000000 ## 22641: 1 665001 666001 1 -0.4989794 7.750157 0.41481212 1.0000000
The TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)).
Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object.
library(TopDom) + +9.4 TopDom +The TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)). + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, X. J. (2015). TopDom: An efficient and deterministic method for identifying topological domains in genomes. Nucleic Acids Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + +Bengtsson, H., Shin, H., Lazaris, H., Hu, G., & Zhou, X. (2020). R package TopDom: An efficient and deterministic method for identifying topological domains in genomes. https://github.com/HenrikBengtsson/TopDom +Unfortunately, the format of the input to TopDom is rather tricky (see ?TopDom::readHiC). The following chunk of code shows how to coerce a HiCExperiment object into a TopDom-compatible object. + +library(TopDom) hic <- import(coolf_wt, format = 'cool') HiCExperiment2TopDom <- function(hic, chr) { data <- list() @@ -465,7 +604,7 @@ data$counts <- as.matrix(cm) |> base::as.matrix() data$counts[is.na(data$counts)] <- 0 data$bins <- regions(cm) |> - as.data.frame() |> + as.data.frame() |> select(seqnames, start, end) |> mutate(seqnames = as.character(seqnames)) |> mutate(id = 1:n(), start = start - 1) |> @@ -487,8 +626,8 @@ ## num [1:813, 1:813] 0 0 0 0 0 0 0 0 0 0 ... Now that we have coerced a HiCExperiment object into a TopDom-compatible object, we can use the main TopDom function to annotate topological domains. - -domains <- TopDom::TopDom(hic_topdom, window.size = 5) + +domains <- TopDom::TopDom(hic_topdom, window.size = 5) domains ## TopDom: ## Parameters: @@ -520,8 +659,8 @@ ## $ name : chr "gap" "domain" "gap" "domain" ... The resulting domains object can be used to extract annotated domains, store them in topologicalFeatures of the original HiCExperiment, and optionally write a bed file to export them in text. 
- -topologicalFeatures(hic, 'domain') <- domains$bed |> + +topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed') - -9.4 GOTHiC + +9.5 GOTHiC GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)). - + +Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 + @@ -571,20 +712,20 @@ Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly use binned interactions imported as a HiCExperiment object in R.
- -Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 
--- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * --- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... 
+## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 
4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 
1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 
[1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. As a result, coverage is generally not homogeneous across the genome, which leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element to the scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003.
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher. - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, along with the bin containing each extremity of each pair.
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
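As a minimal sketch of the "i-j-x" layout described above, the following base R snippet expands a small COO table into a dense matrix. The `regions` and `counts` data frames and all their values are hypothetical, invented for illustration, and are not taken from an actual HiC-Pro output.

```r
# Hypothetical bins, playing the role of regions.bed (values invented for illustration)
regions <- data.frame(
    chr = c("chr1", "chr1", "chr1", "chr2"),
    start = c(0, 1000, 2000, 0),
    end = c(1000, 2000, 3000, 1000),
    id = 1:4
)

# Hypothetical COO triplets, playing the role of count.matrix: bin i, bin j, count x
counts <- data.frame(
    i = c(1, 1, 2, 3),
    j = c(1, 2, 4, 3),
    x = c(2, 1, 1, 1)
)

# Fill a dense square matrix; pairs of bins absent from `counts` keep a count of 0
n <- nrow(regions)
dense <- matrix(0, nrow = n, ncol = n)
dense[cbind(counts$i, counts$j)] <- counts$x

# Contact maps are symmetric: mirror the stored triangle onto the other one
dense[cbind(counts$j, counts$i)] <- counts$x

# Total number of pairs stored in the table
total_pairs <- sum(counts$x)
```

Only the non-zero entries are stored in `counts`, which is why the format is called sparse; `regions` is needed to translate each index back into genomic coordinates.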
- + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information as the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as parsing .cool files, as the format does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C data are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98.
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend relying on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and light weight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes...
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartments). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments.
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)).
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). In R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376).
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
The TopDom method is widely used to annotate topological domains in genomes from Hi-C data (Shin et al. (2015)). The TopDom package was created to implement this method in R (Bengtsson et al. (2020)).
library(TopDom) hic <- import(coolf_wt, format = 'cool') HiCExperiment2TopDom <- function(hic, chr) { data <- list() @@ -465,7 +604,7 @@ data$counts <- as.matrix(cm) |> base::as.matrix() data$counts[is.na(data$counts)] <- 0 data$bins <- regions(cm) |> - as.data.frame() |> + as.data.frame() |> select(seqnames, start, end) |> mutate(seqnames = as.character(seqnames)) |> mutate(id = 1:n(), start = start - 1) |> @@ -487,8 +626,8 @@ ## num [1:813, 1:813] 0 0 0 0 0 0 0 0 0 0 ...
Now that we have coerced a HiCExperiment object into a TopDom-compatible object, we can use the main TopDom function to annotate topological domains.
domains <- TopDom::TopDom(hic_topdom, window.size = 5) + +domains <- TopDom::TopDom(hic_topdom, window.size = 5) domains ## TopDom: ## Parameters: @@ -520,8 +659,8 @@ ## $ name : chr "gap" "domain" "gap" "domain" ... The resulting domains object can be used to extract annotated domains, store them in topologicalFeatures of the original HiCExperiment, and optionally write a BED file to export them as text. - -topologicalFeatures(hic, 'domain') <- domains$bed |> + +topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed') - -9.4 GOTHiC + +9.5 GOTHiC GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)). - + +Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 + @@ -571,20 +712,20 @@ Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly use binned interactions imported as a HiCExperiment object in R. 
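As a rough illustration of the cumulative binomial test mentioned above, the sketch below recomputes the `predicted`, `pvalue` and `logFoldChange` columns for two bin pairs, taking the per-pair `probability` as given. The helper name `binomial_enrichment` is hypothetical and is not part of GOTHiC or HiCExperiment; the input values are copied from the first two rows of the `interactions(res)` output printed in this section (471,364 contacts in total). How the probabilities are derived from bin coverage, and the multiple-testing correction behind `qvalue`, are deliberately left out.

```r
# Hypothetical helper (not part of GOTHiC or HiCExperiment): upper-tail
# binomial test for one or more bin pairs, given a per-pair probability.
binomial_enrichment <- function(count, probability, n_pairs) {
  # Number of contacts expected by chance for each bin pair
  predicted <- probability * n_pairs
  # P(X >= count) for X ~ Binomial(n_pairs, probability)
  pvalue <- pbinom(count - 1, size = n_pairs, prob = probability,
                   lower.tail = FALSE)
  data.frame(
    count = count,
    probability = probability,
    predicted = predicted,
    pvalue = pvalue,
    logFoldChange = log2(count / predicted)
  )
}

# Values taken from the first two rows of the printed output
res <- binomial_enrichment(
  count = c(1, 2),
  probability = c(7.83580e-09, 2.81318e-08),
  n_pairs = 471364
)
res
```

Using `pbinom()` with `lower.tail = FALSE` on `count - 1` gives the probability of observing at least `count` contacts, which is the one-sided ("greater") alternative of a binomial test.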
- -Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 
--- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * --- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... 
+## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 
4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 
1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 
[1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. As a result, coverage is generally not homogeneous across the genome, which leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element to the scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality, encoded as ASCII characters (Phred scores). - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, along with the bin in which each extremity of each pair is contained.
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
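As a minimal sketch of the "i-j-x" layout described above, the two files can be mimicked with plain data frames in base R and expanded into a dense matrix. The bin coordinates and counts below are made up for illustration and are not taken from any HiC-Pro output; the names `regions`, `counts` and `m` are hypothetical.

```r
# Hypothetical content of a "regions.bed" file: one row per genomic bin,
# with a 1-based bin index acting as the "dictionary" key
regions <- data.frame(
    chr = c("chr1", "chr1", "chr1", "chr1"),
    start = c(0, 1000, 2000, 3000),
    end = c(1000, 2000, 3000, 4000),
    bin_id = 1:4
)

# Hypothetical content of a "count.matrix" file in i-j-x (COO) form:
# only non-zero entries of the upper triangle are listed
counts <- data.frame(
    i = c(1, 1, 2, 3),
    j = c(1, 3, 2, 4),
    x = c(5, 2, 7, 1)
)

# Expand the sparse triplets into a dense, symmetric contact matrix
n <- nrow(regions)
m <- matrix(0, nrow = n, ncol = n)
m[cbind(counts$i, counts$j)] <- counts$x
m[cbind(counts$j, counts$i)] <- counts$x

# Use the bin dictionary to recover the genomic position of each index
counts$start1 <- regions$start[match(counts$i, regions$bin_id)]
counts$start2 <- regions$start[match(counts$j, regions$bin_id)]
counts
```

The sparse form stores 4 rows where the dense matrix stores 16 cells. The two lookups in the last step are why the two files only make sense together.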
- + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information than the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information than a .cool file. However, parsing .hic files is not as straightforward as .cool files, as it does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. 
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartments). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments.
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)).
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). In R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376).
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
domains <- TopDom::TopDom(hic_topdom, window.size = 5) domains ## TopDom: ## Parameters: @@ -520,8 +659,8 @@ ## $ name : chr "gap" "domain" "gap" "domain" ...
The resulting domains object can be used to extract the annotated domains, store them in the topologicalFeatures of the original HiCExperiment object, and optionally export them as a BED text file.
topologicalFeatures(hic, 'domain') <- domains$bed |> + +topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed') - -9.4 GOTHiC + +9.5 GOTHiC GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)). - + +Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 + @@ -571,20 +712,20 @@ Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly use binned interactions imported as a HiCExperiment object in R.
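To make the underlying test concrete, the sketch below computes a one-sided (cumulative) binomial p-value for a single pair of bins, using base R only. The three input values are illustrative assumptions of a plausible order of magnitude for a 1 kb yeast contact matrix. The names `observed`, `n_pairs` and `p_expected` are hypothetical, and this is not GOTHiC's own code.

```r
# Illustrative inputs (assumptions, not values computed by GOTHiC)
observed   <- 8        # read pairs counted between the two bins
n_pairs    <- 471364   # total number of read pairs in the matrix
p_expected <- 3.9e-07  # probability that a random pair links these two bins

# Number of read pairs expected by chance between the two bins
predicted <- n_pairs * p_expected

# Cumulative binomial test: P(X >= observed) for X ~ Binomial(n_pairs, p_expected)
pval <- pbinom(observed - 1, size = n_pairs, prob = p_expected, lower.tail = FALSE)

# The same one-sided p-value, obtained through the binom.test() wrapper
pval_bt <- binom.test(
    observed, n_pairs, p = p_expected, alternative = "greater"
)$p.value

# Observed over expected ratio, on a log2 scale
log2fc <- log2(observed / predicted)

c(predicted = predicted, pvalue = pval, log2FoldChange = log2fc)
```

When the test is run on every pair of bins, the resulting p-values would typically be corrected for multiple testing, for instance with `p.adjust(method = "BH")`.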
- -Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 
--- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * --- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... 
+## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 
4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 
1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 
[1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc.. Overall, it generally ends up not homogenous throughout the entire genome and this leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element in scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher. - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file are provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, and in which bin each extremity of each pair is contained.
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
- + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information than the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information than a .cool file. However, parsing .hic files is not as straightforward as .cool files, as it does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. 
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartments). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. 
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1 Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). 
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone Python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). In R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). 
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
topologicalFeatures(hic, 'domain') <- domains$bed |> mutate(chromStart = chromStart + 1) |> filter(name == 'domain') |> makeGRangesFromDataFrame() @@ -545,10 +684,12 @@ rtracklayer::export(topologicalFeatures(hic, 'domain'), 'hic_domains.bed')
GOTHiC relies on a cumulative binomial test to detect interactions between distal genomic loci that have significantly more reads than expected by chance in Hi-C experiments (Mifsud et al. (2017)).
Based on these facts, we can simplify the binomial test function provided by GOTHiC so that it can directly use binned interactions imported as a HiCExperiment object in R.
GOTHiC_binomial
GOTHiC_binomial <- function(x) { + +Show the code for GOTHiC_binomial functionGOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ } - -res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 
0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... -## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * --- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... 
+## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 
4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 
1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 
[1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc.. Overall, it generally ends up not homogenous throughout the entire genome and this leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element in scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural &Amp\(\mathsemicolon\) Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher. - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file are provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, and in which bin each extremity of each pair is contained. 
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
- + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information than the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information than a .cool file. However, parsing .hic files is not as straightforward as .cool files, as it does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. 
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural &Amp\(\mathsemicolon\) Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. 
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.b. roughly ≤ 1Mb in mammal genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). 
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops into R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). In R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). 
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
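The compartment-pair labelling performed by the `case_when()` call in the hunk above can be illustrated in isolation. The sketch below is a base-R approximation rather than the book's code: `label_pair()` and the compartment IDs are hypothetical, and nested `ifelse()` calls stand in for `dplyr::case_when()` while keeping the same first-match order.

```r
# Hypothetical compartment IDs for the two anchors of five interactions
# (made-up values, for illustration only).
bin1 <- c("A1", "A1", "B2", "B3", "A4")
bin2 <- c("A2", "B1", "B5", "A7", "A4")

# Label each pair by the compartment type of its two anchors.
# Conditions are tested in the same order as in the case_when() call.
label_pair <- function(bin1, bin2) {
  a1 <- grepl("A", bin1, fixed = TRUE)
  a2 <- grepl("A", bin2, fixed = TRUE)
  b1 <- grepl("B", bin1, fixed = TRUE)
  b2 <- grepl("B", bin2, fixed = TRUE)
  ifelse(a1 & a2, "AA",
    ifelse(b1 & b2, "BB",
      ifelse(a1 & b2, "AB",
        ifelse(b1 & a2, "BA", NA_character_))))
}

pairs <- data.frame(bin1 = bin1, bin2 = bin2, type = label_pair(bin1, bin2))

# Drop interactions whose two anchors carry the same compartment ID,
# mirroring the final filter(bin1 != bin2) step.
pairs <- pairs[pairs$bin1 != pairs$bin2, ]
print(pairs)
```

Only interactions between two distinct compartment domains survive the last step, so the `AA` and `BB` classes describe contacts between separate domains of the same type rather than contacts within one domain.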
GOTHiC_binomial <- function(x) { if (length(trans(x)) != 0) stop("Only `cis` interactions can be used here.") ints <- interactions(x) |> - as.data.frame() |> + as.data.frame() |> select(seqnames1, start1, seqnames2, start2, count) |> dplyr::rename(chr1 = seqnames1, locus1 = start1, chr2 = seqnames2, locus2 = start2, frequencies = count) |> mutate(locus1 = locus1 - 1, locus2 = locus2 - 1) |> mutate(int1 = paste0(chr1, '_', locus1), int2 = paste0(chr2, '_', locus2)) numberOfReadPairs <- sum(ints$frequencies) - all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) - all_bins <- sort(all_bins) + all_bins <- unique(c(unique(ints$int1), unique(ints$int2))) + all_bins <- sort(all_bins) upperhalfBinNumber <- (length(all_bins)^2 - length(all_bins))/2 cov <- ints |> @@ -632,12 +773,12 @@ }
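The columns that `GOTHiC_binomial()` adds (`probability`, `predicted`, `pvalue`, `logFoldChange`) appear consistent with an upper-tail binomial test, although the body of the function is only partly visible in this diff, so the relationships below are an assumption. The sketch takes the values printed for the first interaction in the `interactions(res)` output of this diff (count 1, probability 7.83580e-09, 471,364 contacts in total) and recomputes the other columns from them; all variable names are illustrative.

```r
# Values copied from the printed object summary and the first output row.
n_contacts  <- 471364        # total contacts reported for focus "II"
count       <- 1             # observed contacts for this bin pair
probability <- 7.83580e-09   # per-contact interaction probability, as printed

# Expected number of contacts for this bin pair under the null model.
predicted <- probability * n_contacts

# Upper-tail binomial p-value: P(X >= count), X ~ Binomial(n_contacts, probability).
pvalue <- pbinom(count - 1, size = n_contacts, prob = probability,
                 lower.tail = FALSE)

# Enrichment of observed over expected contacts, on a log2 scale.
logFoldChange <- log2(count / predicted)

print(c(predicted = predicted, pvalue = pvalue, logFoldChange = logFoldChange))
```

With these inputs the three values should land close to the printed 0.00369352, 3.68670e-03 and 8.08079; small differences are expected because the printed probability is rounded. The `qvalue` column is not reproduced here, since the multiple-testing correction depends on parts of the function that are not shown.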
res <- GOTHiC_binomial(hic["II"]) + +res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... 
-## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * --- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... 
+## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome @@ -669,7 +810,7 @@ References Session info - + ## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 
4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 
1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 
[1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. 
https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. Overall, coverage is generally not homogeneous across the genome, which leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element to the scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher. - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, and in which bin each extremity of each pair is contained.
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
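The relationship between an "i-j-x" count table and its regions "dictionary" can be sketched in a few lines of base R. This is only a minimal illustration under stated assumptions: the toy `regions` and `counts` tables below are hypothetical (they are not the files shown in this chapter), and bin indices are assumed to be 1-based.

```r
# Hypothetical toy data: 4 genomic bins of 1 kb on a made-up chromosome "chrI"
regions <- data.frame(
    chr = "chrI",
    start = c(0, 1000, 2000, 3000),
    end = c(1000, 2000, 3000, 4000)
)

# COO ("i-j-x") triplets: bin index i, bin index j, interaction count x
# (indices are assumed to be 1-based in this sketch)
counts <- data.frame(
    i = c(1, 1, 2, 3),
    j = c(1, 3, 2, 4),
    x = c(2, 1, 1, 1)
)

# Rebuild a dense, symmetric contact matrix from the sparse triplets
n <- nrow(regions)
mat <- matrix(0, nrow = n, ncol = n)
mat[cbind(counts$i, counts$j)] <- counts$x
mat[cbind(counts$j, counts$i)] <- counts$x  # mirror the upper triangle

# Use the regions table as a dictionary: which loci do indices 1 and 3 refer to?
regions[c(1, 3), ]
```

Only non-zero entries are stored in `counts`, which is what makes the COO representation compact for sparse Hi-C maps; the exact position of each pair within its bin cannot be recovered from it.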
- + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information as the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as .cool files, as it does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. 
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. 
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). 
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, e.g. …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix which we imported interactions from. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). in R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). 
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
res <- GOTHiC_binomial(hic["II"]) res ## `HiCExperiment` object with 471,364 contacts over 802 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -649,19 +790,19 @@ interactions(res) ## GInteractions object with 74360 interactions and 9 metadata columns: -## seqnames1 ranges1 seqnames2 ranges2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange -## <Rle> <IRanges> <Rle> <IRanges> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> -## [1] II 1-1000 --- II 1001-2000 | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 -## [2] II 1-1000 --- II 5001-6000 | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 -## [3] II 1-1000 --- II 6001-7000 | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 -## [4] II 1-1000 --- II 8001-9000 | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 -## [5] II 1-1000 --- II 9001-10000 | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 -## ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... 
-## [74356] II 807001-808000 --- II 809001-810000 | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 -## [74357] II 807001-808000 --- II 810001-811000 | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 -## [74358] II 808001-809000 --- II 808001-809000 | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 -## [74359] II 808001-809000 --- II 809001-810000 | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 -## [74360] II 809001-810000 --- II 809001-810000 | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 +## seqnames1 ranges1 strand1 seqnames2 ranges2 strand2 | bin_id1 bin_id2 count balanced probability predicted pvalue qvalue logFoldChange +## <Rle> <IRanges> <Rle> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> <numeric> +## [1] II 1-1000 * --- II 1001-2000 * | 231 232 1 NaN 7.83580e-09 0.00369352 3.68670e-03 0.063385760 8.08079 +## [2] II 1-1000 * --- II 5001-6000 * | 231 236 2 NaN 2.81318e-08 0.01326033 8.71446e-05 0.001926954 7.23674 +## [3] II 1-1000 * --- II 6001-7000 * | 231 237 1 NaN 2.02960e-08 0.00956681 9.52120e-03 0.150288341 6.70775 +## [4] II 1-1000 * --- II 8001-9000 * | 231 239 2 NaN 6.73108e-08 0.03172791 4.92808e-04 0.009806734 5.97810 +## [5] II 1-1000 * --- II 9001-10000 * | 231 240 3 NaN 7.37336e-08 0.03475538 6.81713e-06 0.000173165 6.43158 +## ... ... ... ... ... ... ... ... . ... ... ... ... ... ... ... ... ... 
+## [74356] II 807001-808000 * --- II 809001-810000 * | 1038 1040 8 0.0472023 3.85638e-07 0.1817758 2.51560e-11 1.07966e-09 5.45977 +## [74357] II 807001-808000 * --- II 810001-811000 * | 1038 1041 1 NaN 5.03006e-08 0.0237099 2.34310e-02 3.38098e-01 5.39837 +## [74358] II 808001-809000 * --- II 808001-809000 * | 1039 1039 1 NaN 8.74604e-08 0.0412257 4.03875e-02 5.49519e-01 4.60031 +## [74359] II 808001-809000 * --- II 809001-810000 * | 1039 1040 7 NaN 1.02111e-07 0.0481315 1.13834e-13 5.77259e-12 7.18423 +## [74360] II 809001-810000 * --- II 809001-810000 * | 1040 1040 2 0.0411355 1.19216e-07 0.0561941 1.52097e-03 2.79707e-02 5.15344 ## ------- ## regions: 802 ranges and 4 metadata columns ## seqinfo: 16 sequences from an unspecified genome
## ─ Session info ───────────────────────────────────────────────────────────────
 ## setting value ## version R version 4.3.1 (2023-06-16) @@ -689,6 +830,7 @@ ## aggregation 1.0.1 2018-01-25 [1] CRAN (R 4.3.1) ## AnnotationDbi 1.64.0 2023-10-24 [1] Bioconductor ## AnnotationHub * 3.10.0 2023-10-24 [1] Bioconductor +## beeswarm 0.4.0 2021-06-01 [1] CRAN (R 4.3.1) ## Biobase * 2.62.0 2023-10-24 [1] Bioconductor ## BiocFileCache * 2.10.1 2023-10-26 [1] Bioconductor ## BiocGenerics * 0.48.0 2023-10-24 [1] Bioconductor @@ -701,17 +843,21 @@ ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.3.1) ## bitops 1.0-7 2021-04-24 [1] CRAN (R 4.3.1) ## blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.1) +## BSgenome 1.70.0 2023-10-24 [1] Bioconductor ## cachem 1.0.8 2023-05-01 [1] CRAN (R 4.3.1) +## Cairo 1.6-1 2023-08-18 [1] CRAN (R 4.3.1) ## calibrate 1.7.7 2020-06-19 [1] CRAN (R 4.3.1) ## cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.1) ## codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.1) ## colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.1) ## crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.1) +## csaw 1.36.0 2023-10-24 [1] Bioconductor ## curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1) ## data.table 1.14.8 2023-02-17 [1] CRAN (R 4.3.1) ## DBI 1.1.3 2022-06-18 [1] CRAN (R 4.3.1) ## dbplyr * 2.4.0 2023-10-26 [1] CRAN (R 4.3.1) ## DelayedArray 0.28.0 2023-10-24 [1] Bioconductor +## diffHic * 1.34.0 2023-10-24 [1] Bioconductor ## digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1) ## dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.1) ## edgeR 4.0.0 2023-10-24 [1] Bioconductor @@ -719,6 +865,7 @@ ## evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1) ## ExperimentHub * 2.10.0 2023-10-24 [1] Bioconductor ## fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1) +## farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.1) ## fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.1) ## filelock 1.0.2 2018-10-05 [1] CRAN (R 4.3.1) ## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.1) @@ -726,15 +873,19 @@ ## GenomeInfoDbData 1.2.11 2023-10-30 [1] Bioconductor ## GenomicAlignments 1.38.0 2023-10-24 [1] Bioconductor ## GenomicRanges * 1.54.0 2023-10-24 [1] Bioconductor +## ggbeeswarm 
0.7.2 2023-04-29 [1] CRAN (R 4.3.1) ## ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1) +## ggrastr 1.0.2 2023-06-01 [1] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.1) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.1) ## gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.1) ## gtools 3.9.4 2022-11-27 [1] CRAN (R 4.3.1) ## HiCcompare 1.24.0 2023-10-24 [1] Bioconductor ## HiCExperiment * 1.2.0 2023-10-24 [1] Bioconductor +## HiContacts * 1.4.0 2023-10-24 [1] Bioconductor ## HiContactsData * 1.4.0 2023-10-26 [1] Bioconductor ## hicrep * 1.12.2 2023-10-30 [1] Github (TaoYang-dev/hicrep@e485dfa) +## hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.1) ## htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1) ## htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.1) ## httpuv 1.6.12 2023-10-23 [1] CRAN (R 4.3.1) @@ -745,7 +896,8 @@ ## jsonlite 1.8.7 2023-06-29 [1] CRAN (R 4.3.1) ## KEGGREST 1.42.0 2023-10-24 [1] Bioconductor ## KernSmooth 2.23-22 2023-07-10 [1] CRAN (R 4.3.1) -## knitr 1.44 2023-09-11 [1] CRAN (R 4.3.1) +## knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1) +## labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.1) ## later 1.3.1 2023-05-02 [1] CRAN (R 4.3.1) ## lattice 0.22-5 2023-10-24 [1] CRAN (R 4.3.1) ## lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.1) @@ -757,6 +909,7 @@ ## MatrixGenerics * 1.14.0 2023-10-24 [1] Bioconductor ## matrixStats * 1.0.0 2023-06-02 [1] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1) +## metapod 1.10.0 2023-10-24 [1] Bioconductor ## mgcv 1.9-0 2023-07-11 [1] CRAN (R 4.3.1) ## mime 0.12 2021-09-28 [1] CRAN (R 4.3.1) ## multiHiCcompare * 1.20.0 2023-10-24 [1] Bioconductor @@ -766,7 +919,9 @@ ## pheatmap 1.0.12 2019-01-04 [1] CRAN (R 4.3.1) ## pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.1) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.1) +## plyinteractions * 0.99.8 2023-10-30 [1] Github (tidyomics/plyinteractions@81c56dc) ## plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.1) +## plyranges 1.22.0 2023-10-24 [1] Bioconductor ## png 0.1-8 2022-11-29 [1] CRAN (R 4.3.1) ## promises 
1.2.1 2023-08-10 [1] CRAN (R 4.3.1) ## purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.1) @@ -776,17 +931,19 @@ ## RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.1) ## Rcpp 1.0.11 2023-07-06 [1] CRAN (R 4.3.1) ## RCurl 1.98-1.12 2023-03-27 [1] CRAN (R 4.3.1) +## readr 2.1.4 2023-02-10 [1] CRAN (R 4.3.1) ## reshape2 1.4.4 2020-04-09 [1] CRAN (R 4.3.1) ## restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.3.1) ## rhdf5 2.46.0 2023-10-24 [1] Bioconductor ## rhdf5filters 1.14.0 2023-10-24 [1] Bioconductor ## Rhdf5lib 1.24.0 2023-10-24 [1] Bioconductor +## Rhtslib 2.4.0 2023-10-24 [1] Bioconductor ## rjson 0.2.21 2022-01-09 [1] CRAN (R 4.3.1) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.1) ## rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1) ## Rsamtools 2.18.0 2023-10-24 [1] Bioconductor +## RSpectra 0.16-1 2022-04-24 [1] CRAN (R 4.3.1) ## RSQLite 2.3.2 2023-10-28 [1] CRAN (R 4.3.1) -## rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.1) ## rtracklayer 1.62.0 2023-10-24 [1] Bioconductor ## S4Arrays 1.2.0 2023-10-24 [1] Bioconductor ## S4Vectors * 0.40.1 2023-10-26 [1] Bioconductor @@ -806,8 +963,9 @@ ## tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.1) ## utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1) ## vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1) +## vipor 0.4.5 2017-03-22 [1] CRAN (R 4.3.1) ## vroom 1.6.4 2023-10-02 [1] CRAN (R 4.3.1) -## withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1) +## withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1) ## xfun 0.40 2023-08-09 [1] CRAN (R 4.3.1) ## XML 3.99-0.14 2023-03-19 [1] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1) @@ -823,7 +981,8 @@ References - + + Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, @@ -934,6 +1093,11 @@ ChIA-PET and related experiments. F1000Research, 5, 950. https://doi.org/10.12688/f1000research.8759.2 + +Lun, A. T. L., & Smyth, G. K. (2015). diffHic: +a Bioconductor package to detect differential genomic interactions in +Hi-C data. BMC Bioinf., 16(1), 1–11. 
https://doi.org/10.1186/s12859-015-0683-0 + Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, @@ -978,6 +1142,12 @@ HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Shin, H., Shi, Y., Dai, C., Tjong, H., Gong, K., Alber, F., & Zhou, +X. J. (2015). TopDom: An efficient and deterministic method +for identifying topological domains in genomes. Nucleic Acids +Research, 44(7), e70–e70. https://doi.org/10.1093/nar/gkv1505 + Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. @@ -1001,8 +1171,7 @@ reproducibility of hi-c data using a stratum-adjusted correlation coefficient. Genome Research, 27(11), 1939–1949. https://doi.org/10.1101/gr.220640.117 - - - + @@ -381,7 +381,7 @@ hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. As a result, coverage is generally not homogeneous across the genome, which leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>).
However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element in scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. 
https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. 
This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. 
https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair.
The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher. - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file are provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, and in which bin each extremity of each pair is contained. Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. 
This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. - + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information than the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information than a .cool file. However, parsing .hic files is not as straightforward as .cool files, as it does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. 
(2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. 
A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... ## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! 
## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. 
S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural &Amp\(\mathsemicolon\) Molecular Biology, 27(12), 1105–1114. 
https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartments). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammal genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al.
(2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. 
https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. 
-References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). 
Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. chromatin loops, domain borders, A/B compartments, e.g. …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix which we imported interactions from. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. 
Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). In R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376).
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -400,12 +400,16 @@ 5.1.1 Balancing a raw interaction count map Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc.. Overall, it generally ends up not homogenous throughout the entire genome and this leads to artifacts in un-normalized count matrices. To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element in scores list (while the interactions themselves are unmodified). - + +Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 + +Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. 
https://doi.org/10.1038/nmeth.2148 + normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@ - + 5.1.3 Computing autocorrelated map Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)). -The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 +The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. 
For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome. autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10. - + +Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 + plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. 
A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. - + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. 
After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. 
https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural & Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH These two reads are the first listed in their respective file. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality following a nebulous cypher. - + 1.2 Hi-C file formats Two important output files are typically generated during Hi-C data pre-processing: @@ -442,7 +452,7 @@ EAS139:136:FC706VJ:2:1286:25:275154 chr1 30000 chr3 40000 + - More information about the conventions related to this text file are provided by the 4DN consortium, which originally formalized the specifications of this file format. - + 1.2.2 Binned contact matrix files 1.2.2.1 Binning pairs into a matrix @@ -507,15 +517,17 @@ This count.matrix file lists a total of 5 pairs, and in which bin each extremity of each pair is contained.
Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it. This “i-j-x” 3-column format, in which i-j relate to a pair of “coordinates” indices (or a pair of genomic bin indices) in a matrix, and x relates to a score associated with the pair of indices, is generally called a “COO sparse matrix”. In this context, the regions.bed acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins. - + 1.2.2.2 Plain-text matrices: HiC-Pro style The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above. -Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. +Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies are stored in separate files. Moreover, because they are non-binarized, these files often end up using a large disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files. .(m)cool and .hic file formats are two standards addressing these limitations. 
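To make the binning step described above more concrete, here is a minimal base R sketch of how a pairs table could be turned into a regions table and an "i-j-x" COO count table. This is only an illustration under stated assumptions, not HiC-Pro's actual implementation: the toy pairs, the chromosome size, the bin size and all object names (`pairs`, `regions`, `bin_index`, `count_matrix`) are hypothetical.

```r
# Toy pairs table: one row per contact, with the 1-based position of each extremity.
# All values below are invented for illustration only.
pairs <- data.frame(
    chr1 = rep("chr1", 5),
    pos1 = c(1200L, 1800L, 5300L, 5900L, 9100L),
    chr2 = rep("chr1", 5),
    pos2 = c(5400L, 5100L, 9800L, 9200L, 9700L)
)
chrom_sizes <- c(chr1 = 12000L)   # hypothetical chromosome size
bin_size <- 4000L                 # hypothetical resolution

# "regions.bed"-like table: fixed-size bins tiling each chromosome (0-based, half-open),
# each bin receiving a 1-based index
regions <- do.call(rbind, lapply(names(chrom_sizes), function(chr) {
    starts <- seq(0L, chrom_sizes[[chr]] - 1L, by = bin_size)
    data.frame(
        chr = chr,
        start = starts,
        end = pmin(starts + bin_size, chrom_sizes[[chr]])
    )
}))
regions$index <- seq_len(nrow(regions))

# Index of the first bin of each chromosome, used as an offset
first_bin <- vapply(split(regions$index, regions$chr), min, integer(1))

# Map a (chromosome, 1-based position) to the index of the bin containing it
bin_index <- function(chr, pos) {
    unname(first_bin[chr]) + (pos - 1L) %/% bin_size
}

i <- bin_index(pairs$chr1, pairs$pos1)
j <- bin_index(pairs$chr2, pairs$pos2)

# "count.matrix"-like table: one "i j x" row per non-empty pair of bins,
# keeping only the upper triangle (i <= j)
coo <- data.frame(i = pmin(i, j), j = pmax(i, j), x = 1L)
count_matrix <- aggregate(x ~ i + j, data = coo, FUN = sum)
count_matrix <- count_matrix[order(count_matrix$i, count_matrix$j), ]

regions
count_matrix
```

In this sketch, the `x` column sums to the number of input pairs, and the exact positions of the pairs can no longer be recovered from `count_matrix`, which is what makes the binned format lossy.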
- + 1.2.2.3 .(m)cool matrices The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called: - + +Abdennur, N., & Mirny, L. A. (2019). Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 + bins: containing the same information than the regions.bed file; @@ -535,12 +547,12 @@ Moreover, parsing .cool files is possible using HDF standard APIs. - + 1.2.2.4 .hic matrices The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information than a .cool file. However, parsing .hic files is not as straightforward as .cool files, as it does not rely on a generic file standard. Still, the straw library has been implemented in several computing languages to facilitate parsing of .hic files (Durand et al. (2016)). - + 1.3 Pre-processing Hi-C data - + 1.3.1 Processing workflow Fundamentally, the main steps performed to pre-process Hi-C are: @@ -553,7 +565,7 @@ In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)): - + ## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. 
https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... 
## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! ## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf The *.pairs and *.mcool files are the pairs and contact matrix files, respectively. These are the output files the end-user is generally looking for. @@ -779,35 +797,7 @@ References - - -Abdennur, N., & Mirny, L. A. (2019). 
Cooler: Scalable storage for hi-c data and other genomically labeled arrays. Bioinformatics, 36(1), 311–316. https://doi.org/10.1093/bioinformatics/btz540 - - -Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 - - -Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 - - -Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z - - -Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - -Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. 
https://doi.org/10.1101/2023.02.13.528389 - - -Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x - - -Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural &Amp\(\mathsemicolon\) Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 - - + - + @@ -293,11 +293,10 @@ 7.3.2 Other R packages - References Edit this pageReport an issue - + 7 Finding topological features in Hi-C @@ -313,7 +312,8 @@ - +reference-section-title: References + @@ -331,13 +331,15 @@ - + 7.1 Chromosome compartments Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartment). - + 7.1.1 Importing Hi-C data To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp. - + +Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 + library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@ Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. 
Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot. - + 7.2 Topological domains Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.b. roughly ≤ 1Mb in mammal genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries. @@ -495,10 +497,20 @@ They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)). - + +Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 + +Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 + +Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 + 7.2.1 Computing diamond insulation score Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare. -HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). 
This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. + +Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 + +Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 +HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution. # - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@ Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds. - + 7.3 Chromatin loops - + 7.3.1 chromosight Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of two genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone python package and is made available in R through the HiCool-managed conda environment with the getLoops() function. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. 
H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + 7.3.1.1 Identifying loops hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ ) - + 7.3.2 Other R packages A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications. -References - - + Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for hi-c data reveals regulatory chromatin contacts. Genome Research, 24(6), 999–1011. https://doi.org/10.1101/gr.160374.113 - - -Crane, E., Bian, Q., McCord, R. P., Lajoie, B. R., Wheeler, B. S., Ralston, E. J., Uzawa, S., Dekker, J., & Meyer, B. J. (2015). Condensin-driven remodelling of x chromosome topology during dosage compensation. Nature, 523(7559), 240–244. https://doi.org/10.1038/nature14450 - - -Krietenstein, N., Abraham, S., Venev, S. V., Abdennur, N., Gibcus, J., Hsieh, T.-H. S., Parsi, K. M., Yang, L., Maehr, R., Mirny, L. A., Dekker, J., & Rando, O. J. (2020). Ultrastructural details of mammalian chromosome architecture. Molecular Cell, 78(3), 554–565.e7. https://doi.org/10.1016/j.molcel.2020.03.003 - - -Krismer, K., Guo, Y., & Gifford, D. K. (2020). IDR2D identifies reproducible genomic interactions. Nucleic Acids Research, 48(6), e31–e31. https://doi.org/10.1093/nar/gkaa030 - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). 
https://doi.org/10.1038/s41467-020-19562-7 - - + Mifsud, B., Martincorena, I., Darbo, E., Sugar, R., Schoenfelder, S., Fraser, P., & Luscombe, N. M. (2017). GOTHiC, a probabilistic model to resolve complex biases and to identify real interactions in hi-c data. PLOS ONE, 12(4), e0174744. https://doi.org/10.1371/journal.pone.0174744 - - -Pope, B. D., Ryba, T., Dileep, V., Yue, F., Wu, W., Denas, O., Vera, D. L., Wang, Y., Hansen, R. S., Canfield, T. K., Thurman, R. E., Cheng, Y., Gülsoy, G., Dennis, J. H., Snyder, M. P., Stamatoyannopoulos, J. A., Taylor, J., Hardison, R. C., Kahveci, T., … Gilbert, D. M. (2014). Topologically associating domains are stable units of replication-timing regulation. Nature, 515(7527), 402–405. https://doi.org/10.1038/nature13986 - - -Schmitt, A. D., Hu, M., Jung, I., Xu, Z., Qiu, Y., Tan, C. L., Li, Y., Lin, S., Lin, Y., Barr, C. L., & Ren, B. (2016). A compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Reports, 17(8), 2042–2059. https://doi.org/10.1016/j.celrep.2016.10.061 - - -Sefer, E. (2022). A comparison of topologically associating domain callers over mammals at high resolution. BMC Bioinformatics, 23(1). https://doi.org/10.1186/s12859-022-04674-2 - - -Stadhouders, R., Filion, G. J., & Graf, T. (2019). Transcription factors and 3D genome conformation in cell-fate decisions. Nature, 569(7756), 345–354. https://doi.org/10.1038/s41586-019-1182-7 - - - - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 4 Hi-C data visualization @@ -356,7 +356,7 @@ hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@ - + 4.3 Advanced visualization - + 4.3.1 Overlaying topological features Topological features (e.g. 
chromatin loops, domain borders, A/B compartments, e.g. …) are often displayed over a Hi-C heatmap. To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix which we imported interactions from. - + +Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 + library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@ - + 11.1 Importing data The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). in R, they can be obtained thanks to the fourDNData gateway package. - + +Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). 
https://doi.org/10.1126/science.aao6135 + @@ -520,8 +522,8 @@ ints <- cis(.x) |> ## Filter out trans interactions detrend() |> ## Compute O/E scores interactions() ## Recover interactions - ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID - ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID + ints$comp_first <- join_overlap_left(anchors(ints, "first"), compts)$ID + ints$comp_second <- join_overlap_left(anchors(ints, "second"), compts)$ID tibble( sample = .y, bin1 = ints$comp_first, @@ -529,15 +531,15 @@ dist = pairdist(ints), OE = ints$detrended ) |> - filter(dist > 5e6) |> - mutate(type = case_when( + filter(dist > 5e6) |> + mutate(type = case_when( grepl('A', bin1) & grepl('A', bin2) ~ 'AA', grepl('B', bin1) & grepl('B', bin2) ~ 'BB', grepl('A', bin1) & grepl('B', bin2) ~ 'AB', grepl('B', bin1) & grepl('A', bin2) ~ 'BA' )) |> - filter(bin1 != bin2) -}) |> list_rbind() |> mutate( + filter(bin1 != bin2) +}) |> list_rbind() |> mutate( sample = factor(sample, names(hics)[c(1, 2, 5)]) ) @@ -554,11 +556,7 @@ References - - -Gibcus, J. H., Samejima, K., Goloborodko, A., Samejima, I., Naumova, N., Nuebler, J., Kanemaki, M. T., Xie, L., Paulson, J. R., Earnshaw, W. C., Mirny, L. A., & Dekker, J. (2018). A pathway for mitotic chromosome formation. Science, 359(6376). https://doi.org/10.1126/science.aao6135 - - +
Hi-C sequencing coverage is systematically affected by multiple confounding factors, e.g. density of restriction sites, GC%, genome mappability, etc. As a result, coverage is generally not homogeneous across the genome, which leads to artifacts in non-normalized count matrices.
To correct for sequencing coverage heterogeneity of raw count maps, Hi-C data can be normalized using matrix balancing approaches (Cournac et al. (2012), Imakaev et al. (2012)). This is generally done directly on the disk-stored matrices using out-of-memory strategies (e.g. with cooler balance <.cool>). However, if contact matrix files are imported into a HiCExperiment object but no balanced scores are available, in-memory balancing can be performed using the normalize function. This adds an extra ICE element to the scores list (while the interactions themselves are left unmodified).
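To give an intuition of what matrix balancing does, here is a minimal base R sketch on a toy dense matrix. It is not the implementation used by normalize or cooler balance: the function name balance_matrix, the toy values and the square-root damped update are assumptions chosen for illustration only.

```r
# Toy symmetric raw count matrix (hypothetical values, 4 genomic bins)
raw <- matrix(
    c(10,  4,  2,  1,
       4, 20,  6,  2,
       2,  6, 30,  8,
       1,  2,  8, 40),
    nrow = 4, byrow = TRUE
)

# Illustrative balancing: repeatedly rescale rows and columns by the square
# root of their relative coverage until all bins have the same total signal.
# (Hypothetical helper, not part of HiCExperiment or cooler.)
balance_matrix <- function(m, n_iter = 200, tol = 1e-8) {
    bias <- rep(1, nrow(m))
    for (iter in seq_len(n_iter)) {
        coverage <- rowSums(m)
        coverage <- coverage / mean(coverage)   # relative coverage per bin
        w <- sqrt(coverage)
        m <- m / outer(w, w)                    # correct rows and columns together
        bias <- bias * w                        # accumulate per-bin bias
        if (max(abs(coverage - 1)) < tol) break
    }
    list(balanced = m, bias = bias)
}

res <- balance_matrix(raw)
round(rowSums(res$balanced), 3)
```

After balancing, every bin has the same total signal, and the raw matrix can be recovered by multiplying each balanced score by the biases of its two bins.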
normalized_hic <- normalize(hic) normalized_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -442,7 +446,7 @@ detrended_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -487,17 +491,19 @@
Correlation matrices are often calculated from balanced Hi-C matrices. For instance, in genomes composed of eu- and heterochromatin, a correlation matrix can be used to reveal a checkerboard pattern emphasizing the segregation of chromatin into two A/B compartments (Lieberman-Aiden et al. (2009)).
The autocorrelate function is used to compute a correlation matrix of a HiCExperiment object. For each pair of interacting loci, the autocorrelated score represents the correlation between their respective interaction profiles with the rest of the genome.
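The idea of correlating interaction profiles can be sketched in base R on a small dense matrix. This is only a conceptual illustration, assuming a hypothetical 6-bin matrix of random scores; it does not reproduce any additional processing that autocorrelate may perform internally.

```r
set.seed(1)

# Hypothetical dense, symmetric matrix of balanced scores for 6 genomic bins
m <- matrix(runif(36), nrow = 6)
m <- (m + t(m)) / 2

# Pearson correlation between the interaction profiles (columns)
# of every pair of bins
autocorr <- cor(m)
round(autocorr, 2)
```

Each cell of the resulting matrix is a correlation coefficient between -1 and 1, and the diagonal is 1 since each profile is perfectly correlated with itself.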
autocorr_hic <- autocorrelate(hic) ## autocorr_hic ## `HiCExperiment` object with 471,364 contacts over 407 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -527,7 +533,9 @@ Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated in two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated in A/B compartments) is shown in Chapter 10.
Here we have illustrated how to compute an autocorrelation matrix from a HiCExperiment object using the example yeast Hi-C experiment. Bear in mind that this is unusual and not very useful, as yeast chromatin is not segregated into two compartments but rather follows a Rabl conformation (Duan et al. (2010)). An example of an autocorrelation map from a vertebrate Hi-C experiment (for which chromatin is segregated into A/B compartments) is shown in Chapter 10.
plotMatrix( autocorr_hic, use.scores = 'autocorrelated', @@ -569,7 +577,7 @@ hic2 ## `HiCExperiment` object with 168,785 contacts over 150 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "II:400,000-700,000" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -689,20 +697,7 @@ References - - -Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R., & Mozziconacci, J. (2012). Normalization of a chromosomal contact map. BMC Genomics, 13(1). https://doi.org/10.1186/1471-2164-13-436 - - -Duan, Z., Andronescu, M., Schutz, K., McIlwain, S., Kim, Y. J., Lee, C., Shendure, J., Fields, S., Blau, C. A., & Noble, W. S. (2010). A three-dimensional model of the yeast genome. Nature, 465(7296), 363–367. https://doi.org/10.1038/nature08973 - - -Imakaev, M., Fudenberg, G., McCord, R. P., Naumova, N., Goloborodko, A., Lajoie, B. R., Dekker, J., & Mirny, L. A. (2012). Iterative correction of hi-c data reveals hallmarks of chromosome organization. Nature Methods, 9(10), 999–1003. https://doi.org/10.1038/nmeth.2148 - - -Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 - - + - + @@ -296,7 +296,7 @@ Edit this pageReport an issue - + 1 Hi-C pre-processing steps @@ -325,19 +325,29 @@ This chapter introduces the reader to general Hi-C experimental and computational steps to perform the pre-processing of Hi-C. This encompasses read alignment, pairs generation and filtering and pairs binning into a contact matrix file. 
- + 1.1 Experimental considerations - + 1.1.1 Experimental approach The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and Chromatin Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. After religation, DNA fragments are sheared, biotin-containing fragments are pulled-down and converted into a sequencing library. - - + +Lieberman-Aiden, E., Berkum, N. L. van, Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., Lander, E. S., & Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326(5950), 289–293. https://doi.org/10.1126/science.1181369 + +Dekker, J., Rippe, K., Dekker, M., & Kleckner, N. (2002). Capturing chromosome conformation. Science, 295(5558), 1306–1311. https://doi.org/10.1126/science.1067799 + + 1.1.2 C variants A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below). - + +Davies, J. O. J., Oudelaar, A. M., Higgs, D. R., & Hughes, J. R. (2017). How best to identify chromosomal interactions: A comparison of approaches. Nature Methods, 14(2), 125–134. https://doi.org/10.1038/nmeth.4146 + Capture-C is useful to quantify interactions between a set of regulatory elements of interest. 
ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts. - + +Deshpande, A. S., Ulahannan, N., Pendleton, M., Dai, X., Ly, L., Behr, J. M., Schwenk, S., Liao, W., Augello, M. A., Tyer, C., Rughani, P., Kudman, S., Tian, H., Otis, H. G., Adney, E., Wilkes, D., Mosquera, J. M., Barbieri, C. E., Melnick, A., … Imieliński, M. (2022). Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nature Biotechnology, 40(10), 1488–1499. https://doi.org/10.1038/s41587-022-01289-z + +Tavares-Cadete, F., Norouzi, D., Dekker, B., Liu, Y., & Dekker, J. (2020). Multi-contact 3C reveals that the human genome during interphase is largely not entangled. Nature Structural &Amp\(\mathsemicolon\) Molecular Biology, 27(12), 1105–1114. https://doi.org/10.1038/s41594-020-0506-5 + 1.1.3 Sequencing Hi-C libraries are traditionally sequenced with short-read technology, and are by essence paired-end libraries. For this reason, the end result of the experimental side of the Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C. Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz. @@ -362,7 +372,7 @@ @@@FFFFFFHHHHIJJIJJHIIEH
This chapter introduces the reader to the general experimental and computational steps involved in pre-processing Hi-C data. This encompasses read alignment, pairs generation and filtering, and binning of pairs into a contact matrix file.
The Hi-C procedure (Lieberman-Aiden et al. (2009)) stems from the clever combination of high-throughput sequencing and the Chromosome Conformation Capture (3C) experimental approach (Dekker et al. (2002)). In Hi-C, chromatin is crosslinked within intact nuclei and enzymatically digested (usually with one or several restriction enzymes, but Hi-C variants using MNase or DNase exist). End-repair introduces biotinylated dNTPs and is followed by religation, which generates chimeric DNA fragments consisting of genomic loci originally lying in spatial proximity, usually crosslinked to a shared protein complex. After religation, DNA fragments are sheared, and biotin-containing fragments are pulled down and converted into a sequencing library.
A number of C variants have been proposed since the publication of the original 3C method (reviewed by Davies et al. (2017)), the main ones being Capture-C and ChIA-PET (see procedure below).
Capture-C is useful to quantify interactions between a set of regulatory elements of interest. ChIA-PET, on the other hand, can identify interactions mediated by a specific protein of interest. Finally, an increasing number of Hi-C approaches rely on long-read sequencing (e.g. Deshpande et al. (2022), Tavares-Cadete et al. (2020)) to identify clusters of 3D contacts.
Hi-C libraries are traditionally sequenced with short-read technology, and are by nature paired-end libraries. For this reason, the end result of the experimental side of Hi-C consists of two fastq files, each one containing sequences for one extremity of the DNA fragments purified during Hi-C. These are the two files we need to move on to the computational side of Hi-C.
Fastq files are plain text files (usually compressed, with the .gz extension). They are generated by the sequencing machine during a sequencing run, and for Hi-C, necessarily come in pairs, generally called *_R1.fq.gz and *_R2.fq.gz.
These two reads are the first listed in their respective files. Notice how they bear the same name (first line): they form a pair. The second line corresponds to the sequence read by the sequencer, the third line is a single + separator, and the last line indicates the per-base sequencing quality, encoded as one ASCII character per base (Phred quality scores).
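To make the four-line structure described above concrete, here is a minimal base R sketch that splits a single record into its fields and decodes the quality line. The record is hypothetical, and the Phred+33 offset is an assumption (it is the most common encoding, but it is not stated in this chapter).

```r
# Hypothetical FASTQ record (4 lines), not taken from a real sequencing run
record <- c(
    "@read_1",
    "GATTACAGATTACA",
    "+",
    "@@@FFFFFFHHHHI"
)

read_name <- sub("^@", "", record[1])  # line 1: read name, prefixed with '@'
sequence  <- record[2]                 # line 2: base calls
qualities <- record[4]                 # line 4: one quality character per base

# Assumption: Phred+33 encoding, i.e. score = ASCII code of the character - 33
phred <- utf8ToInt(qualities) - 33L

# Estimated probability that each base call is wrong: 10^(-Q / 10)
error_prob <- 10^(-phred / 10)

data.frame(
    base = strsplit(sequence, "")[[1]],
    phred = phred,
    error_prob = signif(error_prob, 2)
)
```

In a paired-end run, applying the same parsing to the first record of each of the two files should return the same read name.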
Two important output files are typically generated during Hi-C data pre-processing:
More information about the conventions related to this text file is provided by the 4DN consortium, which originally formalized the specifications of this file format.
This count.matrix file lists a total of 5 pairs, along with the bin containing each extremity of each pair. Thus, a count matrix is a lossy file format, as it “rounds up” the position of each pair’s extremity to the genomic bin containing it.
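The binning step can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the five pairs, the 1000 bp bin size and the 1-based bin numbering are hypothetical, and all pairs are placed on a single chromosome so that a bin index can be derived from the position alone.

```r
# Hypothetical pairs: one row per contact, with the position of each extremity
pairs <- data.frame(
    chr1 = rep("chrI", 5),
    pos1 = c(120L, 1540L, 1610L, 2250L, 2980L),
    chr2 = rep("chrI", 5),
    pos2 = c(2710L, 1820L, 1990L, 2400L, 2999L)
)
bin_size <- 1000L  # assumed resolution, in bp

# 1-based bin index of each extremity: exact positions are lost at this step
pairs$i <- (pairs$pos1 - 1L) %/% bin_size + 1L
pairs$j <- (pairs$pos2 - 1L) %/% bin_size + 1L

# Count the pairs falling in each (i, j) combination of bins
counts <- aggregate(x ~ i + j, data = cbind(pairs, x = 1L), FUN = sum)
counts <- counts[order(counts$i, counts$j), ]
counts
```

The resulting table no longer allows the original positions to be recovered, which is what makes the format lossy.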
This “i-j-x” 3-column format, in which i and j are a pair of coordinate indices in a matrix (here, a pair of genomic bin indices) and x is the score associated with that pair of indices, is generally called a “COO sparse matrix”.
In this context, the regions.bed file acts as a secondary “dictionary” describing the nature of i and j indices, i.e. the location of genomic bins.
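As an illustration of how the two tables work together, the base R sketch below expands a small COO table into a dense matrix and uses the bin table to label rows and columns. Both tables are hypothetical stand-ins for the content of a regions.bed and a count.matrix file, and the 1-based indices and the upper-triangle-only storage are assumptions.

```r
# Hypothetical 'regions.bed' content: one row per genomic bin
regions <- data.frame(
    chr = "chrI",
    start = c(0L, 1000L, 2000L),
    end = c(1000L, 2000L, 3000L),
    index = 1:3
)

# Hypothetical 'count.matrix' content in i-j-x (COO) format
coo <- data.frame(i = c(1L, 2L, 3L), j = c(3L, 2L, 3L), x = c(1, 2, 2))

# Expand the sparse triplets into a dense, symmetric matrix
n <- nrow(regions)
mat <- matrix(0, nrow = n, ncol = n)
mat[cbind(coo$i, coo$j)] <- coo$x
mat[cbind(coo$j, coo$i)] <- coo$x  # mirror the upper triangle

# Use the bin "dictionary" to translate indices back into genomic coordinates
bin_names <- paste0(regions$chr, ":", regions$start + 1L, "-", regions$end)
dimnames(mat) <- list(bin_names, bin_names)
mat
```

Only non-zero entries are stored in the COO table, which is why it stays compact even when the dense matrix is mostly empty.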
The HiC-Pro pipeline (Servant et al. (2015)) outputs 2 text files: a regions.bed file and a count.matrix file. They are generated by the exact process explained above.
Together, these two files can describe the interaction frequency between any pair of genomic loci. They are non-binarized text files, and as such are technically human-readable. However, it is relatively hard to get a grasp of these files compared to a plain .pairs file, as information regarding genomic bins and interaction frequencies is stored in separate files. Moreover, because they are non-binarized, these files often end up using a large amount of disk space and cannot be easily indexed. This prevents easy subsetting of the data stored in these files.
.(m)cool and .hic file formats are two standards addressing these limitations.
The .cool format has been formally defined in Abdennur & Mirny (2019) and is a particular type of HDF5 (Hierarchical Data Format) file. It is an indexed archive file storing rectangular tables called:
bins
Moreover, .cool files can be parsed using standard HDF5 APIs.
The .hic format is another type of binarized, indexed and highly-compressed file (Durand et al. (2016)). It can store virtually the same information as a .cool file. However, parsing .hic files is not as straightforward as parsing .cool files, as the format does not rely on a generic file standard. Still, the straw library has been implemented in several programming languages to facilitate parsing of .hic files (Durand et al. (2016)).
Fundamentally, the main steps performed to pre-process Hi-C are:
In practice, a minimal workflow to pre-process Hi-C data is the following (adapted from Open2C et al. (2023)):
## Note these fields have to be replaced by appropriate variables: ## <index> ## <input.R1.fq.gz> @@ -577,7 +589,11 @@ Juicer (Durand et al. (2016)) - + +Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C.-J., Vert, J.-P., Heard, E., Dekker, J., & Barillot, E. (2015). HiC-pro: An optimized and flexible pipeline for hi-c data processing. Genome Biology, 16(1). https://doi.org/10.1186/s13059-015-0831-x + +Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., & Aiden, E. L. (2016). Juicer provides a one-click system for analyzing loop-resolution hi-c experiments. Cell Systems, 3(1), 95–98. https://doi.org/10.1016/j.cels.2016.07.002 + @@ -591,7 +607,9 @@ To scale up data pre-processing, we recommend to rely on an efficient read mapper such as bwa, followed by pairs parsing, sorting and deduplication with pairtools and binning with cooler. - + +Open2C, Abdennur, N., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., & Venev, S. V. (2023). Pairtools: From sequencing data to chromosome contacts. https://doi.org/10.1101/2023.02.13.528389 + 1.3.2 hicstuff: lightweight Hi-C pipeline hicstuff is an integrated workflow to process Hi-C data. Some advantages compared to solutions mentioned above are its simplicity, flexibility and lightweight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command. hicstuff provides both a command-line interface (CLI) and a python API to process fastq reads into a binned contact matrix. 
A processing pipeline can be launched using the standard command pipeline as follows: @@ -641,7 +659,7 @@ ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' '--quiet' '-c' 'conda-forge' '-c' 'bioconda' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' 'python=3.7.12' ## + /github/home/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/github/home/.cache/R/basilisk/1.14.0/HiCool/1.2.0/env' '-c' 'conda-forge' '-c' 'bioconda' 'python=3.7.12' 'python=3.7.12' 'bowtie2=2.5.0' 'samtools=1.16.1' 'hicstuff=3.1.5' 'chromosight=1.6.3' 'cooler=0.9.1' -## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpiR9EkC/WL4DIE]... +## HiCool :: Initiating processing of fastq files [tmp folder: /tmp/RtmpeiTnVE/WL4DIE]... ## HiCool :: Mapping fastq files... ## HiCool :: Removing unwanted chromosomes... ## HiCool :: Parsing pairs into .cool file... @@ -651,12 +669,12 @@ ## HiCool :: .fastq to .mcool processing done! ## HiCool :: Check ./HiCool/folder to find the generated files ## HiCool :: Generating HiCool report. This might take a while. -## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## HiCool :: Report generated and available @ /__w/OHCA/OHCA/HiCool/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## HiCool :: All processing successfully achieved. Congrats! 
## CoolFile object -## .mcool file: ./HiCool//matrices/148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## .mcool file: ./HiCool//matrices/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## resolution: 4000 -## pairs file: ./HiCool//pairs/148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## pairs file: ./HiCool//pairs/1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## metadata(3): log args stats @@ -688,16 +706,16 @@ fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf +## └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf
hicstuff is an integrated workflow to process Hi-C data. Its advantages over the solutions mentioned above are its simplicity, its flexibility and its light weight. For shallow sequencing or Hi-C on smaller genomes, it efficiently parses fastq reads and processes data into binned contact matrices with a single terminal command.
hicstuff provides both a command-line interface (CLI) and a Python API to process fastq reads into a binned contact matrix. A processing pipeline can be launched using the standard pipeline command as follows:
fs::dir_tree('HiCool/') ## HiCool/ -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.html +## ├── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.html ## ├── logs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.log +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.log ## ├── matrices -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.mcool +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.mcool ## ├── pairs -## │ └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE.pairs +## │ └── 1494bbd59b0_7833^mapped-R64-1-1^WL4DIE.pairs ## └── plots -## ├── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distance.pdf -## └── 148213ddba0_7833^mapped-R64-1-1^WL4DIE_event_distribution.pdf
*.pairs
*.mcool
Chromosome compartments refer to the segregation of the chromatin into active euchromatin (A compartments) and regulated heterochromatin (B compartments).
To investigate chromosome compartments, we will fetch a contact matrix generated from a micro-C experiment (from Krietenstein et al. (2020)). A subset of the genome-wide dataset is provided in the OHCA package. It contains intra-chromosomal interactions within chr17, binned at 5000, 100000 and 250000 bp.
library(HiCExperiment) library(OHCA) cf <- fs::path_package('OHCA', 'extdata', 'chr17.mcool') @@ -487,7 +489,7 @@
Here, the top-left small corner represents average O/E scores between strong B compartments and the bottom-right larger corner represents average O/E scores between strong A compartments. Note that only chr17 interactions are contained in this dataset, explaining the grainy aspect of the saddle plot.
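For readers unfamiliar with O/E scores, the base R sketch below shows one common way to derive an observed/expected matrix from a dense contact matrix: the expected score at a given genomic distance is taken as the average of the corresponding diagonal. The toy matrix, its compartment-like pattern and this definition of the expected score are assumptions made for illustration; they are not necessarily what HiContacts does internally.

```r
# Hypothetical contact matrix for 6 bins of a single chromosome:
# distance-dependent decay, modulated by a compartment-like pattern
n <- 6
d <- abs(outer(seq_len(n), seq_len(n), "-"))  # genomic distance, in bins
compartment <- c(1, 1, -1, -1, 1, 1)          # made-up A (+1) / B (-1) labels
observed <- 100 / (d + 1) * (1 + 0.5 * outer(compartment, compartment))

# Expected score at each distance: average of the corresponding diagonal
expected_by_distance <- tapply(observed, d, mean)
expected <- matrix(expected_by_distance[as.character(d)], nrow = n)

# Observed/expected ratio, often displayed on a log2 scale
oe <- observed / expected
log2_oe <- log2(oe)
round(log2_oe, 2)
```

Dividing by the expected score removes the distance-dependent decay, so that the remaining signal reflects whether two bins interact more or less than other bins separated by the same distance.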
Topological domains (a.k.a. Topologically Associating Domains, TADs, isolated neighborhoods, contact domains, …) refer to local chromosomal segments (e.g. roughly ≤ 1Mb in mammalian genomes) which preferentially self-interact, in a constrained manner. They are demarcated by domain boundaries.
They are generally conserved across cell types and species (Schmitt et al. (2016)), typically correlate with units of DNA replication (Pope et al. (2014)), and could play a role during development (Stadhouders et al. (2019)).
Several approaches exist to annotate topological domains (Sefer (2022)). Several packages in R implement some of these functionalities, e.g. spectralTAD or TADcompare.
HiContacts offers a simple getDiamondInsulation function which computes the diamond insulation score (Crane et al. (2015)). This score quantifies average interaction frequency in an insulation window (of a certain window_size) sliding along contact matrices at a chosen resolution.
# - Compute insulation score bpparam <- SerialParam(progressbar = FALSE) @@ -617,13 +629,15 @@
Local minima in the diamond insulation score displayed below the Hi-C contact matrix are identified using the getBorders() function, which automatically estimates a minimum threshold. These local minima correspond to borders and are visually depicted on the Hi-C map by blue diamonds.
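The idea behind the diamond insulation score and its local minima can be sketched in base R on a toy matrix. This is a simplified illustration, not the HiContacts implementation: the matrix, the window size, the normalization by the average score and the strict local-minimum rule are all assumptions, and the automatic threshold estimation performed by getBorders() is not reproduced.

```r
# Hypothetical contact matrix with two domains (bins 1-10 and bins 11-20)
n <- 20
domain <- rep(c(1, 2), each = 10)
d <- abs(outer(seq_len(n), seq_len(n), "-"))
same_domain <- outer(domain, domain, "==")
mat <- 100 / (d + 1) * ifelse(same_domain, 1, 0.2)

# Diamond insulation at the junction between bin i and bin i + 1:
# mean contact score between the w bins upstream and the w bins downstream
w <- 3  # window size, in bins (assumed value)
junctions <- w:(n - w)
insulation <- sapply(junctions, function(i) {
    mean(mat[(i - w + 1):i, (i + 1):(i + w)])
})
names(insulation) <- junctions

# Log2 ratio over the average score, then strict local minima as borders
score <- log2(insulation / mean(insulation))
k <- seq(2, length(score) - 1)
is_min <- score[k] < score[k - 1] & score[k] < score[k + 1]
borders <- junctions[k][is_min]
borders
```

On real data, the score is noisier than in this toy example, which is why a minimum threshold is usually needed to discard shallow local minima.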
chromosight
Chromatin loops, dots, or contacts, refer to a strong increase of interaction frequency between a pair of genomic loci. They correspond to focal “dots” on a Hi-C map. Relying on computer vision algorithms, chromosight uses this property to annotate chromatin loops in a Hi-C map (Matthey-Doret et al. (2020)). chromosight is a standalone Python package and is made available in R through the HiCool-managed conda environment with the getLoops() function.
hic <- HiCool::getLoops(microC, resolution = 5000) @@ -773,45 +787,19 @@ )
A number of other R packages have been developed to identify focal chromatin loops, notably fitHiC (Ay et al. (2014)), GOTHiC (Mifsud et al. (2017)) or idr2d (Krismer et al. (2020)). Each fits a slightly different purpose, and we encourage the end user to read companion publications.
hic ## `HiCExperiment` object with 303,545 contacts over 289 regions ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: "V" ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 2000 @@ -518,13 +518,15 @@
Topological features (e.g. chromatin loops, domain borders, A/B compartments, …) are often displayed over a Hi-C heatmap.
To illustrate how to do this, let’s import pre-computed chromatin loops in R. These loops have been identified using chromosight (Matthey-Doret et al. (2020)) on the contact matrix from which we imported interactions.
library(rtracklayer) library(InteractionSet) loops <- system.file('extdata', 'S288C-loops.bedpe', package = 'HiCExperiment') |> @@ -596,7 +598,7 @@ aggr_loops ## `AggrHiCExperiment` object over 148 targets ## ------- -## fileName: "/github/home/.cache/R/ExperimentHub/1a92248c093f_7752" +## fileName: "/github/home/.cache/R/ExperimentHub/1a9466054db5_7752" ## focus: 148 targets ## resolutions(5): 1000 2000 4000 8000 16000 ## active resolution: 1000 @@ -655,11 +657,7 @@ References - - -Matthey-Doret, C., Baudry, L., Breuer, A., Montagne, R., Guiglielmoni, N., Scolari, V., Jean, E., Campeas, A., Chanut, P. H., Oriol, E., Méot, A., Politis, L., Vigouroux, A., Moreau, P., Koszul, R., & Cournac, A. (2020). Computer vision for pattern detection in chromosome contact maps. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-19562-7 - - + - + @@ -336,10 +336,12 @@
The 4DN consortium provides access to the datasets published in Gibcus et al. (2018). In R, they can be obtained thanks to the fourDNData gateway package.