Methylation distribution in NRC regions of Maconellicoccus hirsutus


The following work is part of my undergraduate thesis project at the Indian Institute of Technology Delhi (IIT Delhi), India.

Methylation distribution in over- and under-represented NRC regions


Shapiro-Wilk test for normality

> shapiro.test(subset(df, NRC_regions == "Over rep.")$Frequency)

Shapiro-Wilk normality test

data:  subset(df, NRC_regions == "Over rep.")$Frequency
W = 0.86094, p-value < 2.2e-16

> shapiro.test(subset(df, NRC_regions == "Under rep.")$Frequency)

Shapiro-Wilk normality test

data:  subset(df, NRC_regions == "Under rep.")$Frequency
W = 0.88901, p-value < 2.2e-16

From both histograms, the methylation data do not appear to be normally distributed. The Shapiro-Wilk test confirms this statistically (p < 2.2e-16 for both groups), so we reject the null hypothesis of normality for both distributions at the 5% significance level.
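The two `shapiro.test()` calls above can be condensed into a single pass over the groups. The sketch below assumes `df` has the `Frequency` and `NRC_regions` columns used above; `check_normality` is a hypothetical helper name, not part of the original analysis. R's `shapiro.test()` only accepts between 3 and 5000 non-missing values, so the helper returns `NA` for groups outside that range instead of failing.

```r
# Hypothetical helper: Shapiro-Wilk W and p-value for each NRC group.
check_normality <- function(data, value = "Frequency", group = "NRC_regions") {
  groups <- split(data[[value]], data[[group]])
  res <- lapply(groups, function(x) {
    x <- x[!is.na(x)]
    # shapiro.test() requires 3 <= n <= 5000; report NA otherwise
    if (length(x) < 3 || length(x) > 5000) {
      return(c(W = NA_real_, p.value = NA_real_))
    }
    st <- shapiro.test(x)
    c(W = unname(st$statistic), p.value = st$p.value)
  })
  # One row per group, named by the group label (e.g. "Over rep.")
  do.call(rbind, res)
}

# Usage, assuming df is already loaded:
# check_normality(df)
```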

Methylation comparison between over- and under-represented regions of NRC

Because neither distribution is normal, we compare methylation frequency between over- and under-represented NRC regions with the non-parametric Wilcoxon rank-sum (Mann-Whitney U) test.
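For intuition about the W statistic that `wilcox.test()` reports: in the two-sample case it is the sum of the ranks of the first group in the pooled sample, minus n1(n1 + 1)/2. Equivalently, it counts the pairs in which a first-group value exceeds a second-group value, with ties counting one half. With the formula interface, the first group is the first factor level, which is `"Over rep."` if `NRC_regions` uses the default alphabetical level order (an assumption). `rank_sum_W` is a hypothetical illustration, and the vectors below are made-up numbers, not thesis data.

```r
# Hypothetical helper: Wilcoxon rank-sum statistic W as reported by wilcox.test(x, y).
rank_sum_W <- function(x, y) {
  n1 <- length(x)
  r <- rank(c(x, y))  # pooled ranks; tied values receive mid-ranks
  sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2
}

# Small made-up vectors, for illustration only
x <- c(0.10, 0.35, 0.80)
y <- c(0.20, 0.90, 0.95, 1.00)
rank_sum_W(x, y)                     # 2
unname(wilcox.test(x, y)$statistic)  # 2
```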

> wilcox.test(df$Frequency ~ df$NRC_regions)

Wilcoxon rank sum test with continuity correction

data:  df$Frequency by df$NRC_regions
W = 1632757, p-value < 2.2e-16
Alternative hypothesis: true location shift is not equal to 0

Since p < 0.05, we reject the null hypothesis and conclude that methylation frequency differs significantly between over- and under-represented NRC regions.
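The rank-sum test shows that the two distributions differ but not which group is more methylated. The following is a minimal sketch, supplementary to the original analysis and assuming the same `df`, for reporting the direction of the shift. It reports group medians plus W / (n1 * n2), which estimates the probability that a randomly chosen value from the first group exceeds one from the second (ties counted as one half). `shift_summary` is a hypothetical name.

```r
# Hypothetical helper: direction and size of the location shift between two groups.
shift_summary <- function(data, value = "Frequency", group = "NRC_regions") {
  groups <- split(data[[value]], data[[group]])
  groups <- lapply(groups, function(v) v[!is.na(v)])  # drop missing values
  stopifnot(length(groups) == 2)
  x <- groups[[1]]
  y <- groups[[2]]
  W <- unname(suppressWarnings(wilcox.test(x, y))$statistic)
  list(
    medians = sapply(groups, median),
    # probability of superiority of the first group over the second
    prob_superiority = W / (length(x) * length(y))
  )
}

# Usage, assuming df is already loaded:
# shift_summary(df)
```

A value near 0.5 means neither group tends to be higher. Values above 0.5 mean the first group (by default `"Over rep."`) tends to have higher methylation frequency, and values below 0.5 mean it tends to be lower.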

To determine which genomic compartments within NRC drive this difference in methylation, we repeated the analysis separately for intergenic, exonic and intronic regions.
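The same two steps (Shapiro-Wilk per group, then the Wilcoxon rank-sum test) are applied to each region type. The sketch below runs them in one loop. It assumes the per-region subsets are stored in a named list of data frames with the `Frequency` and `NRC_regions` columns. `run_region_tests` and `df_intergenic` are hypothetical names; `df_exon` and `df_intron` are taken from the calls in this README. Each group must still satisfy the 3 to 5000 sample-size limit of `shapiro.test()`.

```r
# Hypothetical driver: normality check and rank-sum test for each region type.
run_region_tests <- function(region_list, value = "Frequency", group = "NRC_regions") {
  rows <- lapply(names(region_list), function(region) {
    d <- region_list[[region]]
    groups <- split(d[[value]], d[[group]])
    # Shapiro-Wilk p-value for each NRC group
    sw_p <- sapply(groups, function(x) shapiro.test(x)$p.value)
    # Rank-sum test with continuity correction (wilcox.test default)
    wt <- suppressWarnings(wilcox.test(groups[[1]], groups[[2]]))
    data.frame(
      region = region,
      # largest Shapiro-Wilk p-value; < 0.05 means both groups depart from normality
      shapiro_p_max = max(sw_p),
      W = unname(wt$statistic),
      wilcox_p = wt$p.value
    )
  })
  do.call(rbind, rows)
}

# Usage, assuming the per-region data frames exist:
# run_region_tests(list(intergenic = df_intergenic, exonic = df_exon, intronic = df_intron))
```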

Distribution of methylation frequency in Intergenic regions of NRC

Shapiro-Wilk test for normality

> shapiro.test(subset(df, NRC_regions == "Over rep.")$Frequency)

Shapiro-Wilk normality test

data:  subset(df, NRC_regions == "Over rep.")$Frequency
W = 0.82099, p-value < 2.2e-16

> shapiro.test(subset(df, NRC_regions == "Under rep.")$Frequency)

Shapiro-Wilk normality test

data:  subset(df, NRC_regions == "Under rep.")$Frequency
W = 0.87614, p-value < 2.2e-16

Methylation comparison in intergenic regions of NRC

> wilcox.test(df$Frequency ~ df$NRC_regions)

Wilcoxon rank sum test with continuity correction

data:  df$Frequency by df$NRC_regions
W = 801389, p-value = 2.46e-13
Alternative hypothesis: true location shift is not equal to 0

Since p = 2.46e-13 < 0.05, intergenic methylation frequency differs significantly between over- and under-represented NRC regions.

Distribution of methylation frequency in Exonic regions of NRC

Shapiro-Wilk test for normality

> shapiro.test(subset(df_exon, NRC_regions == "Over rep.")$Frequency)

Shapiro-Wilk normality test
W = 0.84151, p-value < 2.2e-16

> shapiro.test(subset(df_exon, NRC_regions == "Under rep.")$Frequency)

Shapiro-Wilk normality test
W = 0.85718, p-value = 4.223e-13

Methylation comparison in exonic regions of NRC

> wilcox.test(dfe$Frequency ~ dfe$NRC_regions)

Wilcoxon rank sum test with continuity correction
data:  dfe$Frequency by dfe$NRC_regions
W = 39913, p-value = 0.0007649
Alternative hypothesis: true location shift is not equal to 0

As p = 7.6e-4 < 0.05, exonic methylation frequency differs significantly between over- and under-represented NRC regions.

Distribution of methylation frequency in Intronic regions of NRC

Shapiro-Wilk test for normality

> shapiro.test(subset(df_intron, NRC_regions == "Over rep.")$Frequency)

Shapiro-Wilk normality test
W = 0.85528, p-value = 1.697e-12

> shapiro.test(subset(df_intron, NRC_regions == "Under rep.")$Frequency)

Shapiro-Wilk normality test
W = 0.81156, p-value = 2.905e-10

Methylation comparison in intronic regions of NRC

> wilcox.test(dfi$Frequency ~ dfi$NRC_regions)

Wilcoxon rank sum test with continuity correction
data:  dfi$Frequency by dfi$NRC_regions
W = 10252, p-value = 0.7405
Alternative hypothesis: true location shift is not equal to 0

As p = 0.74 > 0.05, we fail to reject the null hypothesis: there is no evidence of a difference in intronic methylation frequency between over- and under-represented NRC regions.

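In summary, we have characterized the distribution of 5mC-methylated CpG sites in intergenic, exonic and intronic regions of over- and under-represented NRC DNA. The overall difference in methylation frequency between the two NRC classes is reflected in intergenic and exonic regions, but not in intronic regions.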
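Three region-level tests address the same question, so a reader may want to check whether the conclusions survive a multiple-testing correction. The sketch below is supplementary to the original analysis. It applies base R's `p.adjust()` with the Holm method to the Wilcoxon p-values reported above.

```r
# Wilcoxon rank-sum results reported above for each region type
results <- data.frame(
  region = c("intergenic", "exonic", "intronic"),
  W = c(801389, 39913, 10252),
  p = c(2.46e-13, 0.0007649, 0.7405)
)

# Holm adjustment controls the family-wise error rate across the three tests
results$p_holm <- p.adjust(results$p, method = "holm")
results$significant <- results$p_holm < 0.05
print(results)
```

With these values, the Holm-adjusted p-values are about 7.4e-13, 0.0015 and 0.74. The intergenic and exonic differences therefore remain significant at the 5% level, and the intronic one does not.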