Exercise2_LiverDisease.html

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

<title>Exercise 2 - Study on Liver Disease</title>

<script type="text/javascript">
window.onload = function() {
  var imgs = document.getElementsByTagName('img'), i, img;
  for (i = 0; i < imgs.length; i++) {
    img = imgs[i];
    // center an image if it is the only element of its parent
    if (img.parentElement.childElementCount === 1)
      img.parentElement.style.textAlign = 'center';
  }
};
</script>





<style type="text/css">
body, td {
   font-family: sans-serif;
   background-color: white;
   font-size: 13px;
}

body {
  max-width: 800px;
  margin: auto;
  padding: 1em;
  line-height: 20px;
}

tt, code, pre {
   font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace;
}

h1 {
   font-size:2.2em;
}

h2 {
   font-size:1.8em;
}

h3 {
   font-size:1.4em;
}

h4 {
   font-size:1.0em;
}

h5 {
   font-size:0.9em;
}

h6 {
   font-size:0.8em;
}

a:visited {
   color: rgb(50%, 0%, 50%);
}

pre, img {
  max-width: 100%;
}
pre {
  overflow-x: auto;
}
pre code {
   display: block; padding: 0.5em;
}

code {
  font-size: 92%;
  border: 1px solid #ccc;
}

code[class] {
  background-color: #F8F8F8;
}

table, td, th {
  border: none;
}

blockquote {
   color:#666666;
   margin:0;
   padding-left: 1em;
   border-left: 0.5em #EEE solid;
}

hr {
   height: 0px;
   border-bottom: none;
   border-top-width: thin;
   border-top-style: dotted;
   border-top-color: #999999;
}

@media print {
   * {
      background: transparent !important;
      color: black !important;
      filter:none !important;
      -ms-filter: none !important;
   }

   body {
      font-size:12pt;
      max-width:100%;
   }

   a, a:visited {
      text-decoration: underline;
   }

   hr {
      visibility: hidden;
      page-break-before: always;
   }

   pre, blockquote {
      padding-right: 1em;
      page-break-inside: avoid;
   }

   tr, img {
      page-break-inside: avoid;
   }

   img {
      max-width: 100% !important;
   }

   @page :left {
      margin: 15mm 20mm 15mm 10mm;
   }

   @page :right {
      margin: 15mm 10mm 15mm 20mm;
   }

   p, h2, h3 {
      orphans: 3; widows: 3;
   }

   h2, h3 {
      page-break-after: avoid;
   }
}
</style>



</head>

<body>
<h2>Exercise 2 - Study on Liver Disease</h2>

<h2>by Paula Carrio Cordo - 1 October 2016</h2>

<h2>1. Introduction</h2>

<p>We are focus on a study about liver disease. Based on Whole Genome Microarray data of gene expression, we are going to check if there are outliers and/or systematic biases in the 5 samples examined which were taken from sick patients.</p>

<h2>2. Loading data</h2>

<p>Phenotype information is contained in a file, that we read into a data frame for the study. This file contains sample name, tissue type, patient ID and associated file.</p>

<p>Before starting our analysis: labeling, coloring the subsequent plots, and generate boolean to indicate with which we can access the normal, sick and acute samples only:</p>

<pre><code class="r">samples = rownames(anno)
colors = rainbow(nrow(anno))
isNorm = anno$TissueType == &quot;norm&quot;
isSick = anno$TissueType == &quot;sick&quot;
isAcute = anno$TissueType == &quot;acute&quot;
</code></pre>

<p>Now we load the expression data:  </p>

<pre><code class="r">x = read.table(&quot;/Users/TOSHIBA/STA426 R/expressionData.txt&quot;, 
               as.is=TRUE, sep=&quot;\t&quot;, quote=&quot;&quot;, row.names=1, header= TRUE)
x = as.matrix(x)
</code></pre>

<p>With a plot we compare the expression signals from sample 1 and 2. The solid blue line gives the first diagonal and the dashed lines give the boundaries for 2-fold up- or down-regulation.</p>

<pre><code class="r">plot(x[ , &quot;norm.02&quot;], x[, &quot;norm.05&quot;], log=&quot;xy&quot;, pch=20)
abline(0, 1, col=&quot;blue&quot;)
abline(log10(2), 1, col=&quot;blue&quot;, lty=2)
abline(-log10(2), 1, col=&quot;blue&quot;, lty=2)
</code></pre>

<p><img src="figure/plotExpression-1.png" alt="plot of chunk plotExpression"></p>

<h2>3. Distribution of the intensities</h2>

<p>Assuming that the intensiy distribution of the different arrays are similar, we summarize the distribution with a boxplot and a graphic created with function plotDensities.</p>

<pre><code class="r">boxplot(x, log=&quot;y&quot;, cex.lab=0.5, las=2)
</code></pre>

<p><img src="figure/boxplot-1.png" alt="plot of chunk boxplot"></p>

<pre><code class="r">plotDensities(log(x), legend=&quot;topright&quot;,cex.lab=0.3)
</code></pre>

<p><img src="figure/boxplot%20x-1.png" alt="plot of chunk boxplot x">
(*Legend function does not work)</p>

<h2>4. Consistency of the replicates</h2>

<p>We need to compute sample correlation on the logarithmic scale.</p>

<pre><code class="r">corrMatrix &lt;- cor(x)
signif(corrMatrix, digits=3)
</code></pre>

<pre><code>##            norm.02 norm.05 norm.07 norm.09 norm.10 norm.11 sick.04 sick.12
## norm.02      1.000   0.980   0.962   0.965   0.956   0.982   0.907   0.878
## norm.05      0.980   1.000   0.963   0.977   0.974   0.980   0.937   0.923
## norm.07      0.962   0.963   1.000   0.984   0.968   0.967   0.940   0.922
## norm.09      0.965   0.977   0.984   1.000   0.985   0.969   0.955   0.943
## norm.10      0.956   0.974   0.968   0.985   1.000   0.973   0.956   0.949
## norm.11      0.982   0.980   0.967   0.969   0.973   1.000   0.924   0.899
## sick.04      0.907   0.937   0.940   0.955   0.956   0.924   1.000   0.983
## sick.12      0.878   0.923   0.922   0.943   0.949   0.899   0.983   1.000
## sick.13      0.879   0.915   0.925   0.938   0.940   0.898   0.977   0.975
## sick.14      0.953   0.979   0.972   0.989   0.987   0.966   0.964   0.959
## sick.15      0.857   0.883   0.928   0.931   0.940   0.893   0.940   0.944
## acute.04     0.890   0.915   0.934   0.951   0.949   0.906   0.977   0.970
## acute.04.a   0.901   0.933   0.944   0.958   0.963   0.924   0.991   0.984
## acute.12     0.870   0.908   0.926   0.945   0.950   0.895   0.966   0.983
## acute.13     0.858   0.891   0.923   0.934   0.935   0.882   0.962   0.970
## acute.14     0.876   0.904   0.929   0.947   0.948   0.898   0.962   0.969
## acute.15     0.833   0.871   0.892   0.911   0.919   0.857   0.940   0.962
##            sick.13 sick.14 sick.15 acute.04 acute.04.a acute.12 acute.13
## norm.02      0.879   0.953   0.857    0.890      0.901    0.870    0.858
## norm.05      0.915   0.979   0.883    0.915      0.933    0.908    0.891
## norm.07      0.925   0.972   0.928    0.934      0.944    0.926    0.923
## norm.09      0.938   0.989   0.931    0.951      0.958    0.945    0.934
## norm.10      0.940   0.987   0.940    0.949      0.963    0.950    0.935
## norm.11      0.898   0.966   0.893    0.906      0.924    0.895    0.882
## sick.04      0.977   0.964   0.940    0.977      0.991    0.966    0.962
## sick.12      0.975   0.959   0.944    0.970      0.984    0.983    0.970
## sick.13      1.000   0.948   0.939    0.966      0.977    0.970    0.973
## sick.14      0.948   1.000   0.936    0.957      0.970    0.958    0.945
## sick.15      0.939   0.936   1.000    0.967      0.962    0.971    0.969
## acute.04     0.966   0.957   0.967    1.000      0.984    0.980    0.979
## acute.04.a   0.977   0.970   0.962    0.984      1.000    0.981    0.975
## acute.12     0.970   0.958   0.971    0.980      0.981    1.000    0.987
## acute.13     0.973   0.945   0.969    0.979      0.975    0.987    1.000
## acute.14     0.953   0.958   0.975    0.988      0.977    0.987    0.982
## acute.15     0.949   0.930   0.960    0.970      0.957    0.985    0.981
##            acute.14 acute.15
## norm.02       0.876    0.833
## norm.05       0.904    0.871
## norm.07       0.929    0.892
## norm.09       0.947    0.911
## norm.10       0.948    0.919
## norm.11       0.898    0.857
## sick.04       0.962    0.940
## sick.12       0.969    0.962
## sick.13       0.953    0.949
## sick.14       0.958    0.930
## sick.15       0.975    0.960
## acute.04      0.988    0.970
## acute.04.a    0.977    0.957
## acute.12      0.987    0.985
## acute.13      0.982    0.981
## acute.14      1.000    0.983
## acute.15      0.983    1.000
</code></pre>

<p>Matrix visualization as an image:  </p>

<pre><code class="r">par(mar=c(8,8,2,2))
grayScale &lt;- gray((1:256)/256)
image(corrMatrix, col=grayScale,  axes=FALSE)
axis(1, at=seq(from=0, to=1, length.out=length(samples)), labels=samples, las=2)
axis(2, at=seq(from=0, to=1, length.out=length(samples)), labels=samples, las=2)
</code></pre>

<p><img src="figure/matrixvisual-1.png" alt="plot of chunk matrixvisual"></p>

<h2>5. Sample Clustering</h2>

<p>Clustering to appreciate the similarities of the expression patterns of the samples in a tree. </p>

<pre><code class="r">x.sd &lt;- apply(x, 1, sd, na.rm=TRUE)
ord &lt;- order(x.sd, decreasing=TRUE)
highVarGenes &lt;- ord[1:500]
par(mfrow=c(1,2))
d &lt;- as.dist(1-cor(x))
c &lt;- hclust(d, method=&quot;ward.D2&quot;)
plot(c, hang=-0.1, main=&quot;all genes&quot;, xlab=&quot;&quot;)
d &lt;-  as.dist(1-cor(x[highVarGenes, ]))
c &lt;- hclust(d, method=&quot;ward.D2&quot;)
plot(c, hang=-0.1, main=&quot;high variance genes&quot;, xlab=&quot;&quot;)
</code></pre>

<p><img src="figure/clustering-1.png" alt="plot of chunk clustering"></p>

<p>If we run the clustering without sample <strong>sick-04</strong>, the <strong>acute-04</strong> does no longer cluster in the branch with the other sick samples</p>

<pre><code class="r">par(mfrow=c(1,1))
sub &lt;-  x[ , samples != &quot;sick-04&quot;]
d = as.dist(1-cor(sub))
c=hclust(d, method=&quot;ward.D2&quot;)
 plot(c, hang=-0.1, main=&quot;all genes&quot;, xlab=&quot;&quot;)
</code></pre>

<p><img src="figure/clusterWithout04-1.png" alt="plot of chunk clusterWithout04"></p>

<h2>6. Quantile normalization</h2>

<p>We can do a quantile normalization with limma function normalizeQuantiles.  </p>

<pre><code class="r"> x.norm &lt;- normalizeQuantiles(x)
 plotDensities(log(x.norm), legend=&quot;topright&quot;, col=colors)
</code></pre>

<p><img src="figure/quantileNormalization-1.png" alt="plot of chunk quantileNormalization"></p>

<h2>7. Sample Representation in Principal Component Space</h2>

<p>Functions cmdscale and prcomp are useful to create a plot that represents the sample disatnaces in a reduced space.  </p>

<p>a) Multdimensional scaling based on our data matrix</p>

<pre><code class="r">ms &lt;- dist(t(x.norm))
cmds &lt;- cmdscale(ms)
plot(cmds, main=&quot;MDS Plot&quot;, col=colors,xlab=&quot;PC1&quot;,ylab=&quot;PC2&quot;)
</code></pre>

<p><img src="figure/cmdscale-1.png" alt="plot of chunk cmdscale"></p>

<p>b) Principal component analysis based on our data matrix </p>

<pre><code class="r">prc &lt;- prcomp(t(x.norm))
plot(prc$x[,1], prc$x[,2], main=&quot;PCA Plot&quot;, col=colors, xlab=&quot;PC1&quot;, ylab=&quot;PC2&quot;)
</code></pre>

<p><img src="figure/prcomp-1.png" alt="plot of chunk prcomp"></p>

</body>

</html>