-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy path02_01-univ_asreml.Rmd
363 lines (277 loc) · 19.7 KB
/
02_01-univ_asreml.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
## Asreml-R
```{r}
#| echo: false
gryphon <- read.csv("data/gryphon.csv")
gryphon$animal <- as.factor(gryphon$animal)
gryphon$mother <- as.factor(gryphon$mother)
gryphon$byear <- as.factor(gryphon$byear)
gryphon$sex <- as.factor(gryphon$sex)
gryphon$bwt <- as.numeric(gryphon$bwt)
gryphon$tarsus <- as.numeric(gryphon$tarsus)
gryphonped <- read.csv("data/gryphonped.csv")
gryphonped$id <- as.factor(gryphonped$id)
gryphonped$father <- as.factor(gryphonped$father)
gryphonped$mother <- as.factor(gryphonped$mother)
```
### Running the model
First, we need to load the `asreml` library:
```{r}
library(asreml)
```
To be able to fit an animal model, Asreml-r needs (the inverse of) the relationship matrix from the pedigree using the ainverse function:
```{r}
ainv <- ainverse(gryphonped)
```
We are now ready to specify our first model:
```{r}
model1 <- asreml(
fixed = bwt ~ 1, random = ~ vm(animal, ainv),
residual = ~ idv(units),
data = gryphon,
na.action = na.method(x = "omit", y = "omit")
)
```
In this model, `bwt` is the response variable and the only fixed effect is the intercept, denoted as `1`.
The only random effect we have fitted is `animal`, which will provide an estimate of additive genetic variance $V_A$.
Our random `animal` effect is connected to the inverse related matrix `ainv` which integrate the relativeness or pedigree information.
`data=` specifies the name of the dataframe that contains our variables.
Finally, we inform `asreml()` what to when it encounters `NA`s in either the dependent or predictor variables (in this case we choose to remove the records).
If you use the argument "include" instead of "omit", model will keep the NA. With x="include", the model will exchange `NA` with 0.
Be careful you need to standardize your response variable so the mean will be equal to 0, if not estimates (including covariance in multivariate models) could be strongly biased due to the missing values considered as 0.
With y="include", the model will exchange `NA` with a factor labeled `mv` which will be included in the sparse equation. For more details see Asreml-R manual.
::: {.callout-note}
# Specification of the residuals structure
This simple univariate model will run fine without `residual=~idv(units)`. However, if you are going to use `vpredict()` to calculate the heritability (see below), without specifying the residuals in this way will result in a standard error for the heritability that is incorrect.
:::
Any model has assumption which need to be checked. The model can be plot which help visualizing the distribution of the model residual and check the different assumptions.
```{r}
plot(model1)
```
To see the estimates for the variance components, we run:
```{r}
summary(model1)$varcomp
```
We fitted a single random effect, so we partitioned the phenotypic variance into two components. The `vm(animal, ainv)` variance component is $V_A$ and is estimated as `r round(summary(model1)$varcomp[1,1], 2)`. Given that the ratio (`z.ratio`) of $V_A$ to its standard error is considerably larger than 2 (_i.e._ the parameter estimate is more than 2 SEs from zero), this looks likely to be significant. The `units!units` component refers to the residual variance $V_R$, and `units$R` should be ignored. If you don't include `residual=~idv(units)`in your model specification, `units$R` will provide you with the residual variance.
### Estimating heritability
We can calculate the $h^2$ of birth weight from the components above since $h^2 = V_A/V_P = V_A/(V_A+V_R)$. Thus according to this model, $h^2$ = `r round(summary(model1)$varcomp[1,1],2)` / (`r round(summary(model1)$varcomp[1,1], 2)` + `r round(summary(model1)$varcomp[2,1], 2)`) = `r round(summary(model1)$varcomp[1,1] / (summary(model1)$varcomp[1,1] + summary(model1)$varcomp[2,1]),2)`.
Alternatively, we can use the `vpredict()` function to calculate $h^2$ and its standard error. `vpredict()`function has two structures, first the model used (here `model1`) and then the estimate name with its associated equation. The equation used different `V` and their associated numbers depend on the order of the different random and residual effects included in the model.
```{r}
vpredict(model1, h2.bwt ~ V1 / (V1 + V2))
```
### Adding fixed effects
To add fixed effects to a univariate model, we simply modify the model statement. For example, we might know (or suspect) that birth weight is a sexually dimorphic trait and therefore fit in the model.
```{r}
model2 <- asreml(
fixed = bwt ~ 1 + sex,
random = ~ vm(animal, ainv),
residual = ~ idv(units),
data = gryphon,
na.action = na.method(x = "omit", y = "omit")
)
```
Now we can look at the fixed effects parameters and assess their significance with a conditional Wald F-test:
::: {.content-visible when-format="html"}
```{r}
summary(model2, coef = TRUE)$coef.fixed
wald.asreml(model2, ssType = "conditional", denDF = "numeric")
```
:::
::: {.content-visible when-format="pdf"}
```{r}
#| echo: false
#| eval: true
# removing error when knitting in pdf due to weird unknown character
summary(model2, coef = TRUE)$coef.fixed
a <- wald.asreml(model2, ssType = "conditional", denDF = "numeric")
attr(a$Wald, "heading") <- NULL
a
```
:::
The very small probability (`Pr`) in the Wald test above shows that `sex` is a highly significant fixed effect, and from the parameter estimates (`summary(model2,coef=T)$coef.fixed`) we can see that the average male (sex 2) is 2.2 kg ($\pm$ 0.16 SE) heavier than the average female (sex 1).
When we look at the variance components in the model including `sex` as a fixed effect, we see that they have changed slightly from the previous model:
```{r}
summary(model2)$varcomp
```
In fact, since `sex` effects were previously contributing to the residual variance of the model, our estimate of $V_R$ (denoted `units!R` in the output) is now slightly lower than before. This has an important consequence for estimating heritability since if we calculate $V_P$ as $V_A$+$V_R$ then as we include fixed effects, we will soak up more residual variance driving $V_P$. Assuming that $V_A$ is more or less unaffected by the fixed effects fitted then as $V_P$ goes down we expect our estimate of $h^2$ will go up:
```{r}
(h2.1 <- vpredict(model1, h2.bwt ~ V1 / (V1 + V2)))
(h2.2 <- vpredict(model2, h2.bwt ~ V1 / (V1 + V2)))
```
Here $h^2$ has increased slightly from `r round(h2.1[1],2)` to `r round(h2.2[1],2)`. Which is the better estimate? It depends on what your question is. The first is an estimate of the proportion of variance in birth weight explained by additive effects, the latter is an estimate of the proportion of variance in birth weight -after conditioning on sex- that is explained by additive effects.
An important piece of advice, each researcher should be consistent in how they name their estimates and always correctly describe which estimates they are using conditional or not (to avoid any confusion).
### Adding random effects
This is done by simply modifying the model statement in the same way. For instance, fitting:
```{r}
model3 <- asreml(
fixed = bwt ~ 1 + sex,
random = ~ vm(animal, ainv) + byear,
residual = ~ idv(units),
data = gryphon,
na.action = na.method(x = "omit", y = "omit")
)
summary(model3)$varcomp
(h2.3 <- vpredict(model3, h2.bwt ~ V2 / (V1 + V2 + V3)))
```
Here the variance in `bwt` explained by `byear` is `r round(summary(model3)$varcomp[1,1],2)` and, based on the `z.ratio`, appears to be significant (>2). Thus, we would conclude that year-to-year variation (_e.g._, in weather, resource abundance) contributes to $V_P$. Note that although $V_A$ has changed somewhat, as most of what is now partitioned as a birth year effect was previously included within $V_R$. Thus, what we have really done here is to partition environmental effects into those arising from year-to-year differences versus the rest, and we do not really expect much change in $h^2$ (since now $h^2 = V_A/ (V_A+V_{BY}+V_R)$).
However, we get a somewhat different result if we also add a random effect of `mother` to test for maternal effects:
```{r}
model4 <- asreml(
fixed = bwt ~ 1 + sex,
random = ~ vm(animal, ainv) + byear + mother,
residual = ~ idv(units),
data = gryphon,
na.action = na.method(x = "omit", y = "omit")
)
summary(model4)$varcomp
(h2.4 <- vpredict(model4, h2.bwt ~ V1 / (V1 + V2 + V3 + V4)))
```
Here included of significant maternal variance has resulted in a further decrease in $V_R$ but also a decrease in $V_A$. The latter is because maternal effects of the sort we simulated (fixed differences between mothers) will have the consequence of increasing similarity among maternal siblings. Consequently, this similarity looks very much like the similarity induced by additive genetic effects and if present, but unmodelled, represent a type of "common environment effect" that can - and will - cause upward bias in $V_A$ and so $h^2$.
The "common environment" can be conceived as the inextricable sum of the maternal additive genetic effect (such as maternal loci) and the maternal environment or permanent environment (such as litter or nest environment created or modified by the mother).
### Testing significance of random effects
An important point to note in this tutorial is that while the `z.ratio` (`estimation`/`std.error`) reported is a good indicator of likely statistical significance (>1.96?), the standard errors are approximate and are not recommended for formal hypothesis testing. A better approach is to use likelihood-ratio tests (LRT).
For example, to test the significance of maternal effects we could compare models with and without the inclusion of maternal identity as a random effect and compare the final log-likelihoods of these models.
```{r}
model4$loglik
model3$loglik
```
shows that the model including maternal identity has a log-likelihood of `r round(model4$loglik, 3)`, and model excluding maternal identity has a log-likelihood of `r round(model3$loglik, 3)`.
```{r}
1 - pchisq(2 * (model4$loglik - model3$loglik), 1)
```
A test statistic equal to twice the absolute difference in these log-likelihoods is assumed to be distributed as Chi square with `one` degree of freedom (one term of difference between the two models). In this case we would conclude that the maternal effects are highly significant since: 2 $\times$ (`r model4$loglik` - `r model3$loglik`) equals `r 2*( model4$loglik - model3$loglik)`, and the p-value that comes with this is:
```{r}
1 - pchisq(2 * (model4$loglik - model3$loglik), 1)
```
As P < 0.0001 we would therefore conclude that adding of maternal identity as a random effect significantly improves the fit of the model, given an increase in log-likelihood of approximately `r round(model4$loglik - model3$loglik, 0)`.
### Further partitioning the variance
A population can be further fragmented into different groups or categories (such as females and males, juveniles and adults or treated and untreated). Some scientific questions require further and deeper analysis of the variance thus investigating the genetic variance of each group.
To avoid the multiplication of model (one for each group), we can directly partition the variance between groups in a unique model. In addition, by doing so, we can also test if the variance is different between groups.
As example, we decide to take the last model (model4) and partition its additive genetic variance and residual variance by sex. It is possible to further partition the other random effects, but it will complexity the animal model and requires sufficient sample size.
First, it required to order the dataset by group (here sex).
```{r}
gryphon <- gryphon[order(gryphon$sex), ]
```
To partition variances between sex, two distinct functions are required `at()` for the random level, and `dsum()` for the residual level:
```{r}
model_SEX <- asreml(
fixed = bwt ~ 1 + sex,
random = ~ at(sex):vm(animal, ainv) + at(sex):byear + at(sex):mother,
residual = ~ dsum(~ units | sex),
data = gryphon,
na.action = na.method(x = "omit", y = "omit")
)
```
By partitioning the additive genetic variance and the residual variance, the model estimates the $V_A$ and $V_R$ for each group (sex). Doing so, we can calculate the $h^2$ for each group of sex (assuming that the other random effect is similar between sexes). Here, it's important to know in which order the variances are estimated to extract the correct variance in the heritability equation.
<!--###FIX###-->
!-! here modified !!!!
```{r}
(h2.F <- vpredict(model_SEX, h2.bwt ~ V3 / (V1 + V2 + V3 + V5)))
(h2.M <- vpredict(model_SEX, h2.bwt ~ V4 / (V1 + V2 + V4 + V6)))
```
Here, we can see the point estimates of $h^2$ seems to differ between sexes (`r round(h2.F[1],2)` and `r round(h2.M[1],2)`), but their SE overlaps.
To test if the variances are different between sexes, we can compare the model partitioned `model_SEX` and the previous model without the partitioning `model4` in a likelihood ratio test (LRT) with 2 degrees of freedom since models have two components of variance of difference.
```{r}
model_SEX$loglik
model4$loglik
1 - pchisq(2 * (model_SEX$loglik - model4$loglik), 2)
```
LRT gave more information and showed that partitioning the variance and the residual between sexes did not improve the fit of the model and so their variances are not significantly different.
```{r}
#| fig-cap: Female and male heritability of birth weight
h2.sex <- rbind(h2.F, h2.M)
plot(
c(0.95, 1.05) ~ h2.sex[, 1],
xlim = c(0, 0.8), ylim = c(0.5, 1.5),
xlab = "", ylab = "", col = c("red", "blue"),
pch = c(16, 17), cex = 2, yaxt = "n"
)
arrows(
y0 = 0.95, x0 = h2.sex[1, 1] - h2.sex[1, 2],
y1 = 0.95, x1 = h2.sex[1, 1] + h2.sex[1, 2],
code = 3, angle = 90, length = 0, col = c("red"), lwd = 2
)
arrows(
y0 = 1.05, x0 = h2.sex[2, 1] - h2.sex[2, 2],
y1 = 1.05, x1 = h2.sex[2, 1] + h2.sex[2, 2],
code = 3, angle = 90, length = 0, col = c("blue"), lwd = 2
)
mtext(
"Narrow-sense heritability (±se)",
side = 1, las = 1, adj = 0.4, line = 3, cex = 1.6
)
axis(2, at = 1, labels = c("birth weight"), las = 3, cex.axis = 1.6)
```
### Modification of the variance matrix parameters
Variance represents the deviation of the distribution and it expected to be a positive value. Due to a lack of power, a structural problem in the dataset or a very low variance, Asreml-r often fixes the variance to a boundary `B` instead of a positive value `P`. When it is happened, it is generally a good idea to examine it.
To examine the boundary effect, we can explore an alternative model where the model allowed an unstructured parameter for the variance of interest or the entire variance matrix. For this example: we allowed the model to estimate any values (so allowing possible negative values of estimates) for the random and residual matrix.
First, we create a temporary model `model.temp` with the exact structure to modify.
```{r}
model.temp <- asreml(
fixed = bwt ~ 1,
random = ~ vm(animal, ainv) + byear + mother,
residual = ~ idv(units),
data = gryphon,
na.action = na.method(x = "omit", y = "omit"),
start.values = T
)
G.temp <- model.temp$vparameters[(1:3), ]
G.temp$Constraint <- "U"
R.temp <- model.temp$vparameters[-(1:3), ]
R.temp$Constraint[2] <- "U"
```
The argument `start.values=T` allowed the `model.temp` to change its random parameters. We can create the two different matrices and specify which parameters will be modified. For this example, we modified the G and the R matrix to fit all variance to be `U` unstructured. it is important to note for the R matrix the line `units!R` has to be fix to 1, so it will never change.
The object G.temp and R.temp can be implemented in the following model as new parameters using the argument `R.param` and `G.param`.
```{r}
model5 <- asreml(
fixed = bwt ~ 1 + sex,
random = ~ vm(animal, ainv) + byear + mother,
residual = ~ idv(units),
data = gryphon,
na.action = na.method(x = "omit", y = "omit"),
R.param = R.temp, G.param = G.temp
)
summary(model5)$varcomp
```
Since `model4` did not showed boundary, `the model5` is very similar.
It happens that Asreml will estimate negative variance if you allow the variance matrix to be unstructured . A negative variance is counter-intuitive meaning statistically the mean within the random effect is less similar than expected by chance. However, a biological reason can be hypothesized such as a sibling competition within the nest creating a negative among-individual covariance within the nest. Thus, to test this hypothesis, it is required to estimate the covariance between two random effects.
### Covariance between two random effects
Some research questions require to estimate the covariance between two random effects within a univariate model. To do so, we can use the argument `str`(structure).
As an example, we fit a model which estimate the covariance between the random effects `animal` and `mother`, thus between the additive genetic variance and the mother variance. Both variances require to operate on the same level, thus `animal` and `mother` require to be associated to the pedigree information.
The argument `str` has two components: first the equation term with the two random effects `~vm(animal, ainv)+vm(mother, ainv)` and second the structural term `~us(2):id(XXX)`. Here within the structural term, we fit a 2x2 unstructured matrix `us(2)` which estimated the variance and the covariance between two random effects in the equation term.
To successfully work, the structural term also requires the number of levels (XXX) identified within `id()`. Here a small tip, if you don't know the number of levels identified within id(), run the model with a random number. The model will not converge and an error message will appear like this one: `Size of direct product (4) does not conform with total size of included terms (2618)`. The error message can help you determine the required level within the `str` function, as here 2618 divide by 2.
```{r}
#| eval: false
model.temp2 <- asreml(
fixed = bwt ~ 1,
random = ~ str(~ vm(animal, ainv) +
vm(mother, ainv), ~ us(2):id(1309)) + byear,
residual = ~ idv(units),
data = gryphon,
na.action = na.method(x = "omit", y = "omit"),
start.values = T
)
G.temp2 <- model.temp2$vparameters[(1:4), ]
G.temp2$Constraint <- "U"
model6 <- asreml(
fixed = bwt ~ 1 + sex,
random = ~ str(~ vm(animal, ainv) +
vm(mother, ainv), ~ us(2):id(1309)) + byear,
residual = ~ idv(units),
data = gryphon,
na.action = na.method(x = "omit", y = "omit"),
# equate.levels = c("animal", "mother"),
, G.param = G.temp2
)
summary(model6)$varcomp
```
We have successfully produced a code to estimate the covariance between two random effects. However, for this example, the dataset is not sufficient to properly estimate it and the model did not converge but you have the idea of how to use the function `str`.
### Reaction norm and plasticity
Since we know how to partition the variance across different levels or group within population (sex as example). It is also possible to partition across environment (character state approach) which allow to estimate the variance and covariance across levels of environment. There is an alternative way to visualize and estimate the change in variance between environment using a reaction norm known as plasticity. Instead to estimate variance at each level, we can model the slope of variance change across environments as a continuous variable.
Note: Reaction norms are easier to estimate than character state approach, due to the fact that the reaction norm involved a univariate model with a slope whereas the character states approach involves a hybrid of multivariate model.
For this example, we decide to use sex as an environment, thus estimating the reaction norm across sex, but usually researchers focus on environment.
```{r}
```
Xxxx about the model structure
Interestingly, both models ( reaction norm and character states) are intertwined in their equation, and a possible back transformation can be done.
```{r}
```