-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathANOVA.Rmd
131 lines (105 loc) · 4.86 KB
/
ANOVA.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
---
title: "ANOVA Tutorial"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## ANOVA in R
This documents includes code for creating a data object in R, displaying summary statistics, creating an informative plot, checking assumptions for ANOVA, and running an Analysis of Variance (ANOVA).
### Background
Nicolas Cage is considered the greatest actor of all time by many people in the United States. However, Hollywood scientists think opinions of this amazing actor are related to the culture of individual cities. Thus, to better understand how successful a new movie with Nicolas Cage will be, scientists must understand these spatial patterns. The data below represent a random selection of preference scores by people in three different South Carolina Cities. Preference scores are expressed as a percentage and range from 0 to 100 with 0 indicating low preference (i.e., does not like Nicolas Cage) and 100 indicates high preference (i.e., loves Nicolas Cage).
Hypothesis: If culture influences the opinion of Nicolas Cage, then opinion ratings will be different across cities.
Null Hypothesis: The opinion rating of Nicolas Cage is not different among cities.
### Data
The data frame created below contain 15 observations (rows) and 2 variables (columns). Each row represents one individual observation. The first column is opinion rating reported by the individual and the second column is the city the individual is from.
Important points to note. The function used here is "data.frame" which creates a data frame object that is being assigned the name "dat". Everything after "data.frame" are the arguments being passed to the function. The arguments include the column names (rating and city) and the data in each column. Quotes ("") are used to encapsulate text. This is important so R knows that this column is a categorical variable. The categorical variable will be used as the predictor of rating.
Enter the following code in R:
```{r createdat}
dat = data.frame(row=c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
rating = c(13,16,8,15,9,
29,35,24,27,32,
57,59,52,55,60),
city = as.factor( c("Charleston","Charleston","Charleston","Charleston","Charleston",
"Columbia","Columbia","Columbia","Columbia","Columbia",
"Florence","Florence","Florence","Florence","Florence")))
```
The next line of code prints the data contained in the dat object
```{r dat}
dat
```
### Check Normality Assumption
```{r Norm_Assumptions, fig.height=4, fig.width=5}
#Load required packages. These might need to be installed
library(reshape2)
#Convert from long to wide format
wide_dat = dcast(dat, row~city,value.var="rating")
#Check normality
qqnorm(wide_dat$Charleston)
qqline(wide_dat$Charleston)
qqnorm(wide_dat$Columbia)
qqline(wide_dat$Columbia)
qqnorm(wide_dat$Florence)
qqline(wide_dat$Florence)
#Histogram
hist(wide_dat$Charleston)
hist(wide_dat$Columbia)
hist(wide_dat$Florence)
#Shapiro test
shapiro.test(wide_dat$Charleston)
shapiro.test(wide_dat$Columbia)
shapiro.test(wide_dat$Florence)
```
### Check Homogeneity of Variance Assumption
```{r Var_Assumptions, fig.height=4, fig.width=5}
#Load required packages. These might need to be installed
library(car)
#Bartlett's Test
bartlett.test(dat$rating~dat$city)
#If data were not normal then use Lavene's Test from the car package
#Bartlett's test
leveneTest(dat$rating~dat$city)
```
### Run Equal Variance ANOVA
This code chuck performs an ANOVA, displays summary data, then performs a Tukey post-hoc pairwise comparisons test.
```{r ANOVA, eval=FALSE}
res = aov(dat$rating~dat$city)
summary(res)
TukeyHSD(res)
```
```{r ANOVAres, echo=FALSE}
res = aov(dat$rating~dat$city)
summary(res)
TukeyHSD(res)
```
### Unequal Variance Welch's ANOVA
This code chuck performs a Welch's ANOVA, displays summary data, then performs a Games-Howell post-hoc pairwise comparisons test.
```{r Welch, eval=FALSE}
#Load required packages. These might need to be installed
library(rstatix)
res = oneway.test(dat$rating~dat$city, var.equal=FALSE)
res
games_howell_test(dat, rating~city)
```
```{r Welchres, echo=FALSE}
library(rstatix)
res = oneway.test(dat$rating~dat$city, var.equal=FALSE)
res
games_howell_test(dat, rating~city)
```
### Non-parametric Kruskal-Wallis Test
This code chuck performs a Kruskal-Wallis Test, displays summary data, then performs a Dunn post-hoc pairwise comparisons test. Note, the data provided on this tutorial meet assumptions of an ANOVA. The Kruskal-Wallis test is provided for your information.
```{r KW, eval=FALSE}
#Load required packages. These might need to be installed
library(FSA)
res = kruskal.test(dat$rating~dat$city)
res
dunnTest(dat$rating~dat$city)
```
```{r KWres, echo=FALSE}
#Load required packages. These might need to be installed
library(FSA)
res = kruskal.test(dat$rating~dat$city)
res
dunnTest(dat$rating~dat$city)
```