-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathexercises.Rmd
151 lines (101 loc) · 3.99 KB
/
exercises.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
---
title: "Intro to Stats in R"
author: "Rose Hartman"
date: "10/14/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# loading tidyverse packages individually because binder is struggling
library("readr")
library("dplyr")
library("ggplot2")
library("stargazer")
library("broom")
```
# Data import and tidying
NOTE: These are fake data.
```{r data_import}
covid_testing <- read_csv("covid_testing.csv")
# test that it read in correctly
head(covid_testing)
```
Your tidying steps might be standardized in your lab, or you may be following the steps laid out in your pre-registration. Some data tidying will be the result of EDA (exploratory data analysis), such as when you discover anomalies, outliers, or errors in the data.
```{r data_tidying}
# save a new version of the data for use in our models
# only keep test results that are either positive or negative (no invalid tests), and only for pediatric patients (< 18 years old)
model_data <- covid_testing %>%
# filter to only keep rows where result is either positive or negative
filter(result %in% c("positive", "negative"),
# age is less than 18
age < 18) %>%
# calculate a total turn around time (collected to received plus received to verified)
mutate(total_tat = col_rec_tat + rec_ver_tat) %>%
# calculate a centered version of pan_day and age, if we want to use them as predictors in models with interactions
mutate(pan_day_c = pan_day - mean(pan_day),
age_c = age - mean(age)) %>%
# also turn categorical variables into factors
mutate(result = as.factor(result),
gender = as.factor(gender),
drive_thru_ind = factor(drive_thru_ind, levels = c(0,1), labels = c("no", "yes")),
clinic_name = as.factor(clinic_name),
demo_group = as.factor(demo_group)) %>%
# filter out extreme outliers, with TAT greater than 30 days, or less than 1 hour
filter(total_tat < 30*24,
total_tat > 1)
```
# Models
## Hypothesis 1
Turn around time is shorter for drive-thru tests.
We can run this as a t-test, or as ordinary least squares regression (the models are equivalent).
```{r t_test}
hyp1_t <- t.test( )
```
```{r ols_reg}
hyp1_lm <- lm( )
```
## Hypothesis 2
Probability of a positive test result increases with age, within the pediatric population.
Test with a logistic regression. Result (positive or negative) is the outcome, and age is the only predictor.
```{r logistic_reg}
hyp2 <- glm( )
```
# Examining model objects
```{r examine, eval=FALSE}
# for each saved model object, run these three lines of code in the console, one at a time
# substituting in the name you saved in place of "model_obj"
model_obj
summary(model_obj)
str(model_obj)
# you can also try plot() on model objects to see what happens!
plot(model_obj)
```
You can use functions from the broom package to turn model objects into tidy tables, which makes them easier to work with.
```{r broom, eval=FALSE}
tidy(model_obj) # a table of coefficients for the model
glance(model_obj) # a table of model fit stats
augment(model_obj) # returns your original data, but with things like model estimates and residuals added on as new columns
```
Some examples of how you might use the `augment` function to explore model results:
```{r resid_plots}
# a histogram of the model residuals
# examining residuals separately for different values of drive_thru_ind
```
```{r fitted_plot}
# plot predicted probability by age
```
# Presenting results
## Model tables
```{r, results='asis'}
# use stargazer to generate a model summary table for hyp1
# use stargazer to generate a model summary table for hyp2
```
```{r, results='asis'}
# put both models together in one table for comparison
```
## Inline stats
You can include R code and results within your text as well, which is particularly valuable when reporting stats. For example, let's say we want to report our logistic regression results.
```{r}
z <- summary(hyp2)$coefficients[2,3]
p <- summary(hyp2)$coefficients[2,4]
```