---
title: "Title"
author: "Name"
date: '`r format(Sys.time(), "Last modified: %d %b %Y")`'
output: html_document
---
******

Take the steps used to clean the patients dataset and calculate BMI (see below for the code)

- Re-write in the piping framework

******
```{r message = FALSE}
library(dplyr)
library(stringr)
patients <- read.delim("patient-data.txt")
# Convert to a tibble for tidier printing
patients <- as_tibble(patients)
# Trim stray whitespace from Sex, then store it as a factor
patients_clean <- mutate(patients, Sex = factor(str_trim(Sex)))
# Strip the unit suffixes so Height (cm) and Weight (kg) become numeric
patients_clean <- mutate(patients_clean, Height = as.numeric(str_replace_all(Height, pattern = "cm", "")))
patients_clean <- mutate(patients_clean, Weight = as.numeric(str_replace_all(Weight, "kg", "")))
# BMI = weight (kg) / height (m)^2; flag BMI above 25 as overweight
patients_clean <- mutate(patients_clean, BMI = (Weight/(Height/100)^2), Overweight = BMI > 25)
# Recode Yes/No as the strings TRUE/FALSE, then convert to logical
patients_clean <- mutate(patients_clean, Smokes = str_replace_all(Smokes, "Yes", "TRUE"))
patients_clean <- mutate(patients_clean, Smokes = as.logical(str_replace_all(Smokes, "No", "FALSE")))
```
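As a reminder of what the piping style looks like, here is a minimal sketch on a made-up data frame. The object `toy` and its columns `Name` and `Length` are invented for illustration and are not part of the patients data; the same pattern should carry over to the cleaning steps above.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, not the patients dataset
toy <- data.frame(
  Name   = c(" ann", "bob ", " cat "),
  Length = c("10mm", "25mm", "40mm"),
  stringsAsFactors = FALSE
)

# Step-by-step style: each call overwrites the intermediate object
toy_clean <- mutate(toy, Name = str_trim(Name))
toy_clean <- mutate(toy_clean, Length = as.numeric(str_replace_all(Length, "mm", "")))

# Piped style: the result of each step becomes the first argument of the next
toy_piped <- toy %>%
  mutate(Name = str_trim(Name)) %>%
  mutate(Length = as.numeric(str_replace_all(Length, "mm", "")))

# Check that both styles give the same result
identical(toy_clean, toy_piped)
```

With pipes there is no need to name and re-assign the intermediate object at every step, which is the main thing the exercise asks you to practise.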
```{r}
## Re-write the above template using 'pipes'
```
## Exercise: filter

Use `filter` to print the following subsets of the dataset:

- Choose the Female patients from New York or New Jersey
- Choose the overweight smokers that are still alive
- Choose the patients who own a Pet that is not a dog
    + recall that `is.na` can check for values that are `NA`
- (OPTIONAL)
    - Patients born in June
        + you could check out the notes on dealing with dates in the previous section for this
    - Patients with a Number > 100
        + Patient Number is the third component of the ID variable
        + e.g. `AC/AH/001` is patient Number 1; `AC/AH/017` is patient Number 17
    - Patients that entered the study on 2016-05-31
        + the column containing this data is incomplete; blank entries are assumed to be the same as the last non-blank entry above
        + the function `fill` from `tidyr` will help here

Feel free to experiment with different ways to do these.
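The hints above mention `is.na`, splitting the ID variable, and `fill`. The sketch below shows one possible way each of these behaves, using a small made-up data frame. The object `toy`, the column `Started`, and all the values are assumptions for illustration only; the real patients data will have different column names and contents.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical toy data; names and values are invented for illustration
toy <- data.frame(
  ID      = c("XX/YY/001", "XX/YY/017", "XX/YY/120", "XX/YY/121"),
  Pet     = c("Dog", "Cat", NA, "Bird"),
  Started = c("2020-01-01", "", "2020-02-01", ""),
  stringsAsFactors = FALSE
)

# A comparison with NA gives NA, and filter() drops rows where the
# condition is NA; is.na() makes the handling of missing values explicit
has_other_pet <- filter(toy, !is.na(Pet), Pet != "Dog")

# Split the ID on "/" and keep the third piece as a number
toy_numbered <- toy %>%
  mutate(Number = as.numeric(str_split_fixed(ID, "/", 3)[, 3]))

# fill() only carries values forward over NA, so blank strings
# are converted to NA first
toy_filled <- toy %>%
  mutate(Started = na_if(Started, "")) %>%
  fill(Started)
```

Each result can then be passed to `filter` in the usual way, for example `filter(toy_numbered, Number > 100)`.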
```{r}
### Your answer here
```