title | author | date | output | keep_md |
---|---|---|---|---|
RepR project1 | Tikam Singh | 19 November 2016 | html_document | true |
library(dplyr)
library(ggplot2)
library(lubridate)
knitr::opts_chunk$set(echo = TRUE)
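data <- read.csv("activity.csv")
data$date <- ymd(data$date)
# One row per day, so each day is counted once in the histogram and in the mean/median.
# With na.rm = TRUE, a day with no recorded steps sums to 0.
data.byday <- data %>%
  group_by(date) %>%
  summarise(totalsteps = sum(steps, na.rm = TRUE))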
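The difference between collapsing to one row per day and repeating the daily total on every row matters for the histogram counts. A minimal sketch on a made-up data frame (the `toy` values are hypothetical, not taken from activity.csv):

```r
library(dplyr)

# Toy data: two days, two observations each (hypothetical values)
toy <- data.frame(
  date  = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  steps = c(10, NA, 5, 7)
)

# summarise() returns one row per day
toy.byday <- toy %>%
  group_by(date) %>%
  summarise(totalsteps = sum(steps, na.rm = TRUE))

# mutate() keeps every original row and repeats the daily total on each
toy.repeated <- toy %>%
  group_by(date) %>%
  mutate(totalsteps = sum(steps, na.rm = TRUE))

nrow(toy.byday)     # one row per day
nrow(toy.repeated)  # same number of rows as toy
```

A histogram built on the `mutate()` result would count each day once per observation rather than once per day.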
- Histogram of the total number of steps taken each day
ggplot(data = data.byday, aes(x = totalsteps)) +
  geom_histogram(binwidth = 1000) +
  labs(title = "Histogram: number of steps by day",
       x = "Number of steps",
       y = "Frequency")
meansteps <- mean(data.byday$totalsteps, na.rm = TRUE)
mediansteps <- median(data.byday$totalsteps, na.rm = TRUE)
- What is the average daily data pattern?
- Make a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)
# One row per 5-minute interval, holding the mean across all days
data.byinterval <- data %>%
  group_by(interval) %>%
  summarise(meansteps = mean(steps, na.rm = TRUE))
ggplot(data = data.byinterval,
       aes(x = interval, y = meansteps)) +
  geom_line(na.rm = TRUE) +
  labs(title = "Average steps taken by interval",
       x = "interval (min)",
       y = "average number of steps")
Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?
maxsteps <- data.byinterval[which.max(data.byinterval$meansteps), ]
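For illustration, the same `which.max()` row lookup on a tiny made-up table. The `toy.byinterval` values are hypothetical and only show that the whole row, not just the maximum, is returned:

```r
# Hypothetical interval means, not taken from activity.csv
toy.byinterval <- data.frame(
  interval  = c(0, 5, 10, 15),
  meansteps = c(1.5, 0.3, 9.2, 4.0)
)

# which.max() gives the position of the first maximum;
# indexing with it keeps the matching interval alongside the mean
toy.max <- toy.byinterval[which.max(toy.byinterval$meansteps), ]

toy.max$interval
toy.max$meansteps
```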
The 5-minute interval labelled `r maxsteps$interval` contains the maximum number of steps (`r maxsteps$meansteps`) on average across all days.
Imputing missing values
Calculate and report the total number of missing values in the dataset, counted here as rows that are not complete cases.
complete <- sum(complete.cases(data))
missing <- nrow(data) - complete
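Counting incomplete rows does not show which column holds the missing values. A small sketch, on a made-up data frame with hypothetical values, of checking that per column:

```r
# Toy data: NAs only in the steps column (hypothetical values)
toy <- data.frame(
  steps    = c(NA, 3, NA, 8),
  interval = c(0, 5, 0, 5)
)

# Rows with at least one NA (same idea as nrow() minus the complete cases)
rows.with.na <- sum(!complete.cases(toy))

# NA count per column
na.per.column <- colSums(is.na(toy))

rows.with.na
na.per.column
```

When only one column contains NAs, as assumed in this toy example, the row count and that column's NA count agree.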
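# Replace each missing value with the (truncated) mean of its 5-minute interval
imputed <- data %>%
  group_by(interval) %>%
  mutate(steps = ifelse(is.na(steps), as.integer(mean(steps, na.rm = TRUE)), steps)) %>%
  ungroup()
# One row per day, so each day is counted once
imputed.byday <- imputed %>%
  group_by(date) %>%
  summarise(totalsteps = sum(steps))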
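The imputation strategy used here (fill each NA with the mean of its interval, truncated to an integer) can be checked by hand on a small example. This is a sketch with hypothetical values, not the real data:

```r
library(dplyr)

# Toy data: two intervals, one NA in each (hypothetical values)
toy <- data.frame(
  interval = c(0, 0, 0, 5, 5, 5),
  steps    = c(2, NA, 5, NA, 10, 20)
)

toy.imputed <- toy %>%
  group_by(interval) %>%
  mutate(steps = ifelse(is.na(steps),
                        as.integer(mean(steps, na.rm = TRUE)),  # truncates, e.g. 3.5 becomes 3
                        steps)) %>%
  ungroup()

toy.imputed$steps
```

Interval 0 has a mean of 3.5, so its NA becomes 3. Interval 5 has a mean of 15, so its NA becomes 15. Because `as.integer()` truncates rather than rounds, imputed values are never larger than the interval mean.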
ggplot(data = imputed.byday, aes(x = totalsteps)) +
  geom_histogram(binwidth = 1000) +
  labs(title = "Histogram: number of steps by day",
       x = "Number of steps",
       y = "Frequency")
# Use the imputed daily totals, not the original data.byday
imputed.meansteps <- mean(imputed.byday$totalsteps)
imputed.mediansteps <- median(imputed.byday$totalsteps)
With the missing values filled in, the mean number of steps per day is `r imputed.meansteps` and the median is `r imputed.mediansteps`. In contrast, when missing values are not imputed, the mean number of steps per day is `r meansteps` and the median is `r mediansteps`.
Are there differences in data patterns between weekdays and weekends?
Create a new factor variable in the dataset with two levels -- "weekday" and "weekend" -- indicating whether a given date is a weekday or weekend day.
imputed <- imputed %>%
  mutate(weekday = as.factor(
    ifelse(wday(date) %in% c(1, 7), "weekend", "weekday")))
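The `c(1, 7)` test relies on `wday()` numbering Sunday as 1 and Saturday as 7, which is lubridate's default when the `lubridate.week.start` option has not been changed (an assumption here). A quick check on three sample dates, chosen for illustration:

```r
library(lubridate)

# Monday, Saturday, Sunday
toy.dates <- ymd(c("2012-10-01", "2012-10-06", "2012-10-07"))

# Default numbering: Sunday = 1, ..., Saturday = 7
wday(toy.dates)

day.type <- factor(
  ifelse(wday(toy.dates) %in% c(1, 7), "weekend", "weekday"),
  levels = c("weekday", "weekend")
)

as.character(day.type)
```

Setting `levels` explicitly keeps both levels present even if a subset of the data happens to contain only one kind of day.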
Make a panel plot containing a time series plot (i.e. type = "l") of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).
# One row per interval and day type
imputed.byinterval <- imputed %>%
  group_by(interval, weekday) %>%
  summarise(meansteps = mean(steps, na.rm = TRUE))
ggplot(data = imputed.byinterval,
       aes(x = interval, y = meansteps)) +
  geom_line() +
  facet_grid(weekday ~ .) +
  labs(title = "Average steps taken by interval",
       x = "interval (min)",
       y = "average number of steps")