-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcovid_project.Rmd
80 lines (61 loc) · 3.39 KB
/
covid_project.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
---
title: "COVID-19 effects on Australia"
output: html_document
date: "2023-01-22"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(readr)
library(dplyr)
library(stringr)
```
```{r, include=FALSE}
covid_data <- read_csv("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv")
#summary (covid_data)
```
The data being used is of COVID-19 data. This data set presents insights of: date, country, location, cases, deaths, hospitals, and mortality. Data being used for this analysis is a subset of the original COVID-19 data set. We will be looking at how COVID-19 has affected Australia over the last 2 years.
```{r,include=FALSE}
oc_data <- covid_data %>%
select('continent', 'location', 'date', 'hosp_patients','new_deaths', 'total_deaths','people_vaccinated', 'total_boosters') %>%
filter(location == "Oceania"|continent == "Oceania") %>%
mutate(location = replace(location,location == "Oceania", "unknown"))
```
```{r,include=FALSE}
oc_data$continent %>% replace_na('Oceania' )
oc_data$people_vaccinated[is.na(oc_data$people_vaccinated)] <- 0
oc_data$total_boosters[is.na(oc_data$total_boosters)] <- 0
oc_data$hosp_patients[is.na(oc_data$hosp_patients)] <- 0
oc_data$new_deaths[is.na(oc_data$new_deaths)] <- 0
#head(oc_data)
#tail(oc_data)
```
```{r, include=FALSE}
oc_start <- oc_data %>% filter(location == "Australia" & date >"2020-03-01")
#oc_start[oc_start$date >= "2020-03-01" & oc_start$date <="2022-12-31", ]
summary(oc_start)
head(oc_start)
tail(oc_start)
#oc_start$hosp_patients[is.na(oc_start$hosp_patients)] <- 0
#oc_start$new_deaths[is.na(oc_start$new_deaths)] <- 0
```
```{r, include=FALSE}
#add as project progresses
plot(total_boosters ~ new_deaths, data= oc_start)
out = lm(total_boosters ~ new_deaths, data= oc_start)
abline(out)
```
```{r}
library(ggplot2)
ggplot(data = oc_start, aes(x = date, y =new_deaths)) + geom_point()
#ggplot(data = oc_start, aes(x = date, y =hosp_patients)) + geom_point()
```
This first graph shows the increase of new COVID-19 deaths in Australia from their the first death recorded on March 1, 2020 to the end of this past year December 12, 2022. It is important to note that counts can also include probable deaths. We can see that there were two peaks on the new deaths in the year 2020. One at the start of the pandemic and then again towards the middle of the year. We can also see this occurrence happen in the years 2021 and 2022. However, from the middle of 2021 and onward, new death tolls have been at an all time high.
```{r,}
oc_2022 <- oc_data%>%
filter(location == 'Australia'& date >= "2022-01-01")
#oc_2022 <- oc_data[oc_data$date >= "2022-01-01" & oc_data$date <="2022-12-31", ]
ggplot(data = oc_2022, aes(x = date, y =new_deaths)) + geom_point()
#ggplot(data = oc_2022, aes(x = date, y = total_boosters)) + geom_point()
```
This second graph takes a closer look at the new death tolls in 2022. We can see that the new death tolls have some consistency throughout the year. However, there is an even bigger peak in the months of July through October. We then start to see this decline as we approach 2023. Through further research, Australia’s flu season is during our summer months which is their winter months. We can also see this trend in the first graph with peaks being in the middle of the year.