---
title: "Natural Disaster Sentiment Analysis"
author: "Scott Burstein"
date: "`r format(Sys.time(), '%d %B, %Y')`"
###"Data Science and Society (Sociology 367) - Final Project"
output: html_document
---
# Setup
## Configure RStudio Rmd File Output Format
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Load the Libraries and Datasets
### Google Trends Package
The [gtrendsR CRAN Package](https://cran.r-project.org/web/packages/gtrendsR/gtrendsR.pdf)
is used to perform and display Google Trends queries.
Google Trends data is accessible in R through the gtrendsR package, which was
created by [Philippe Massicotte](https://github.com/PMassicotte) and made
available through [CRAN (The Comprehensive R Archive Network)](https://cran.r-project.org/).
Google trends data for specified search queries is represented by a `hits`
variable, which expresses the relative volume of Google searches for specific
terms over geographical and time parameters.
`hits` - an integer between 0 and 100 representing Google's search volume per
time period (weekly for the default five-year window) as a proportion of the
maximum search volume for the specified keyword within the given time and
location bounds.
It is worth noting that a Google Trends query that returns data will have at
least one interest_over_time `hits` value of 100.
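As a rough illustration of this scaling, the sketch below rescales a vector of made-up weekly counts so that the busiest week equals 100. The raw counts are invented, and Google's actual computation (which also involves sampling and normalization by total search volume) is not reproduced here; only the "relative to the maximum" idea is shown.

```r
# Hypothetical raw weekly search counts (invented for illustration only)
raw_volume <- c(120, 450, 900, 300, 60)

# Rescale so that the busiest week in the window equals 100
hits <- round(100 * raw_volume / max(raw_volume))
hits
```

Because every value is divided by the window's own maximum, changing the time or location bounds of a query changes every `hits` value, not just the peak.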
```{r load_packages, message = FALSE, echo = FALSE}
#---PACKAGES------ > remove "#" if package not previously installed
#install.packages("tidyverse")
library(tidyverse)
#install.packages("gtrendsR")
library(gtrendsR)
#install.packages("ggthemes")
library(ggthemes)
#install.packages("maps")
library(maps)
#install.packages("lubridate")
library(lubridate)
```
### README
See README for more thorough description of the `FEMA_Declarations` data set,
with content summary, column variable descriptions, acknowledgments, source
description and licensing information.
### Primary Data Source
[Kaggle Natural Disaster Data set](https://www.kaggle.com/headsortails/us-natural-disaster-declarations). The
`FEMA_Declarations` data set is used for the bulk of this analysis and is also
integrated with Google Trends data.
```{r load_data, message = FALSE, echo = FALSE}
#---DATA--------- > remove "#" to view raw data
FEMA_Declarations <- read_csv("FEMA_data/us_disaster_declarations.csv")
#view(FEMA_Declarations)
```
<br/><br/>
For later analysis, the `small_FEMA` data frame can be used to parse FEMA
declarations that have occurred over the last 5 years in the states that have,
in those 5 years, recorded the most FEMA declarations. It is created after
`recent_FEMA_byState`.
## Filter for Event Types
### Generalizable Parameters for Event Filter
In order to produce research that is both reproducible and timely, certain
parameters can be initialized once, and then referenced multiple times later.
Since gtrendsR can easily return Google Trends information from the past five
years, it is intuitive to create a `date_5y` variable that can be used to filter
FEMA declarations by date, thus eliminating all those that occurred previously.
The data set at large contains FEMA declarations from 1953 through the present.
Throughout this research there will also be many variables that are reused.
In portions of the analysis that repeat methods used elsewhere, shared values
are assigned to variables once, so that minimal rewriting is needed if the same
code is repurposed for further analysis in the future.
```{r initialized_parameters, echo=FALSE}
# Start date to filter events (only show FEMA decs. since this date)
date_5y <- Sys.Date() - 365.25*5
# Number of days after a natural disaster to query "topic" interest for in state
num_days <- 14
```
```{r clean_date, echo=FALSE}
FEMA_Declarations <- FEMA_Declarations %>%
# clean date variable
  # keep only the date part and store it as a Date so comparisons are explicit
  mutate(declaration_day = as.Date(gsub(" .*$", "", declaration_date)))
```
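The date cleaning and five-year cutoff can be tried on a small toy table. This is a minimal sketch in base R: the rows are invented, the timestamp format is an assumption about how the date column is stored, and a fixed cutoff replaces `Sys.Date() - 365.25*5` so that the result is reproducible.

```r
# Toy declarations table; values are invented, column name mirrors the FEMA data
toy_decs <- data.frame(
  state = c("CA", "FL", "TX"),
  declaration_date = c("1998-02-09 00:00:00",
                       "2019-08-29 00:00:00",
                       "2020-09-14 00:00:00"),
  stringsAsFactors = FALSE
)

# Drop everything after the first space, then convert the remainder to a Date
toy_decs$declaration_day <- as.Date(gsub(" .*$", "", toy_decs$declaration_date))

# Fixed cutoff for reproducibility (assumption; the report computes it from Sys.Date())
cutoff <- as.Date("2015-11-22")

# Keep only declarations on or after the cutoff
recent <- toy_decs[toy_decs$declaration_day >= cutoff, ]
recent$state
```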
---
# Research Questions Preview
**(1)** Do Google search patterns correlate with real-world events?
**(2)** Does interest in climate change increase regionally after the occurrence
of a natural disaster?
**(3)** Do people modify their Google search behavior and become more
environmentally conscious after a natural disaster?
---
# Initial Analysis of FEMA Declarations
## All FEMA Declarations by State
```{r FEMA_Decs_by_State, echo=FALSE}
all_FEMA_decs <- FEMA_Declarations %>%
# group declarations by state and date
group_by(state, declaration_day) %>%
count() %>%
ungroup()
```
`alltime_FEMA_byState` contains the number of FEMA declarations for every US
state and territory since 1953 (counted as distinct declaration days, since
`all_FEMA_decs` holds one row per state and day).
```{r all_FEMA_Declarations_by_State, echo=FALSE}
alltime_FEMA_byState <- all_FEMA_decs %>%
group_by(state) %>%
count() %>%
arrange(desc(n))
# data frame w/ 59 states/territories, n count(FEMA declarations)
alltime_FEMA_byState
```
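The two-step count above can be checked by hand on a toy table. The sketch below uses base R rather than dplyr, and the rows are invented; it assumes, as the `designated_area` column suggests, that one declaration can appear as several rows (one per designated area) on the same day.

```r
# Invented rows: two CA rows share a declaration day (e.g. two counties)
toy <- data.frame(
  state = c("CA", "CA", "CA", "FL", "FL", "TX"),
  declaration_day = c("2020-08-22", "2020-08-22", "2020-09-06",
                      "2019-08-30", "2020-09-23", "2020-07-26"),
  stringsAsFactors = FALSE
)

# Step 1: collapse to one row per state and day
state_days <- unique(toy[, c("state", "declaration_day")])

# Step 2: count the remaining rows per state, largest first
counts <- sort(
  sapply(split(state_days$declaration_day, state_days$state), length),
  decreasing = TRUE
)
counts
```

Counting state-days rather than raw rows keeps a single declaration that covers many counties from inflating a state's total.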
Since records began in 1953, the following 10 states have had the
most FEMA natural disaster emergency declarations (as of `r Sys.Date()`):
CA, TX, OK, WA, FL, OR, NY, AZ, LA, NM
## Recent FEMA Declarations by State
The date this file was last compiled:
Current date (`Sys.Date()`) is `r Sys.Date()`.
The date 5 years before, which is the starting time bound for subsequent Google
Trends queries shown below:
Variable `date_5y` is `r date_5y`.
`recent_FEMA_byState` contains the number of FEMA declarations for every US
state and territory from the last 5 years.
```{r recent_FEMA_Decs_by_State, echo=FALSE}
recent_FEMA_byState <- all_FEMA_decs %>%
filter(declaration_day >= date_5y) %>%
group_by(state) %>%
count() %>%
arrange(desc(n))
recent_FEMA_byState
```
From `r date_5y` to `r Sys.Date()`, the following 10 states have had the
most FEMA natural disaster emergency declarations:
CA, WA, OR, OK, FL, TX, NV, AZ, LA, MS
In the past 5 years, California has recorded 75 FEMA declarations, outpacing all
other states. Many of these can be attributed to the recent spell of wildfires,
which cause massive amounts of damage to property and endanger millions of
Americans each year.
Florida has recorded 23 FEMA declarations in the last 5 years, many of which are
the result of tropical storms from the Atlantic that grow into hurricanes.
These also cause billions of dollars in damage and widespread power outages,
endanger the lives of local residents, and threaten to submerge the already
eroding eastern coastline of Florida.
```{r small_FEMA_df, echo=FALSE}
n_states <- 10
top_states <- recent_FEMA_byState[["state"]][1:n_states]
date_5y <- Sys.Date() - 365.25*5
small_FEMA <- FEMA_Declarations %>%
subset(state %in% top_states) %>%
mutate(incident_begin_day = gsub( " .*$", "", incident_begin_date ),
incident_end_day = gsub( " .*$", "", incident_end_date ) ) %>%
filter(declaration_day >= date_5y) %>%
select(declaration_day,
state,
designated_area,
declaration_type,
incident_type,
declaration_title,
incident_begin_day,
incident_end_day
)
# add filter() below to specify only statewide FEMA declarations
#%>% filter(designated_area == "Statewide")
#uncomment to view small_FEMA table
#view(small_FEMA)
```
---
# Key Google Trends Summary Insights:
## Climate Change Google Searches by State:
A data frame arranged in descending order of all 50 states and Washington D.C.,
showing the relative hits count for the search term "climate change" over the
last 5 years.
```{r climate_change_interest_by_state, echo=FALSE}
search_terms <- c(
"climate change"
)
climate_trends <- gtrends(keyword = search_terms,
geo = "US") # default time is past 5 years
# Build the table in one step so that hits keeps its original type
# (cbind() on a character and a numeric vector coerces both to character)
climate_location_hits_table <- data.frame(
  location = climate_trends[["interest_by_region"]][["location"]],
  hits = climate_trends[["interest_by_region"]][["hits"]]
)
climate_location_hits_table
```
Vermont had the highest relative search volume (`hits` = 100) for
"climate change" over the past five years. Washington D.C. had 85% as much
search volume for the same term. The top ten regions were:
VT, DC, ME, AK, OR, MA, NH, RI, HI, CO
## Climate Change Keywords
Visualizations, such as the faceted line plots below, help illustrate important
trends in sentiment towards climate change and associated topics. For example,
interest in climate change, global warming, and the green new deal topics all
spiked during the 2020 election season, as climate change became an important
topic in the Presidential race.
```{r US_climate_change-top4-query, echo=FALSE, results="hide"}
#remove results="hide" to view glimpse() of interest_over_time
climate_search_terms <- c(
"climate change",
"global warming",
"fossil fuel",
"green new deal"
)
climate_change_gtrends <- gtrends(keyword = climate_search_terms,
geo = "US",
time = "today 12-m")
climate_change_gtrends[1]$interest_over_time
US_climate_interest <- climate_change_gtrends %>%
.$interest_over_time %>%
glimpse()
```
```{r US_climate_change-viz, echo=FALSE}
climate_change_facet_plot <- US_climate_interest %>%
mutate_at("hits", ~ifelse(. == "<1", 0.5, .)) %>% # replace with 0.5
mutate_at("hits", ~as.numeric(.)) %>% # convert to numeric
# Begin ggplot
ggplot(aes(x = date, y = hits)) +
geom_line(colour = "darkblue", size = 1.5) +
facet_wrap(~keyword) +
ggthemes::theme_economist()
climate_change_facet_plot
```
[Technical Resource Cited](https://martinctc.github.io/blog/vignette-google-trends-with-gtrendsr/)
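The `"<1"` handling in the plotting chunk can be isolated in a small helper. `clean_hits()` is a hypothetical function written for this illustration, not part of gtrendsR; it assumes, as the chunk above does, that very low volumes may arrive as the string `"<1"`, and it uses the same 0.5 substitute.

```r
# Hypothetical helper (not part of gtrendsR): coerce a hits column to numeric,
# replacing the "<1" marker with a chosen stand-in value
clean_hits <- function(hits, below_one = 0.5) {
  hits <- as.character(hits)
  hits[which(hits == "<1")] <- as.character(below_one)
  as.numeric(hits)
}

clean_hits(c("12", "<1", "100"))  # character input containing "<1"
clean_hits(c(0L, 45L, 100L))      # integer input passes through unchanged
```

Choosing 0.5 rather than 0 or 1 is a judgment call; it only matters for how low-volume periods are drawn, not for where the peaks fall.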
Sentiment towards climate change can be difficult to track as individuals'
search queries may contain different keywords. The faceted plot above also shows
that volume for "climate change" searches greatly exceeds the "hits" value of
synonymous terms, like "global warming" and also tangentially related topics,
like "fossil fuel" and "green new deal". It is evident that out of the four
terms plotted above, "climate change" is the most viable as an indicator of the
general public's sentiment towards climate change. This is likely a result of it
being the most used term in news outlets and research when referring to global
changes in weather and the environment.
An interesting property of Google Trends data, which is shown below, is that in
addition to tracking human sentiment towards longstanding topics, like climate
change, it can also be used to detect one-time events.
---
# Research Question 1
**Do Google search patterns correlate with real-world events?**
A link between what happens in the real world and Google search trends is a
prerequisite for subsequent analysis. While it may seem intuitive, it is worth
explaining the mechanisms by which society and events express themselves in the
information age. A new adaptation of the proverbial question "does a tree really
fall in the woods if no one is around to hear it?" may be "do current events
really occur without being searched for on the internet?". According to existing
social science research, the short answer is no.
The use of Google's search engine is so ingrained in the United States and
global cultures that it is the internet which aggregates and disseminates
the vast majority of information humans use to interpret the world around them.
In order to illustrate the relationship between Google search patterns and
the occurrence of natural disasters, it is necessary to view differences in
interest over time for search queries pertaining to time-specific natural
disasters.
Each fall, from August through October, two of the most devastating weather
trends play out. Both of these phenomena lead to hundreds of deaths and billions
of dollars in damages each year. A suitable method to track social interest in
these disasters (as they happen) is to plot interest in the disaster name as a
function of time. If there were to be a relationship, a clear spike in interest
for the keyword would appear at the time of the event.
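One way to make "a spike at the time of the event" checkable is to locate the period with the maximum `hits` and test whether it overlaps the event window. The sketch below uses invented weekly values and invented event dates, and assumes each weekly observation is labelled by the first day of its week.

```r
# Invented weekly hits for a disaster keyword
trend <- data.frame(
  date = as.Date("2020-08-02") + 7 * 0:7,
  hits = c(1, 2, 3, 40, 100, 22, 5, 2)
)

# Hypothetical event window
event_start <- as.Date("2020-08-28")
event_end   <- as.Date("2020-09-03")

# Week in which search interest peaked
peak_date <- trend$date[which.max(trend$hits)]

# The peak week [peak_date, peak_date + 6] overlaps the event window
peak_in_window <- peak_date <= event_end && (peak_date + 6) >= event_start
peak_in_window
```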
## Google Search Correlation with Florida Hurricane Events
The first of these two weather trends is tropical storm formation in the
Atlantic, which ultimately leads to hurricanes in the southeastern United
States. As global temperatures rise, the prevalence and intensity of tropical
storm systems increase as well. Florida is subject to the most intense and
frequent of these hurricanes.
```{r FL_hurricane-top4-query, echo=FALSE, results="hide"}
search_terms <- c(
"hurricane michael",
"hurricane dorian",
"hurricane isaias",
"hurricane sally"
)
FLHurr_gtrends <- gtrends(keyword = search_terms,
geo = "US-FL",
time = "today+5-y") #default "today+5-y" otherwise "today 12-m"
#interest_over_time output can be seen here:
#FLHurr_gtrends[1]$interest_over_time
FL_hurr_interest <- FLHurr_gtrends %>%
.$interest_over_time %>% # working .$ by list subset
glimpse()
```
```{r FL_hurricane-top4-viz, echo=FALSE}
FL_hurr_plot <- FL_hurr_interest %>%
mutate_at("hits", ~ifelse(. == "<1", 0.5, .)) %>% # replace with 0.5
mutate_at("hits", ~as.numeric(.)) %>% # convert to numeric
# Begin ggplot
ggplot(aes(x = date, y = hits)) +
geom_line(colour = "darkblue", size = 1.5) +
facet_wrap(~keyword) +
ggthemes::theme_economist()
FL_hurr_plot
```
All of these events garnered peak interest on Google at the time that they
occurred, to within the weekly resolution of a five-year query. The spike in
Google query hits is so sharp that human interest in these events appears to be
short-lived and concentrated around the moment of their occurrence. This
supports the hypothesis of research question 1: Google search volume for a
natural disaster peaks at the time of the event.
Hurricane Isaias and Hurricane Sally were two of the most devastating storms to
hit the Gulf Coast and Southeastern United States in 2020. They were classified
as Category 1 and Category 2 storms respectively, leaving more than $10.0
billion USD of damage in their wakes.
Conversely, Hurricane Dorian from 2019 and Hurricane Michael from 2018 were both
Category 5 storms, significantly more powerful and inflicting even more damage.
More than 100 deaths and $5.1 billion USD in damage are attributed to Hurricane
Dorian alone. Hurricane Michael caused 74 deaths and led to $25.5 billion USD in
damage.
Google search interest for these category 5 hurricanes showed no more than
ripples (hits ≤ 5) for the entirety of this past year. This would be unlikely if
it were the case that event interest on Google was not time dependent;
Hurricanes Dorian and Michael were significantly more consequential than
Hurricanes Isaias and Sally.
When the data is analyzed carefully, however, it is evident that search hits
for Hurricanes Dorian and Michael also increased fractionally during the 2020
hurricane season, around the time Google searches for active hurricanes spiked.
It is perhaps the case that people do become more interested in past natural
disasters in order to contextualize the events that they are currently
experiencing.
## Google Search Correlation with California Wildfire Events
The second climate trend is increased temperatures in the western U.S., which
lead to brush and forest fires in California. In 2018 alone, more than 1,670,000
acres of land burned, making it California's most destructive wildfire season to date.
```{r CA_fire-top4-query, echo=FALSE, results="hide"}
CA_top4_fires <- c(
"bobcat fire",
"august fire",
"creek fire",
"bear fire"
)
CAFire_gtrends <- gtrends(keyword = CA_top4_fires,
geo = "US",
time = "today+5-y") #%>% summary()
#interest_over_time output can be seen here:
#CAFire_gtrends[1]$interest_over_time
CA_fire_interest <- CAFire_gtrends %>%
.$interest_over_time %>%
glimpse()
```
```{r CA_fire-top4-viz, echo=FALSE}
CA_fire_facet_plot <- CA_fire_interest %>%
mutate_at("hits", ~ifelse(. == "<1", 0.5, .)) %>% # replace with 0.5
mutate_at("hits", ~as.numeric(.)) %>% # convert to numeric
# Begin ggplot
ggplot(aes(x = date, y = hits)) +
geom_line(colour = "dark red", size = 1.5) +
facet_wrap(~keyword) +
ggthemes::theme_economist()
CA_fire_facet_plot
```
Like the Google trends output for Hurricanes in Florida, wildfires in California
show the same property of having peak interest on Google around the time of the
event itself.
A key difference between wildfires in California and hurricanes in Florida is
that wildfires in California tend to have a much longer duration, often lasting
weeks. This is evident from the longer duration of certain spikes in
interest, which is even clearer when the `gtrends()` query is performed over
a shorter window (such as the past 1 year instead of 5 years).
Another notable feature of the line plots is that there appear to be multiple
time periods that return high hits values for a given query. For example, "creek
fire" most notably refers to the largest wildfire in recent California history
that started on September 4th, 2020 and is still active at the date of
submission for this paper (November 22nd, 2020). However, due to the naming
convention of California wildfires, it also returned Google "hits" values for
the Creek Fire of greater Los Angeles in 2017 and smaller, similarly named fires
in 2018. This makes it harder to infer when the exact peak in interest for an
event occurred, as many of the keywords refer to multiple wildfires in recent
California history.
An additional example of noise in this search query can be seen in the
"august fire" graph, where, in addition to a peak at the start of the 2020
August Complex fire (August 16th, 2020), there are regular jumps visible each
year in the month of August. This is surely not a coincidence, but rather the
result of conflated search queries.
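A simple check for this kind of conflation is to average `hits` by calendar month over years that do not contain the event of interest. The sketch below uses invented monthly values with a small bump every August; if the bump survives once the event year is excluded, the keyword is probably picking up unrelated searches.

```r
# Invented monthly hits: a small bump every August, one large spike in August 2020
dates <- seq(as.Date("2018-01-01"), as.Date("2020-12-01"), by = "month")
hits <- rep(2, length(dates))
hits[format(dates, "%m") == "08"] <- 15
hits[dates == as.Date("2020-08-01")] <- 100

# Average by calendar month, leaving out the event year
baseline <- format(dates, "%Y") != "2020"
monthly_mean <- sapply(
  split(hits[baseline], format(dates[baseline], "%m")),
  mean
)

# Calendar month with the highest baseline interest
names(which.max(monthly_mean))
```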
The faceted line plot for interest over time for California wildfires still
confirms the hypothesis that Google search queries peak at the time of an event,
as this was still an observable pattern in the data. All four example plots
returned maximum hit values at days within the duration of each wildfire.
However, it does show the need to tread carefully. Depending on the nature of
a weather phenomenon, or its naming convention, Google search
queries for associated terms may appear noisy or misleading. Additionally, it
becomes increasingly challenging to make claims about whether local populations
search for past natural disaster events in order to contextualize
impending events, as the data is more ambiguous.
## Transition to Climate Change Sentiment Analysis
Thus, it is also necessary to consider the impact (if any) that these events
have on the general sentiment towards climate change, one of the most important
factors dictating the intensity and frequency of these events. Many news
sources and social networks address climate change and human action as a
contributing factor to the destruction caused by such natural disasters.
---
# Research Question 2
**Does interest in climate change increase regionally after the occurrence of a natural disaster?**
## Context
By virtue of research question 1, it is reasonable to assume a relationship
between natural disasters - physical events that shape critical aspects of the
human experience - and Google search trends for these events. When FEMA declares
an emergency, for a hurricane, wildfire, flood, tornado or other event, people
who reside in the affected region want to know what is going on. This includes
searching the internet for more information about the nature of the emergency,
advice regarding how to protect their families and property, what to expect, and
how to prepare. The Google trends hits value associated with the disaster's name
thus fluctuates dramatically in the days leading up to, during, and after a
natural disaster impacts a community.
The next logical question to pose - at least from the perspective of this
analysis - is whether or not humans have a tendency to search for topics like
climate change after the occurrence of a local natural disaster. One of the most
noticeable and consequential results of increased global warming is that extreme
weather events, like storms, fires and floods, become more severe and can occur
more frequently. This is often used as a rallying-cry by climate activists.
Without timely action to minimize greenhouse gas emissions and protect our
natural resources, there will likely be irreversible damage to the planet.
Therefore, it is critical to assess whether individuals in regions affected by
natural disasters turn to Google in the days surrounding said event for
information regarding climate change. Does the occurrence of a natural disaster
ultimately lead to any change in internet behavior?
```{r plotting_function, echo=FALSE}
# A generalizable plotting function to display interest over time for topic vec.
plot.gtrends.silent <- function(x, ...) {
df <- x$interest_over_time
df$date <- as.Date(df$date)
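  # Convert character hits such as "<1" to numeric; note that stripping "<"
  # maps "<1" to 1, whereas the faceted plots above use 0.5
  df$hits <- if (is.character(df$hits)) {
    as.numeric(gsub("<", "", df$hits))
  } else {
    df$hits
  }
  df$legend <- paste(df$keyword, " (", df$geo, ")", sep = "")
  # aes() with bare column names replaces the deprecated aes_string()
  p <- ggplot(df, aes(x = date, y = hits, color = legend)) +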
geom_line() +
xlab("Date") +
ylab("Search Hits") +
ggtitle("Interest over time") +
theme_bw() +
theme(legend.title = element_blank())
invisible(p)
}
```
## Point of Qualification
When using the gtrendsR package to analyze interest over time for “climate
change” and “fire” keywords in California from the last five years, it is
evident that climate change as a search term always receives fewer hits than
fire. This could result from a confluence of several factors. First of all,
some terms are simply more prevalent in the public consciousness than others. Fire
often poses an immediate threat to the health and safety of individuals, whereas
climate change, while still a real threat to society, is less immediate and
obvious. Additionally, it is likely that “fire” as a keyword also tabulates
search volume for search queries unrelated to wildfires, such as “How to start a
fire?”. Conversely, individuals may be quite curious about topics like climate
change and their connection to natural disasters, but use other terms in their
query, such as “what causes natural disasters?”. These are inherent limitations of the
study that must be stated. They serve as a point of qualification for any broad
claims which this research seeks to make. Without the use of sophisticated
machine learning models or fine-tuning of the Google Trends functionality, these
are sources of error which must be accepted.
## California Interest Over Time Graph
```{r fire_cc_insurance_plot, echo=FALSE}
fire_and_climate_change_search_terms <- c(
"fire",
"climate change",
"insurance"
)
CA_all_fire_trends <- gtrends(keyword = fire_and_climate_change_search_terms,
geo = "US-CA") #time = "today+5-y" Last five years (default)
CA_fire_plot <- plot.gtrends.silent(CA_all_fire_trends)
CA_fire_plot +
scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
theme(legend.position = "bottom")
```
---
Graphing Function per Dave Tang, [Visualising Google Trends Results with R](https://davetang.org/muse/2018/12/31/visualising-google-trends-results-with-r/)
---
This output shows the utility of Google Trends data when
displayed in a readable format. Interest for “insurance” is also plotted, as
damage from wildfires often leads to an increase in insurance claims. However,
there does not seem to be a clear relationship on this graph between any of the
three terms “fire”, “insurance” or “climate change” in California over the past
five years.
## Florida Interest Over Time Graph
```{r hurr_cc_insurance_plot, echo=FALSE}
hurr_and_climate_change_search_terms <- c(
"hurricane",
"climate change",
"insurance"
)
FL_all_hurr_trends <- gtrends(keyword = hurr_and_climate_change_search_terms,
geo = "US-FL") #time = "today+5-y" Last five years (default)
FL_hurr_plot <- plot.gtrends.silent(FL_all_hurr_trends)
FL_hurr_plot +
scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
theme(legend.position = "bottom")
```
The visualization above shows the Google Trends interest over time for
“hurricane”, “insurance” and “climate change” plotted relative to one another.
This time, the relative volume of “climate change” search queries *and*
“insurance” search queries is dwarfed by the number of searches for “hurricane”.
There is again no clear relationship between any of these three keywords.
## Individual Keyword Queries Transformation
By transforming the data such that individual queries are processed for each
search term, rather than calling `gtrends()` once on a vector of
both "event type" and "climate change" keywords, it is possible to plot interest
over time for each search term independently of the other. By doing so, each
search term is plotted on the same proportional y-axis, where hits
represent, by definition, the proportion of search queries relative to that
term's own maximum search volume within the given time and location parameters. This
produces a more interesting output where anomalies in either search
term’s hits volume can be detected with ease, rather than having climate change
search volume consistently at hits = <1. Insurance was dropped for simplicity.
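The effect of querying keywords separately can be seen without calling Google at all. The sketch below uses invented raw volumes and plain max-scaling as a stand-in for what a joint query and two separate queries would return; it is an approximation of the idea, not of Google's exact procedure.

```r
# Invented raw volumes for two keywords over the same five weeks
raw <- data.frame(
  week = 1:5,
  hurricane = c(100, 200, 5000, 400, 100),
  climate_change = c(20, 25, 40, 30, 20)
)

# Joint query: both keywords are scaled against one shared maximum
shared_max <- max(raw$hurricane, raw$climate_change)
joint <- round(100 * raw[, c("hurricane", "climate_change")] / shared_max)

# Separate queries: each keyword is scaled against its own maximum
separate <- data.frame(
  hurricane = round(100 * raw$hurricane / max(raw$hurricane)),
  climate_change = round(100 * raw$climate_change / max(raw$climate_change))
)

joint$climate_change     # flattened near zero by the hurricane peak
separate$climate_change  # spans the full 0 to 100 range
```

The trade-off is that separately scaled series can no longer be compared in magnitude: a value of 100 for one keyword and 100 for the other may correspond to very different search volumes.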
### California Interest Over Time Transformation Graph
```{r fire_yaxis_transform, echo=FALSE}
indiv_fire_trend <- gtrends(keyword = "fire",
geo = "US-CA") #time = "today+5-y" Last five years (default)
indiv_cc_trend <- gtrends(keyword = "climate change",
geo = "US-CA") #time = "today+5-y" Last five years (default)
date <- data.frame(indiv_fire_trend[["interest_over_time"]][["date"]])
fire_hits <- data.frame(indiv_fire_trend[["interest_over_time"]][["hits"]])
cc_hits <- data.frame(indiv_cc_trend[["interest_over_time"]][["hits"]])
temp_fire_df <- cbind(date, fire_hits)
temp_cc_df <- cbind(date, cc_hits)
names(temp_fire_df)[1] <- "date"
names(temp_fire_df)[2] <- "fire_hits"
names(temp_cc_df)[1] <- "date"
names(temp_cc_df)[2] <- "cc_hits"
temp_joined <- inner_join(temp_fire_df, temp_cc_df, by = NULL) #def. by="date"
#view(temp_joined)
graph_df <- temp_joined %>%
pivot_longer(!date, names_to = "type", values_to = "hits")
ggplot(data = graph_df) +
aes(x = date, y = hits, color = type) +
geom_line() +
scale_colour_viridis_d(option = "viridis") +
labs(title = "Relative Interest by Date",
subtitle = "For Climate Change and Fires in CA over the last 5 years",
x = "Date",
y = "Relative Interest") +
#scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
theme_minimal() +
theme(legend.position = 'bottom')
```
A transformed interest over time plot for wildfires and climate change in
California over the past 5 years.
### Florida Interest Over Time Transformation Graph
```{r hurr_yaxis_transform, echo=FALSE}
indiv_hurr_trend <- gtrends(keyword = "hurricane",
geo = "US-FL") #time = "today+5-y" Last five years (default)
indiv_cc_trend <- gtrends(keyword = "climate change",
geo = "US-FL") #time = "today+5-y" Last five years (default)
date <- data.frame(indiv_hurr_trend[["interest_over_time"]][["date"]])
#convert interest_over_time$hits to type integer -> error not in CA df above
hurr_hits <- data.frame(as.integer(indiv_hurr_trend[["interest_over_time"]][["hits"]]))
cc_hits <- data.frame(indiv_cc_trend[["interest_over_time"]][["hits"]])
temp_hurr_df <- cbind(date, hurr_hits)
temp_cc_df <- cbind(date, cc_hits)
names(temp_hurr_df)[1] <- "date"
names(temp_hurr_df)[2] <- "hurr_hits"
names(temp_cc_df)[1] <- "date"
names(temp_cc_df)[2] <- "cc_hits"
temp_joined <- inner_join(temp_hurr_df, temp_cc_df, by = NULL) #def. by="date"
#view(temp_joined)
graph_df <- temp_joined %>%
pivot_longer(!date, names_to = "type", values_to = "hits")
ggplot(data = graph_df) +
aes(x = date, y = hits, color = type) +
geom_line() +
scale_color_brewer(palette = "Dark2") +
labs(title = "Relative Interest by Date",
subtitle = "For Climate Change and Hurricanes in FL over the last 5 years",
x = "Date",
y = "Relative Interest") +
theme_minimal() +
theme(legend.position = 'bottom')
```
A transformed interest over time plot for hurricanes and climate change in
Florida over the past 5 years.
---
Analytic Methodology per Surbhi Tyagi, [Getting Google Trends Data Using gtrendsR](http://rstudio-pubs-static.s3.amazonaws.com/520533_f25fa191afa7476a995322a4393710ce.html)
---
# Combining FEMA data with Google Trends Hits Queries
By looking at the change in Google Trends search volume for climate change
immediately after a natural disaster takes place, it is possible to draw
conclusions about the presence (or absence) of fluctuations in local interest
in climate change as a result of the natural disaster.
---
### Null Hypothesis:
$H_0: \mu_c = 0$
The volume of climate change hits does not change immediately after a natural
disaster. Thus, the mean day-to-day rate of change in climate change "hits",
$\mu_c$, is 0.
If there is no change in sentiment, then a slope of 0 is expected for the
daily change in hits for climate change Google searches immediately after a
natural disaster takes place.
### Alternative Hypothesis:
$H_a: \mu_c \neq 0$
The volume of climate change hits changes immediately after a natural
disaster. Thus, the mean day-to-day rate of change in climate change "hits",
$\mu_c$, is not equal to 0.
If there is a change in sentiment, then a slope not equal to 0 (either positive
or negative) is expected for the daily change in hits for climate change
Google searches immediately after a natural disaster takes place.
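To make the quantity behind $\mu_c$ concrete, the sketch below fits a line to one made-up two-week series and reads off its slope, which is the day-to-day rate of change the hypotheses refer to. The vectors `rel_day`, `hits_rising`, and `hits_flat` are hypothetical values chosen for illustration, not Google Trends data.

```r
# Hypothetical daily hits for days 0..14 after a declaration (illustration only).
rel_day <- 0:14
hits_rising <- 40 + 2 * rel_day   # rises by exactly 2 hits per day
hits_flat <- rep(50, 15)          # no change at all, as the null hypothesis expects

# The slope of hits on rel_day is the daily rate of change.
slope_rising <- unname(coef(lm(hits_rising ~ rel_day))["rel_day"])
slope_flat <- unname(coef(lm(hits_flat ~ rel_day))["rel_day"])

slope_rising
slope_flat
```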
---
## Florida Hurricanes since `r date_5y`
A list of all FEMA declarations for hurricanes in Florida occurring in the last
5 years.
```{r FL_Hurricanes-events, echo=FALSE}
FL_Hurricanes <- FEMA_Declarations %>%
filter(incident_type == "Hurricane", incident_begin_date >= date_5y, state == "FL") %>%
select(declaration_title, declaration_day) %>%
distinct(declaration_title, .keep_all = TRUE)
FL_Hurricanes
```
## Climate Change Sentiment after Hurricanes in Florida
This section shows the change in search volume for "climate change" during the
14 days after each Florida hurricane in the `FEMA_Declarations` dataset.
```{r FL_Hurricanes-query, echo=FALSE}
# set parameters for FL_Hurricanes query below
num_days <- 14
State_Disaster <- FL_Hurricanes
location <- "US-FL"
search_term <- "climate change"
# vectorize events & dates
events <- as.vector(State_Disaster$declaration_title)
dates <- as.vector(State_Disaster$declaration_day)
# create vector of length length(events)*(num_days+1) with event names for cbind()
# (assumes each query returns one row per day, both endpoints included)
EventsVec <- rep(events, each = num_days + 1)
# initialize empty data frame
finaldata <- as.data.frame(NULL)
# for loop iterates over all events in events vector
for (i in 1:length(events)) {
# create comb_time time boundaries for gtrends query
start_date <- ymd(dates[i])
end_date <- ymd(dates[i]) + num_days
comb_time <- paste(start_date, end_date)
# execute gtrends query for search_term over this event's time window
event_trend <- gtrends(keyword = search_term, #events[i]
geo = location, time = comb_time)
# the first element of the gtrends result is interest_over_time
all_time_interest <- as.data.frame(event_trend[1])
finaldata <- rbind(finaldata, all_time_interest)
print(i)
Sys.sleep(5)
}
# column bind df with vector of natural disaster names (EventsVec)
finaldata <- cbind(event_name=EventsVec, finaldata)
# store the result under an event-specific name (FLHurr_Hits)
FLHurr_Hits <- finaldata
# view data frame
#FLHurr_Hits
```
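The two pieces of bookkeeping in the chunk above, the query window string and the vector of event labels, can be checked in isolation without calling Google Trends. The sketch below is a base R illustration: the event titles and dates are hypothetical, and it assumes that each window returns one row per day including both endpoints.

```r
# Hypothetical declaration titles and dates (illustration only).
events <- c("HURRICANE A", "HURRICANE B")
dates <- c("2022-09-28", "2020-11-08")
num_days <- 14

# gtrends() takes a custom window written as "YYYY-MM-DD YYYY-MM-DD".
start_dates <- as.Date(dates)
windows <- paste(start_dates, start_dates + num_days)
windows

# One label per expected row: num_days + 1 rows per event, under the
# assumption that every window yields complete daily data.
event_labels <- rep(events, each = num_days + 1)
length(event_labels)
```

If a query returns fewer rows than expected, the label vector and the query results no longer line up, so comparing `length(event_labels)` with the row count before `cbind()` is a cheap safeguard.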
```{r FL_Hurricanes-hits-df, echo=FALSE}
# set df as FLHurr_Hits for subsequent df transformation
df <- FLHurr_Hits
num_days <- 14
x <- df %>%
mutate(event_date = gsub( " .*$", "", interest_over_time.time ),
hits_date = gsub( " .*$", "", interest_over_time.date )
)
x <- x %>%
mutate(event_date = as.Date(event_date),
hits_date = as.Date(hits_date),
keyword = interest_over_time.keyword,
hits = interest_over_time.hits) %>%
select(event_name, event_date, keyword, hits_date, hits)
relative_date_mod <- x %>%
mutate(rel_date = hits_date - event_date)
# ggplot ready df (pivot_longer effectively done)
FLHurr_graph_df <- relative_date_mod
```
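The relative-date step above relies on the start of each query window being the text before the first space in the `time` column. A small base R sketch of that calculation follows. The `raw` data frame is a hypothetical stand-in for the flattened query output, and converting the difference to plain numbers is a choice made for this sketch rather than something the chunk above does.

```r
# Hypothetical rows shaped like the flattened query output (illustration only).
raw <- data.frame(
  time = rep("2022-09-28 2022-10-12", 3),
  date = c("2022-09-28", "2022-09-29", "2022-09-30"),
  hits = c(55L, 61L, 48L)
)

# Keep the text before the first space: the window start and the observation day.
event_date <- as.Date(sub(" .*$", "", raw$time))
hits_date <- as.Date(sub(" .*$", "", raw$date))

# Days elapsed since the declaration, as plain numbers.
rel_date <- as.numeric(hits_date - event_date)
rel_date
```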
```{r unused-coalesce-fxn, eval=FALSE, echo=FALSE}
# method to get 1 row for each event
hits_df <- relative_date_mod %>%
pivot_wider(names_from = rel_date, values_from = hits)
coalesce_by_column <- function(df) {
return(coalesce(df[1:(num_days)]))
}
hits_df %>%
group_by(event_name) %>% #event_name
summarise_all(coalesce_by_column)
FLHurr_output_df <- hits_df
FLHurr_output_df
# coalesce_by_column function description and source:
# https://stackoverflow.com/questions/45515218/combine-rows-in-data-frame-containing-na-to-make-complete-row
```
```{r FLHurr_CCInterest-plot, echo=FALSE, warning=FALSE, message=FALSE}
ggplot(data = FLHurr_graph_df, aes(x = rel_date, y = hits, color = event_name)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Effects of FL Hurricanes on Climate Change Sentiment",
subtitle = "Google Trends after FEMA Declaration",
x = "Number of Days after FEMA Declaration",
y = "Google Trends Hits Volume",
color = "Natural Disaster") +
theme(legend.position = "right")
```
This visual shows the Google Trends hits volume as a function of days after
a FEMA declaration, grouped by natural disaster. It is hard to read a pattern
from this plot, as there are 9 events represented on the same graph and their
colors are difficult to tell apart.
A simpler approach is to fit one linear model that *ignores* the grouping by
event. Its slope summarizes the average trend across events, which is a more
meaningful indicator of patterns in human behavior across different natural
disasters than any single event's slope.
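The link between the pooled model and the per-event slopes can be checked on simulated data. When every event is observed on the same set of days, the slope of the single pooled line equals the mean of the per-event slopes. The sketch below uses three hypothetical events with made-up trends; if some events have missing days, the equality is only approximate.

```r
set.seed(1)
rel_day <- 0:14

# Three hypothetical events with different trends plus noise (illustration only).
sim <- do.call(rbind, lapply(1:3, function(id) {
  data.frame(
    event = paste0("event_", id),
    rel_day = rel_day,
    hits = 50 + c(-1, 0.5, 2)[id] * rel_day + rnorm(length(rel_day), sd = 3)
  )
}))

# One slope per event, then one slope for all events pooled together.
per_event <- sapply(split(sim, sim$event), function(d) {
  coef(lm(hits ~ rel_day, data = d))[["rel_day"]]
})
pooled <- coef(lm(hits ~ rel_day, data = sim))[["rel_day"]]

# With identical days for every event, the two quantities agree.
all.equal(mean(per_event), pooled)
```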
```{r FLHurr_CCMono-plot, echo = FALSE, warning=FALSE, message=FALSE}
xticks <- 0:num_days
ggplot(data = FLHurr_graph_df, aes(x = rel_date, y = hits)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(title = "Effects of FL Hurricanes on Climate Change Sentiment",
subtitle = "Google Trends after FEMA Declaration",
y = "Google Trends Hits Volume") +
scale_x_continuous("Number of Days after FEMA Declaration", labels = as.character(xticks), breaks = xticks) +
theme_bw()
```
Here, a single fitted line represents the average daily change in search
volume for climate change in the two weeks following each Florida hurricane
from the past 5 years. The null hypothesis is that its slope equals 0: if there
is no relationship between a natural disaster and climate change sentiment,
daily search volume for climate change should not change after a hurricane
occurs. The slope appears to be slightly negative, but more analysis is
necessary.
```{r FLHurr_CC-significance, echo = FALSE}
#State_Disaster <- FL_Hurricanes
# add a grouping variable (or many!)
FLHurr_slopes <- FLHurr_graph_df %>%
#mutate(group5 = rep(1:10, each = 5)) %>%
group_by(event_name) %>%
mutate(
slope = round(lm(hits ~ rel_date)$coefficients[2], 2),
significance = summary(lm(hits ~ rel_date))$coefficients[2, 4],
x = mean(rel_date), # x coordinate for slope label
y = mean(hits) # y coordinate for slope label
)
hurr_slopes_output <- FLHurr_slopes %>%
select(event_name, event_date, slope, significance) %>%
distinct() %>%
filter(significance > .2)
hurr_slopes_output
```
Methodology from reprex package [Reference](https://stackoverflow.com/questions/51355303/extract-slope-of-multiple-trend-lines-from-geom-smooth)
Here, `significance` is the p-value of the slope coefficient in each event's
linear model. 7 of the 9 hurricanes have p-values greater than 0.2. The filter
keeps these moderate slopes and drops the two that were judged far too extreme
to be realistic.
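The slope and its p-value both come from the coefficient table of a fitted model. The sketch below shows where those two numbers sit in that table, using a simulated series; `rel_day` and `hits` here are hypothetical values, not the report's data.

```r
set.seed(2)
rel_day <- 0:14
hits <- 60 - 0.8 * rel_day + rnorm(15, sd = 4)  # hypothetical series

fit <- lm(hits ~ rel_day)
coefs <- summary(fit)$coefficients

# Row 2 is the rel_day term; column 1 is the estimate, column 4 is Pr(>|t|).
slope <- coefs[2, 1]
p_value <- coefs[2, 4]

colnames(coefs)
c(slope = slope, p_value = p_value)
```

Indexing by name, as in `coefs["rel_day", "Pr(>|t|)"]`, returns the same values and is less sensitive to changes in the model formula.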
Next, a one-sample t-test on the remaining slopes gives a p-value for the null
hypothesis that the volume of climate change hits does not change immediately
after a hurricane.
```{r FLHurr_CC-ttest, echo = FALSE}
t.test(hurr_slopes_output$slope, mu = 0)
```
The p-value is 0.6881, which is greater than the significance level of 0.05,
so we fail to reject the null hypothesis that the volume of climate change
hits does not change immediately after a hurricane in Florida. Thus, there
appears to be insufficient evidence to make a causal claim about the
relationship between hurricanes in Florida and local Google Trends sentiment
for climate change.
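For readers who want to see what the one-sample t-test computes, the sketch below reproduces its statistic and two-sided p-value by hand and compares them with `t.test()`. The `slopes` vector is hypothetical and does not contain the values from this report.

```r
# Hypothetical per-event slopes (illustration only, not the report's values).
slopes <- c(-0.9, 0.4, -0.2, 1.1, -0.5, 0.3, -0.6)
n <- length(slopes)

# t statistic for H0: mean slope = 0, and its two-sided p-value.
t_stat <- (mean(slopes) - 0) / (sd(slopes) / sqrt(n))
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

# The built-in test should agree with the manual calculation.
res <- t.test(slopes, mu = 0)
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
```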
---
## California Wildfires since `r date_5y`
A list of all FEMA declarations for wildfires in California occurring in the
last 5 years.
```{r CA_Fires-events, echo=FALSE}
CA_Fires <- FEMA_Declarations %>%
filter(incident_type == "Fire", incident_begin_date >= date_5y, state == "CA") %>%
select(declaration_title, declaration_day) %>%
distinct(declaration_title, .keep_all = TRUE)
CA_Fires
```
```{r CA_Fires-query, echo=FALSE}
# set parameters for CA_Fires query below
num_days <- 14
State_Disaster <- CA_Fires
location <- "US-CA"
search_term <- "climate change"
# vectorize events & dates
events <- as.vector(State_Disaster$declaration_title)
dates <- as.vector(State_Disaster$declaration_day)
# create vector of length length(events)*(num_days+1) with event names for cbind()