This repository has been archived by the owner on May 10, 2022. It is now read-only.
# datagovau
R package for accessing data.gov.au.
## Intro
There are many high-quality datasets that are freely available for Australia. Unfortunately, they can be difficult to obtain and analyse. Here we provide tools to programmatically import and explore Australian datasets. Data can be obtained from the official Australian government portal (<https://data.gov.au>), which catalogues over 40,000 datasets.
data.gov.au does not itself host the data but contains links to the datasets on agencies' own sites. The data is in a wide range of formats. Currently the `datagovau` package can only successfully import the following data types:
- tabular data in CSV or Excel format (either as a single table or a zipped up collection of them)
- shapefiles
This project:
- started as part of the `ozdata` package from the [2017 BURGr R UnConference](https://github.com/AU-BURGr/UnConf2017)
- was then narrowed to just data.gov.au, with work on getting it CRAN-ready, at the [2017 rOpenSci OzUnconf](http://ozunconf17.ropensci.org/)
## Installation
```R
devtools::install_github("ropenscilabs/datagovau/pkg")
```
## Basic usage
```{r}
library(dplyr)
library(datagovau)
# download details of datasets with 'water' in their name:
res <- search_data("name:water", limit = 20)
# download the datasets in the second usable package listed there:
water_data <- res %>% filter(can_use == "yes") %>% slice(2) %>% get_data()
# look at the first rectangle of data (at time of writing there were four such rectangles)
head(water_data[[1]])
```
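The selection step above is ordinary row filtering on the metadata table. As a minimal sketch in base R, using a made-up stand-in for the result of `search_data()` (the rows and the `format` column are hypothetical; only `name` and `can_use` appear in this README), this is what `filter(can_use == "yes") %>% slice(2)` picks out:

```r
# Hypothetical stand-in for search_data() output; real results have more columns.
md <- data.frame(
  name    = c("Water quality", "Water usage report", "Water licences", "Water bores"),
  format  = c("CSV", "PDF", "SHP", "XLSX"),  # assumed column, for illustration only
  can_use = c("yes", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Equivalent of filter(can_use == "yes"): keep only rows flagged as importable
usable <- md[md$can_use == "yes", ]

# Equivalent of slice(2): the second remaining row, not the second original row
second <- usable[2, ]
print(second$name)
```

Note that `slice(2)` counts rows after filtering, so the dataset that gets downloaded can change whenever the search results change.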
## Trees!
data.gov.au lists a range of datasets with "trees" in their name. Nearly all of these seem to be point locations of individual trees in particular areas.
### Combination with mapview
When `get_data` imports a shapefile, the result is an object of class `SpatialPointsDataFrame` (from the `sp` package). The `mapview` package provides nice interactive leaflet graphics (only a screenshot is provided in the output below):
```{r cache = TRUE}
library(mapview)
trees_md <- search_data("name:trees", limit = 1000)
dim(trees_md) # 87 datasets about trees
# choose one of the datasets that happens to be a shapefile:
burnside <- trees_md %>%
  filter(name == 'Burnside Trees - Shapefile') %>%
  get_data()
# draw a map:
mapView(burnside)@map
```
### Combination with ggmap
Of course, data that comes in a tabular form (e.g. CSV) can be analysed with any of the tools aimed at handling tabular data, such as ggplot2 and its friends:
```{r warning = FALSE, fig.width = 10, dpi = 300, message = FALSE}
library(ggmap)
library(viridis)
library(scales)
library(forcats)   # fct_lump()
library(ggthemes)  # theme_map() (assumed source; not loaded in the original chunk)

#---------------------facet plot of burnside trees by species---------------
# download data
burnside2 <- trees_md %>%
  filter(name == "Burnside Trees - CSV") %>%
  get_data()

# get the background map, centred on the mean tree location
m <- get_map(location = c(mean(burnside2$X), mean(burnside2$Y)),
             source = "stamen",
             maptype = "toner",
             zoom = 13)

# tidy up our data: keep the 8 most common values of BotanicalN (the rest
# are lumped into "Other") and put the height bands in a sensible order
heights <- c("less than 5m", "5m", "5-10m", "10-20m", "greater than 20m")
d <- burnside2 %>%
  mutate(type = fct_lump(BotanicalN, 8),
         height = factor(TreeHeight, levels = heights))

# draw map
ggmap(m) +
  geom_point(data = d, aes(x = X, y = Y, colour = height)) +
  facet_wrap(~type) +
  scale_colour_viridis(discrete = TRUE) +
  ggtitle("Burnside trees") +
  coord_map() +
  theme_map() +
  theme(legend.position = "right")
```
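The `factor(TreeHeight, levels = heights)` step does two things: it fixes the order of the height bands in the legend, and it turns any label not listed in `heights` into `NA`. A small base-R sketch with made-up values (the `"unknown"` label is hypothetical, not taken from the Burnside data) illustrates both effects:

```r
heights <- c("less than 5m", "5m", "5-10m", "10-20m", "greater than 20m")

# Made-up sample of height labels, including one that is not in `heights`
tree_height <- c("10-20m", "less than 5m", "unknown", "5-10m", "10-20m")

h <- factor(tree_height, levels = heights)

levels(h)                  # levels follow `heights`, not alphabetical order
sum(is.na(h))              # labels missing from `heights` become NA
table(h, useNA = "ifany")  # counts per band, plus the NA bucket
```

If the real data uses slightly different spellings for the bands, those rows would end up as `NA`, so it may be worth checking `unique()` of the column before fixing the levels.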
## Income
Over 200 datasets are listed with "income" in their name:
```{r fig.width = 10, dpi = 300, message = FALSE}
income_md <- search_data("name:income", limit = 1000)
# 269 datasets about income
dim(income_md)
movers <- income_md %>%
  filter(name == "Recent Movers Household Income") %>%
  get_data()
movers %>%
  mutate(Income = factor(Income, levels = c(
    "One or more incomes not stated",
    "Very low income", "Low income", "Moderate income", "High income"))) %>%
  rename(LGA = `LGA Name`) %>%
  filter(LGA != "South Australia") %>%
  ggplot(aes(weight = Households, fill = Income, x = LGA)) +
  geom_bar(position = "fill") +
  scale_fill_brewer() +
  coord_flip() +
  scale_y_continuous("Percentage of households in each category", labels = percent) +
  labs(x = "Local Government Area") +
  ggtitle("Recent Movers' Household Income in South Australia",
          "Household income for households who had a different address in the 2011 Census compared to the 2006 Census")
```
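In the plot above, `weight = Households` makes each bar count households rather than rows, and `position = "fill"` rescales every bar so that it sums to 1. The same shares can be computed directly. This base-R sketch uses invented numbers and hypothetical area names in place of the real `movers` data:

```r
# Invented example data with the same three columns the plot relies on
movers_toy <- data.frame(
  LGA        = c("Area A", "Area A", "Area B", "Area B"),
  Income     = c("Low income", "High income", "Low income", "High income"),
  Households = c(30, 10, 25, 75)
)

# Sum households per LGA and income band (the effect of weight = Households)
counts <- xtabs(Households ~ LGA + Income, data = movers_toy)

# Divide each row by its total (the effect of position = "fill")
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

Each row of `shares` sums to 1, which is why the bars in the chart all have the same length and only the split between income bands differs.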