add, edit, some minor formatting changes
MonkmanMH committed Feb 6, 2019
1 parent 2b05dd8 commit 4b1f625
Showing 5 changed files with 197 additions and 1 deletion.
File renamed without changes.
File renamed without changes.
153 changes: 153 additions & 0 deletions 10_data_sources.rmd
@@ -0,0 +1,153 @@
# Data Sources & How to Read Them {#datasources}


```{r echo = FALSE}
library(knitr)
opts_chunk$set(message = FALSE, warning = FALSE, cache = TRUE)
options(width = 100, dplyr.width = 100)
library(ggplot2)
theme_set(theme_light())
```




## Introduction

What is data science without _data_? This chapter collects links to tools for importing data from a variety of sources, along with a few indexes and compendiums of data sources.

---
### Sources

#### listings

University of Alberta Libraries, Economics: [List of databases](http://guides.library.ualberta.ca/c.php?g=329741&p=2334221)

Simon Fraser University Library: [Gender, Sexuality & Women's Studies Information Resources: Facts & Data](http://www.lib.sfu.ca/help/research-assistance/subject/gsws/factsdata)

#### open data sources

[United Nations Population Prospects](https://esa.un.org/unpd/wpp/) - detailed country population data

* [populationpyramid.net](https://www.populationpyramid.net/) uses this data

[OECD world data, by country](https://data.oecd.org/)

[Gapminder](https://www.gapminder.org/data/) - all indicators displayed in Gapminder World

---

### R packages


#### `cancensus`

[Census of Canada (including the National Household Survey)](https://github.com/mountainMath/cancensus)



#### `cansim`

**package**

[github](https://github.com/mountainMath/cansim)

**articles**

Dmitry Shkolnik (2018-08-01) [The CANSIM package, Canadian tourism, and slopegraphs](https://www.dshkol.com/2018/cansim-package-tourism-slopegraphs/)


#### `CANSIM2R`

[CANSIM2R: Directly Extracts Complete CANSIM Data Tables](https://cran.r-project.org/web/packages/CANSIM2R/index.html)

github: [CANSIM2R](https://github.com/MarcoLugo/CANSIM2R)

* Andrew Clarke (2017-08-09) [StatCan API's Discovered](https://www.mytinyshinys.com/2017/08/09/statcanapi/)


#### `gapminder`

[gapminder: Data from Gapminder](https://cran.r-project.org/web/packages/gapminder/index.html) An excerpt of the data available at [Gapminder.org](https://www.gapminder.org/data/). For each of 142 countries, the package provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.
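
To make the shape of this data concrete, here is a minimal sketch that loads the package's main data frame and checks the country and year coverage described above. It assumes the `gapminder` package is already installed; the variable names `n_countries` and `years` are only illustrative.

```r
# Assumes the gapminder package is installed, e.g. install.packages("gapminder")
library(gapminder)

# One row per country per year;
# columns: country, continent, year, lifeExp, pop, gdpPercap
str(gapminder)

# Number of countries and the years covered
n_countries <- length(unique(gapminder$country))
years <- sort(unique(gapminder$year))

n_countries
range(years)
```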


#### `Lahman`

[Lahman: Sean 'Lahman' Baseball Database](https://cran.r-project.org/web/packages/Lahman/) Provides the tables from the 'Sean Lahman Baseball Database' as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2015, as recorded in the 2016 version of the database.


---
### R readers

**articles**

[R database interfaces](http://www.burns-stat.com/r-database-interfaces/)


#### `rio`

**package**

CRAN page: [rio: A Swiss-Army Knife for Data I/O](https://CRAN.R-project.org/package=rio)

vignette: [Import, Export, and Convert Data Files](https://cran.r-project.org/web/packages/rio/vignettes/rio.html)



#### `googledrive`

**package**

CRAN page: _currently only development version_, see tidyverse link below

tidyverse page: [`googledrive`](https://tidyverse.github.io/googledrive/)



#### `foreign`

**package**

CRAN page: [foreign: Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, Weka, dBase, ...](https://CRAN.R-project.org/package=foreign)

**articles**

* [How to open an SPSS file into R](http://www.milanor.net/blog/how-to-open-an-spss-file-into-r/), by Davide Massidda (2014-03-26)



#### Stata files

**function `read.dta` (in package `foreign`)**

Reads a file in Stata version 5–12 binary format into a data frame.

R manual page: [`read.dta`: Read Stata Binary Files](http://stat.ethz.ch/R-manual/R-devel/library/foreign/html/read.dta.html)
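
As an illustration of how `read.dta` is typically called, the sketch below writes a small data frame to a temporary Stata file with `write.dta` (also from `foreign`) and reads it back. The data frame `survey` and its columns are made up for the example; with real data you would pass the path of an existing `.dta` file instead.

```r
library(foreign)  # a recommended package, included with standard R installations

# Small hypothetical data frame to write out and read back
survey <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# write.dta() produces a Stata binary file (version 7 format by default)
dta_file <- tempfile(fileext = ".dta")
write.dta(survey, dta_file)

# read.dta() returns the file contents as a data frame
survey_back <- read.dta(dta_file)
str(survey_back)
```

Files saved by Stata 13 or later are outside the version range that `read.dta` supports, which is where a package such as `readstata13` comes in.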


**package `readstata13`**

Function to read and write the 'Stata' file format.

CRAN page: [readstata13: Import 'Stata' Data Files](https://CRAN.R-project.org/package=readstata13)





#### `TSdbi` and related packages

**package**

CRAN page: [TSdbi: Time Series Database Interface](https://CRAN.R-project.org/package=TSdbi)

Note: `TSdbi` has some related extension packages:

* CRAN page: [TSdata: 'TSdbi' Illustration](https://cran.r-project.org/web/packages/TSdata/index.html)
    * This package gives an overview and usage examples for the whole `TSdbi` family of packages.

* CRAN page: [TSPostgreSQL: 'TSdbi' Extensions for 'PostgreSQL'](https://cran.r-project.org/web/packages/TSPostgreSQL/index.html)

* CRAN page: [TSsdmx: 'TSdbi' Extension to Connect with 'SDMX'](https://cran.r-project.org/web/packages/TSsdmx/index.html)


45 changes: 44 additions & 1 deletion 41_chart_types.Rmd
@@ -15,6 +15,8 @@ theme_set(theme_light())
Naomi Robbins (2013), _Creating More Effective Graphs_, Chart House.


---

### Box plots (a way to visualize distributions)

R function [`boxplot`](https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/boxplot.html) (in the `graphics` package)
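
A short sketch of what a box plot summarizes, using the built-in `mtcars` data set (chosen here only for illustration): `boxplot.stats()` returns the five values the box and whiskers are drawn from, and `boxplot()` draws one box per group.

```r
# Five values behind the box and whiskers:
# lower whisker, lower hinge, median, upper hinge, upper whisker
mpg_stats <- boxplot.stats(mtcars$mpg)
mpg_stats$stats

# Points beyond the whiskers, which boxplot() draws individually
mpg_stats$out

# One box per number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon")
```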
@@ -28,11 +30,17 @@ Ron Pearson, 2011-01-29, [Boxplots and Beyond – Part I](https://www.r-bloggers
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>



---

### Density plot

* Jodie Burchell, 2016-03-16, [Creating plots in R using ggplot2 - part 8: density plots](http://t-redactyl.io/blog/2016/03/creating-plots-in-r-using-ggplot2-part-8-density-plots.html)



---

### Dot plot (Cleveland dot plot, lollipop plot)

* UC Business Analytics R Programming Guide, [Cleveland Dot Plots](https://uc-r.github.io/cleveland-dot-plots)
@@ -42,6 +50,9 @@ Ron Pearson, 2011-01-29, [Boxplots and Beyond – Part I](https://www.r-bloggers
* [Datavis with R: Drawing a Cleveland dot plot with ggplot2](http://www.joyce-robbins.com/blog/2016/06/02/datavis-with-rdrawing-a-cleveland-dot-plot-with-ggplot2/)



---

### Eikosograms

> an eikosogram is a picture of probability. It visually partitions a unit square into rectangular regions whose areas give the numerical values of various probabilities. The construction is such that each rectangular region is identified with the value of one or more categorical variates.
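
The arithmetic behind that description can be sketched in base R. The example below uses the built-in `UCBAdmissions` table (an arbitrary choice for illustration): bar widths are the marginal probabilities of one variate, bar heights are the conditional probabilities of the other, and each rectangle's area is the joint probability. It only computes the numbers and does not draw the diagram.

```r
# Two-way table of counts: admission status by gender, summed over departments
counts <- margin.table(UCBAdmissions, c(1, 2))

# Width of each vertical bar: P(Gender)
widths <- as.numeric(prop.table(margin.table(counts, 2)))

# Height of each region within a bar: P(Admit | Gender); each column sums to 1
heights <- prop.table(counts, margin = 2)

# Area of each rectangle: height * width = P(Admit, Gender)
areas <- sweep(heights, 2, widths, `*`)

areas
sum(areas)  # the rectangles tile the unit square
```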
@@ -50,6 +61,9 @@ Ron Pearson, 2011-01-29, [Boxplots and Beyond – Part I](https://www.r-bloggers
* R.W. Oldford (2018-08-16) [Introduction to eikosograms](https://cran.r-project.org/web/packages/eikosograms/vignettes/introduction.html)



---

### Flow visualizations

**1. Circle plots**
@@ -65,29 +79,38 @@ Ron Pearson, 2011-01-29, [Boxplots and Beyond – Part I](https://www.r-bloggers
[How to Make a D3 Sankey diagram in R](http://emapr.ceoas.oregonstate.edu/pages/education/how_to/sankey_diagram/sankey_diagram_to_visualize_landcover_change.html)


---

### Heatmaps


[The Heatmap function](https://www.r-graph-gallery.com/215-the-heatmap-function/) in the [R Graph Gallery](https://www.r-graph-gallery.com/)

Rebecca L. Barter & Bin Yu, 2017-01-30, ["Superheat: An R package for creating beautiful and extendable heatmaps for visualizing complex data"](https://arxiv.org/pdf/1512.01524.pdf)


---

### Histograms and their variants


[Variable width column charts](https://learnr.wordpress.com/2009/03/29/ggplot2_marimekko_mosaic_chart/) (in ggplot2)

[Mosaic or Marimekko charts](https://learnr.wordpress.com/2009/03/29/ggplot2_marimekko_mosaic_chart/) (in ggplot2)

Aran Lunzer and Amelia McNamara, [What's so hard about histograms?](http://tinlizzie.org/histograms/)


---

### Lexis diagrams

Tim Riffe, Jonas Schöley and Francisco Villavicencio (2017) ["A unified framework of demographic time"](http://genus.springeropen.com/articles/10.1186/s41118-017-0024-4), _Genus: Journal of Population Sciences_, 2017 73:7



---

### Network graphs

[DiagrammeR: Graph and network visualization using tabular data in R](https://CRAN.R-project.org/package=DiagrammeR)
@@ -100,6 +123,7 @@ Tim Riffe, Jonas Schöley and Francisco Villavicencio (2017) ["A uni
[ggnet2: network visualization with ggplot2](https://briatte.github.io/ggnet/) -- part of the [`GGally`](https://www.rdocumentation.org/packages/GGally/versions/1.3.2) package


---

### Population Pyramids

@@ -148,6 +172,7 @@ Ilya Kashnitsky, 2017-03-31, ["Who is old? Visualizing the concept of prospectiv
acarioli (2016-01-11) [Population pyramids in ggplot](https://aledemogr.wordpress.com/2016/01/11/population-pyramids-in-ggplot/)


---

### Ridgeline plot

@@ -178,18 +203,36 @@ The cover of Joy Division's debut album [_Unknown Pleasures_](https://en.wikipedi



---

### Slopegraphs

[Slopegraphs get their own page](Data_Visualization_Slopegraphs.md)

Slopegraphs are a common way to show relative change between two time periods across different categories.
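
A minimal sketch of the idea in `ggplot2`, assuming the package is installed. The data frame `slope_data`, its categories, and its values are hypothetical and exist only to show the structure: one row per category per period, with a line connecting each category's two values.

```r
library(ggplot2)

# Hypothetical values for three categories at two points in time
slope_data <- data.frame(
  category = rep(c("A", "B", "C"), times = 2),
  period   = factor(rep(c("2010", "2020"), each = 3)),
  value    = c(40, 55, 30, 60, 50, 35)
)

# One line per category; the slope shows the direction and size of the change
p <- ggplot(slope_data, aes(x = period, y = value, group = category)) +
  geom_line(aes(colour = category)) +
  geom_point(aes(colour = category)) +
  geom_text(
    data = subset(slope_data, period == "2020"),
    aes(label = category), hjust = -0.5
  ) +
  labs(x = NULL, y = "Value") +
  theme(legend.position = "none")

p
```

Labelling the lines directly at the right-hand end, rather than using a legend, keeps the reader's eye on the slopes themselves.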

#### Theory and methods

Cole Nussbaumer Knaflic, 2015, _Storytelling with Data_, pp. 47-49.

#### R

Kyle Walker, 2015-05-17, [Global population change with a slopegraph in ggplot2](https://rpubs.com/walkerke/slopegraph)


#### `slopegraph`

[github](https://github.com/leeper/slopegraph)



---

### Ternary plots

[`ggtern` - an extension to `ggplot2`](http://www.ggtern.com/) for plotting ternary diagrams.


---

### Waffle plots

File renamed without changes.
