From 4b1f6254d113d55ab296ee11f0e61f7d5f0274da Mon Sep 17 00:00:00 2001
From: Martin Monkman
Date: Tue, 5 Feb 2019 20:41:18 -0800
Subject: [PATCH] add, edit, some minor formatting changes

---
 ...Defined.Rmd => 02_data_science_defined.Rmd |   0
 ...actice.Rmd => 03_data_science_practice.Rmd |   0
 10_data_sources.rmd                           | 153 ++++++++++++++++++
 41_chart_types.Rmd                            |  45 +++++-
 60_TextAnalysis.rmd => 60_text_analysis.rmd   |   0
 5 files changed, 197 insertions(+), 1 deletion(-)
 rename 02_DataScienceDefined.Rmd => 02_data_science_defined.Rmd (100%)
 rename 03_StatisticalPractice.Rmd => 03_data_science_practice.Rmd (100%)
 create mode 100644 10_data_sources.rmd
 rename 60_TextAnalysis.rmd => 60_text_analysis.rmd (100%)

diff --git a/02_DataScienceDefined.Rmd b/02_data_science_defined.Rmd
similarity index 100%
rename from 02_DataScienceDefined.Rmd
rename to 02_data_science_defined.Rmd
diff --git a/03_StatisticalPractice.Rmd b/03_data_science_practice.Rmd
similarity index 100%
rename from 03_StatisticalPractice.Rmd
rename to 03_data_science_practice.Rmd
diff --git a/10_data_sources.rmd b/10_data_sources.rmd
new file mode 100644
index 0000000..e6eab97
--- /dev/null
+++ b/10_data_sources.rmd
@@ -0,0 +1,153 @@
+# Data Sources & How to Read Them {#datasources}
+
+
+```{r echo = FALSE}
+library(knitr)
+opts_chunk$set(message = FALSE, warning = FALSE, cache = TRUE)
+options(width = 100, dplyr.width = 100)
+library(ggplot2)
+theme_set(theme_light())
+```
+
+
+
+
+## Introduction
+
+What is data science without _data_? Links to tools to import data from a variety of sources, along with a few indexes and compendiums of data sources.
+
+---
+### Sources
+
+#### listings
+
+University of Alberta Libraries, Economics: [List of databases](http://guides.library.ualberta.ca/c.php?g=329741&p=2334221)
+
+Simon Fraser University Library: [Gender, Sexuality & Women's Studies Information Resources: Facts & Data](http://www.lib.sfu.ca/help/research-assistance/subject/gsws/factsdata)
+
+#### open data sources
+
+[United Nations Population Prospects](https://esa.un.org/unpd/wpp/) - detailed country population data
+
+* [populationpyramid.net](https://www.populationpyramid.net/) uses this data
+
+[OECD world data, by country](https://data.oecd.org/)
+
+[Gapminder](https://www.gapminder.org/data/) - all indicators displayed in Gapminder World
+
+---
+
+### R packages
+
+
+##### `cancensus`
+
+[Census of Canada (including the National Household Survey)](https://github.com/mountainMath/cancensus)
+
+
+
+#### `cansim`
+
+**package**
+
+[github](https://github.com/mountainMath/cansim)
+
+**articles**
+
+Dmitry Shkolnik (2018-08-01) [The CANSIM package, Canadian tourism, and slopegraphs](https://www.dshkol.com/2018/cansim-package-tourism-slopegraphs/)
+
+
+#### `CANSIM2R`
+
+[CANSIM2R: Directly Extracts Complete CANSIM Data Tables](https://cran.r-project.org/web/packages/CANSIM2R/index.html)
+
+github: [CANSIM2R](https://github.com/MarcoLugo/CANSIM2R)
+
+* Andrew Clarke (2017-08-09) [StatCan API's Discovered](https://www.mytinyshinys.com/2017/08/09/statcanapi/)
+
+
+##### `gapminder`
+
+[gapminder: Data from Gapminder](https://cran.r-project.org/web/packages/gapminder/index.html) An excerpt of the data available at [Gapminder.org](https://www.gapminder.org/data/). For each of 142 countries, the package provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.
+
+
+##### `Lahman`
+
+[Lahman: Sean 'Lahman' Baseball Database](https://cran.r-project.org/web/packages/Lahman/) Provides the tables from the 'Sean Lahman Baseball Database' as a set of R data.frames. It uses the data on pitching, hitting and fielding performance and other tables from 1871 through 2015, as recorded in the 2016 version of the database.
+
+
+---
+### R readers
+
+**articles**
+
+[R database interfaces](http://www.burns-stat.com/r-database-interfaces/)
+
+
+#### `rio`
+
+**package**
+
+CRAN page: _currently only development version_, see tidyverse link below
+
+vignette: [Import, Export, and Convert Data Files](https://cran.r-project.org/web/packages/rio/vignettes/rio.html)
+
+
+
+#### `googledrive`
+
+**package**
+
+CRAN page: _currently only development version_, see tidyverse link below
+
+tidyverse page: [`googledrive`](https://tidyverse.github.io/googledrive/)
+
+
+
+#### `foreign`
+
+**package**
+
+CRAN page: [foreign: Read Data Stored by Minitab, S, SAS, SPSS, Stata, Systat, Weka, dBase, ...](https://CRAN.R-project.org/package=foreign)
+
+**articles**
+
+* [How to open an SPSS file into R](http://www.milanor.net/blog/how-to-open-an-spss-file-into-r/), by Davide Massidda (2014-03-26)
+
+
+
+#### Stata files
+
+**function `read.dta`** (in package `foreign`)
+
+Reads a file in Stata version 5–12 binary format into a data frame.
+
+R manual page: [`read.dta`: Read Stata Binary Files](http://stat.ethz.ch/R-manual/R-devel/library/foreign/html/read.dta.html)
+
+
+**package `readstata13`**
+
+Function to read and write the 'Stata' file format.
+
+CRAN page: [readstata13: Import 'Stata' Data Files](https://CRAN.R-project.org/package=readstata13)
+
+
+
+
+
+#### `TSdbi` and related packages
+
+**package**
+
+CRAN page: [TSdbi: Time Series Database Interface](https://CRAN.R-project.org/package=TSdbi)
+
+Note: `TSdbi` has some related extension packages:
+
+* CRAN page: [TSdata: 'TSdbi' Illustration](https://cran.r-project.org/web/packages/TSdata/index.html)
+  This package gives an overview and usage examples for the whole `TSdbi` family of packages.
+
+* CRAN page: [TSPostgreSQL: 'TSdbi' Extensions for 'PostgreSQL'](https://cran.r-project.org/web/packages/TSPostgreSQL/index.html)
+
+* CRAN page: [TSsdmx: 'TSdbi' Extension to Connect with 'SDMX'](https://cran.r-project.org/web/packages/TSsdmx/index.html)
+
+
diff --git a/41_chart_types.Rmd b/41_chart_types.Rmd
index 142bec7..1c506b2 100644
--- a/41_chart_types.Rmd
+++ b/41_chart_types.Rmd
@@ -15,6 +15,8 @@ theme_set(theme_light())
 Naomi Robbins (2013), _Creating More Effective Graphs_, Chart House.
+---
+
 ### Box plots (a way to visualize distributions)
 R package [`boxplot`](https://stat.ethz.ch/R-manual/R-devel/library/graphics/html/boxplot.html)
@@ -28,11 +30,17 @@ Ron Pearson, 2011-01-29, [Boxplots and Beyond – Part I](https://www.r-bloggers
+
+---
+
 ### Density plot
 * Jodie Burchell, 2016-03-16, [Creating plots in R using ggplot2 - part 8: density plots](http://t-redactyl.io/blog/2016/03/creating-plots-in-r-using-ggplot2-part-8-density-plots.html)
+
+---
+
 ### Dot plot (Cleveland dot plot, lollipop plot)
 * UC Business Analytics R Programming Guide, [Cleveland Dot Plots](https://uc-r.github.io/cleveland-dot-plots)
@@ -42,6 +50,9 @@ Ron Pearson, 2011-01-29, [Boxplots and Beyond – Part I](https://www.r-bloggers
 * [Datavis with R: Drawing a Cleveland dot plot with ggplot2](http://www.joyce-robbins.com/blog/2016/06/02/datavis-with-rdrawing-a-cleveland-dot-plot-with-ggplot2/)
+
+---
+
 ### Eikosograms
 > an eikosogram is a picture of probability. It visually partitions a unit square into rectangular regions whose areas give the numerical values of various probabilities. The construction is such that each rectangular region is identified with the value of one or more categorical variates.
@@ -50,6 +61,9 @@ Ron Pearson, 2011-01-29, [Boxplots and Beyond – Part I](https://www.r-bloggers
 * R.W. Oldford (2018-08-16) [Introduction to eikosograms](https://cran.r-project.org/web/packages/eikosograms/vignettes/introduction.html)
+
+---
+
 ### Flow visualizations
 **1. Circle plots**
@@ -65,17 +79,21 @@ Ron Pearson, 2011-01-29, [Boxplots and Beyond – Part I](https://www.r-bloggers
 [How to Make a D3 Sankey diagram in R](http://emapr.ceoas.oregonstate.edu/pages/education/how_to/sankey_diagram/sankey_diagram_to_visualize_landcover_change.html)
+---
 ### Heatmaps
+
 [The Heatmap function](https://www.r-graph-gallery.com/215-the-heatmap-function/) in the [R Graph Gallery](https://www.r-graph-gallery.com/)
 Rebecca L. Barter & Bin Yu, 2017-01-30, ["Superheat: An R package for creating beautiful and extendable heatmaps for visualizing complext data"](https://arxiv.org/pdf/1512.01524.pdf)
+---
 ### Histograms and their variants
+
 [Variable width column charts](https://learnr.wordpress.com/2009/03/29/ggplot2_marimekko_mosaic_chart/) (in ggplot2)
 [Mosaic or Marimekko charts](https://learnr.wordpress.com/2009/03/29/ggplot2_marimekko_mosaic_chart/) (in ggplot2)
@@ -83,11 +101,16 @@ Rebecca L. Barter & Bin Yu, 2017-01-30, ["Superheat: An R package for creating b
 Aran Lunzer and Amelia McNamara, [What's so hard about histograms?](http://tinlizzie.org/histograms/)
+---
+
 ### Lexis diagrams
 Tim RiffeEmail author, Jonas Schöley and Francisco Villavicencio (2017) ["A unified framework of demographic time"](http://genus.springeropen.com/articles/10.1186/s41118-017-0024-4), _Genus: Journal of Population Sciences_, 2017 73:7
+
+---
+
 ### Network graphs
 [DiagramR: Graph and network visualization using tabular data in R](DiagrammeR: Graph/Network Visualization)
@@ -100,6 +123,7 @@ Tim RiffeEmail author, Jonas Schöley and Francisco Villavicencio (2017) ["A uni
 [ggnet2: network visualization with ggplot2](https://briatte.github.io/ggnet/) -- part of the [`GGally`](https://www.rdocumentation.org/packages/GGally/versions/1.3.2) package
+---
 ### Population Pyramids
@@ -148,6 +172,7 @@ Ilya Kashnitsky, 2017-03-31, ["Who is old? Visualizing the concept of prospectiv
 acarioli (2016-01-11) [Population pyramids in ggplot](https://aledemogr.wordpress.com/2016/01/11/population-pyramids-in-ggplot/)
+---
 ### Ridgeline plot
@@ -178,18 +203,36 @@ The over of Joy Division's debut album [_Unknown Pleasures_](https://en.wikipedi
+---
 ### Slopegraphs
-[Slopegraphs get their own page](Data_Visualization_Slopegraphs.md)
+
+A common visualization to show relative change between two time periods across different categories.
+
+#### Theory and methods
+
+Cole Nussbaumer Knaflic, 2015, _Storytelling with Data_, pp.47-49.
+
+#### R
+
+Kyle Walker, 2015-05-17, [Global population change with a slopegraph in ggplot2](https://rpubs.com/walkerke/slopegraph)
+
+
+#### `slopegraph`
+
+[github](https://github.com/leeper/slopegraph)
+
+---
 ### Ternary plots
 [`ggtern` - an extension to `ggplot2`](http://www.ggtern.com/) for plotting ternary diagrams.
+---
 ### Waffle plots
diff --git a/60_TextAnalysis.rmd b/60_text_analysis.rmd
similarity index 100%
rename from 60_TextAnalysis.rmd
rename to 60_text_analysis.rmd