diff --git a/03_data_science_practice.Rmd b/03_data_science_practice.Rmd
index 871035e..79c709b 100644
--- a/03_data_science_practice.Rmd
+++ b/03_data_science_practice.Rmd
@@ -219,6 +219,10 @@ Reproducible Science Workshop, 2015
 * Karl Broman and Kara Woo, ["Data organization in spreadsheets"](http://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989), _The American Statistician_, 2017-09-29.
+
+
+
+
 ### Versioned data
 * Daniel Falster, Richard G FitzJohn, Matthew W. Pennell, William K. Cornwell (2017-11-10) [Versioned data: why it is needed and how it can be achieved (easily and cheaply)](https://peerj.com/preprints/3401/)
diff --git a/04_data_wrangling.Rmd b/04_data_wrangling.Rmd
new file mode 100644
index 0000000..70fb249
--- /dev/null
+++ b/04_data_wrangling.Rmd
@@ -0,0 +1,61 @@
+# Data Wrangling (emphasis on `dplyr`)
+
+
+```{r echo = FALSE}
+library(knitr)
+opts_chunk$set(message = FALSE, warning = FALSE, cache = TRUE)
+options(width = 100, dplyr.width = 100)
+library(ggplot2)
+theme_set(theme_light())
+```
+
+
+
+## Introduction
+
+Data is rarely in condition to use it...there's invariably something amiss. Data wrangling (a.k.a. data carpentry) is the process of getting it ready for analysis.
+
+
+## Theory and methods
+
+
+[Stat 545: Data wrangling, exploration, and analysis with R](http://stat545.com/index.html) -- course materials associated with the University of British Columbia's Statistics 545 course. Prepared in large part by Dr. Jenny Bryan.
+
+
+### Tidy evaluation
+
+* programming with `dplyr`
+
+Edwin Thoen, 2017-08-25 [Tidy evaluation, most common actions](https://edwinth.github.io/blog/dplyr-recipes/)
+
+### Reading messy files
+
+Luis D. Verde, 2018-12-14, [Tidyeval meets PDF table hell](http://luisdva.github.io/rstats/Tidyeval-pdf-hell/) -- great solution to the common problem of broken rows ("values that are broken up into two lines for whatever reason (often to optimize space on a page in a table in a typeset pdf)").
+
+
+### Working with dates
+
+For data scientists thinking about biases in your data, don't start by reading the computer science literature. Read epidemiology instead. You need data street smarts, not mathy book smarts. Otherwise the first data set you meet is going to beat you up and take your lunch money!
— Kareem ❤️ statistics (@kareem_carr) February 11, 2019
+
+
+
+## R
+
+Arranged by package
+
+### `dplyr`
+
+**package**
+
+CRAN: [dplyr: A Grammar of Data Manipulation](https://CRAN.R-project.org/package=dplyr)
+
+github: [hadley/dplyr](https://github.com/hadley/dplyr)
+
+**articles**
+
+* [Introduction to dplyr](http://stat545.com/block009_dplyr-intro.html), part of the UBC [STAT545: Data wrangling, exploration, and analysis with R](http://stat545.com/index.html) course materials
+
+
+* Gary Hutson, 2018-05-24, [DPLYR: A Beginners Guide](https://www.r-bloggers.com/dplyr-a-beginners-guide/)
+
+-30-
diff --git a/40_data_visualization.Rmd b/40_data_visualization.Rmd
index 0478ed5..09ce5c4 100644
--- a/40_data_visualization.Rmd
+++ b/40_data_visualization.Rmd
@@ -95,8 +95,12 @@ Design Space of Data Visualization"](https://www.sciencedirect.com/science/artic
 - ["How the BBC Visual and Data Journalism team works with graphics in R"](https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535)
+#### extensions
-** ggplot2 tips and tricks **
+Gallery of `ggplot2` extensions: [ggplot2-exts.org/gallery/](ggplot2 extensions - gallery )
+
+
+#### tips and tricks
 * Simon Jackson, 2016-08-11, [Plotting background data for groups with ggplot2](https://drsimonj.svbtle.com/plotting-background-data-for-groups-with-ggplot2)

Updated Turing Test concept:
A spreadsheet of dates, hand-entered by interns more than a decade ago, featuring such well-known time formats as "1996ish", "1941/xd01944", "1955?" and "WWII."
I'm not worried about AI until someone shows me the algorithm that can make sense of this. pic.twitter.com/IhzofigX2b
— Brooke Watson (@brookLYNevery1) January 19, 2018