Always say goodbye
MonkmanMH committed Feb 11, 2019
1 parent 04899a6 commit 7c3b3ef
Showing 3 changed files with 70 additions and 1 deletion.
4 changes: 4 additions & 0 deletions 03_data_science_practice.Rmd
@@ -219,6 +219,10 @@ Reproducible Science Workshop, 2015
* Karl Broman and Kara Woo, ["Data organization in spreadsheets"](http://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989), _The American Statistician_, 2017-09-29.


<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">For data scientists thinking about biases in your data, don&#39;t start by reading the computer science literature. Read epidemiology instead. You need data street smarts, not mathy book smarts. Otherwise the first data set you meet is going to beat you up and take your lunch money!</p>&mdash; Kareem ❤️ statistics (@kareem_carr) <a href="https://twitter.com/kareem_carr/status/1094993530097991680?ref_src=twsrc%5Etfw">February 11, 2019</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>


### Versioned data

* Daniel Falster, Richard G FitzJohn, Matthew W. Pennell, William K. Cornwell (2017-11-10) [Versioned data: why it is needed and how it can be achieved (easily and cheaply)](https://peerj.com/preprints/3401/)
61 changes: 61 additions & 0 deletions 04_data_wrangling.Rmd
@@ -0,0 +1,61 @@
# Data Wrangling (emphasis on `dplyr`)


```{r echo = FALSE}
library(knitr)
opts_chunk$set(message = FALSE, warning = FALSE, cache = TRUE)
options(width = 100, dplyr.width = 100)
library(ggplot2)
theme_set(theme_light())
```



## Introduction

Data is rarely in a condition to be used as-is; there's invariably something amiss. Data wrangling (a.k.a. data carpentry) is the process of getting it ready for analysis.


## Theory and methods


[Stat 545: Data wrangling, exploration, and analysis with R](http://stat545.com/index.html) -- course materials associated with the University of British Columbia's Statistics 545 course. Prepared in large part by Dr. Jenny Bryan.


### Tidy evaluation

* programming with `dplyr`

Edwin Thoen, 2017-08-25, [Tidy evaluation, most common actions](https://edwinth.github.io/blog/dplyr-recipes/)

### Reading messy files

Luis D. Verde, 2018-12-14, [Tidyeval meets PDF table hell](http://luisdva.github.io/rstats/Tidyeval-pdf-hell/) -- great solution to the common problem of broken rows ("values that are broken up into two lines for whatever reason (often to optimize space on a page in a table in a typeset pdf)").


### Working with dates

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">Updated Turing Test concept:<br>A spreadsheet of dates, hand-entered by interns more than a decade ago, featuring such well-known time formats as &quot;1996ish&quot;, &quot;1941/xd01944&quot;, &quot;1955?&quot; and &quot;WWII.&quot;<br>I&#39;m not worried about AI until someone shows me the algorithm that can make sense of this. <a href="https://t.co/IhzofigX2b">pic.twitter.com/IhzofigX2b</a></p>&mdash; Brooke Watson (@brookLYNevery1) <a href="https://twitter.com/brookLYNevery1/status/954368989181902848?ref_src=twsrc%5Etfw">January 19, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>


## R

Arranged by package

### `dplyr`

**package**

CRAN: [dplyr: A Grammar of Data Manipulation](https://CRAN.R-project.org/package=dplyr)

github: [hadley/dplyr](https://github.com/hadley/dplyr)

**articles**

* [Introduction to dplyr](http://stat545.com/block009_dplyr-intro.html), part of the UBC [STAT545: Data wrangling, exploration, and analysis with R](http://stat545.com/index.html) course materials


* Gary Hutson, 2018-05-24, [DPLYR: A Beginners Guide](https://www.r-bloggers.com/dplyr-a-beginners-guide/)

-30-
6 changes: 5 additions & 1 deletion 40_data_visualization.Rmd
@@ -95,8 +95,12 @@ Design Space of Data Visualization"](https://www.sciencedirect.com/science/artic
- ["How the BBC Visual and Data Journalism team works with graphics in R"](https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535)


#### extensions

** ggplot2 tips and tricks **
Gallery of `ggplot2` extensions: [ggplot2 extensions - gallery](http://ggplot2-exts.org/gallery/)


#### tips and tricks

* Simon Jackson, 2016-08-11, [Plotting background data for groups with ggplot2](https://drsimonj.svbtle.com/plotting-background-data-for-groups-with-ggplot2)
