-
Notifications
You must be signed in to change notification settings - Fork 11
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
restructure tidyverse and use of R materials
- Loading branch information
Showing
3 changed files
with
135 additions
and
101 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
# Using R {#using_r} | ||
|
||
|
||
|
||
## Introduction | ||
|
||
In addition to there being many resources available for using R to solve statistical and data science challenges, there are also many resources on how to maximize your effectiveness using R. | ||
|
||
This chapter compiles what I consider to be the essential texts; articles and blog posts will be at a minimum. | ||
|
||
|
||
## R as a programming environment | ||
|
||
Colin Gillespie and Robin Lovelace, _Efficient R Programming_ [@Gillespie_Lovelace_2017] | ||
|
||
Hadley Wickham, _Advanced R_ [@Wickham_advancedR] | ||
|
||
*** | ||
|
||
## The tidyverse | ||
|
||
All too often, data are messy. There are rows with no contents, colour-coded cells, and inconsistent values. | ||
|
||
One important way that data can be cleaned is to ensure that the structure is tidy. What do we mean by tidy data? | ||
|
||
> There are three interrelated rules which make a dataset tidy: | ||
> * Each variable must have its own column. | ||
> * Each observation must have its own row. | ||
> * Each value must have its own cell. | ||
And | ||
|
||
> Why ensure that your data is tidy? There are two main advantages: | ||
> | ||
> 1. There’s a general advantage to picking one consistent way of storing data. If you have a consistent data structure, it’s easier to learn the tools that work with it because they have an underlying uniformity. | ||
> | ||
> 2. There’s a specific advantage to placing variables in columns because it allows R’s vectorised nature to shine. As you learned in mutate and summary functions, most built-in R functions work with vectors of values. That makes transforming tidy data feel particularly natural. | ||
(from [@Wickham_Grolemund2016]) | ||
|
||
|
||
This won't solve things like inconsistent values and colour-coded cells, but it will solve some other messiness. | ||
|
||
For more about the principles of tidy data, see Hadley Wickham's article "Tidy data", in _The Journal of Statistical Software_ [@tidydata] | ||
|
||
+ [alternate link:](http://vita.had.co.nz/papers/tidy-data.html) | ||
+ [informal and code-heavy version](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html) | ||
|
||
|
||
### Other tidyverse references | ||
|
||
Karl Broman and Kara Woo, ["Data organization in spreadsheets"](https://github.com/kbroman/Paper_DataOrg) (github page with source manuscript) -- application of tidy principles to spreadsheets. | ||
|
||
* see also Karl Broman's tutorial, ["Data organization: organizing data in spreadsheets) | ||
|
||
|
||
Bruno Rodriguez, [Modern R with the tidyverse](https://b-rodrigues.github.io/modern_R/) | ||
|
||
|
||
Jesse Sadler, [Excel vs R: A Brief Introduction to R (With examples using dplyr and ggplot](http://kbroman.org/dataorg/](https://www.jessesadler.com/post/excel-vs-r/) (2017-10-02) | ||
|
||
|
||
#### Categorical data | ||
|
||
Emily Robinson, DataCamp course, [Categorical Data in the Tidyverse](https://www.datacamp.com/courses/categorical-data-in-the-tidyverse) | ||
|
||
|
||
#### Tidy text | ||
|
||
If you're going to undertake text mining and natural language processing, your text (i.e. your data) needs to be tidy. Fortunately, there's an R package for that: `tidytext`. | ||
|
||
See the companion chapter on the topics of [Text Analysis and Text Mining]. | ||
|
||
|
||
|
||
|
||
### tidyverse R packages | ||
|
||
[The tidyverse: ](http://tidyverse.org/) | ||
|
||
[The tidyverse R packages on github](https://github.com/hadley/tidyverse) | ||
|
||
#### `broom` | ||
|
||
|
||
#### `dplyr` | ||
|
||
`dplyr` now gets its own page, labelled [**Data Wrangling**](DataWrangling.md) | ||
|
||
|
||
#### `purrr` | ||
|
||
* [A purrr tutorial](https://github.com/Cascadia-R/purrr-tutorial) -- Cascadia-R, 2017-06-03 | ||
|
||
* Charlotte Wickham, [purr tutorial](https://github.com/cwickham/purrr-tutorial) -- github | ||
|
||
### more about tidy data | ||
|
||
* Hadley Wickham & Garrett Grolemund, [_R for Data Science_](http://r4ds.had.co.nz/) | ||
|
||
* Hadley Wickham | ||
+ [Tidy data and tidy tools (video of presentation, December 2011)](https://vimeo.com/33727555) | ||
|
||
* Garrett Grolemund | ||
+ [Data Tidying](http://garrettgman.github.io/tidying/) (part of [Data Science with R](http://garrettgman.github.io/)) | ||
|
||
* Chester Ismay and Ted Laderas, [A gRadual-intRoduction to the tidyverse](https://github.com/Cascadia-R/gRadual-intRoduction-tidyverse?utm_content=buffer98896&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer) | ||
|
||
|
||
|
||
|
||
|
||
-30- |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters