restructure tidyverse and use of R materials

MonkmanMH · Mar 23, 2019 · b56f591 · b56f591
1 parent 294847c
commit b56f591
Show file tree

Hide file tree

Showing 3 changed files with 135 additions and 101 deletions.
diff --git a/03_data_science_practice.Rmd b/03_data_science_practice.Rmd
@@ -304,107 +304,7 @@ Louisa Smith , [epi quals study calendar](https://docs.google.com/spreadsheets/d
 * https://twitter.com/louisahsmith/status/1081955868864901120
 
 
-***
-
-## The tidyverse
-
-All too often, data are messy. There are rows with no contents, colour-coded cells, and inconsistent values.
-
-One important way that data can be cleaned is to ensure that the structure is tidy. What do we mean by tidy data?
-
-> There are three interrelated rules which make a dataset tidy:
-> * Each variable must have its own column.
-> * Each observation must have its own row.
-> * Each value must have its own cell.
-
-And 
-
-> Why ensure that your data is tidy? There are two main advantages:
-> 
-> 1. There’s a general advantage to picking one consistent way of storing data. If you have a consistent data structure, it’s easier to learn the tools that work with it because they have an underlying uniformity.
->
-> 2. There’s a specific advantage to placing variables in columns because it allows R’s vectorised nature to shine. As you learned in mutate and summary functions, most built-in R functions work with vectors of values. That makes transforming tidy data feel particularly natural.
-
-(from Hadley Wickham & Garrett Grolemund, [_R for Data Science_](http://r4ds.had.co.nz/))
-
-This won't solve things like inconsistent values and colour-coded cells, but it will solve some other messiness.
-
-For more about the principles of tidy data, see:
-
-* Hadley Wickham, ["Tidy data", _The Journal of Statistical Software_, vol. 59, 2014.](https://www.jstatsoft.org/article/view/v059i10)
-  + [alternate link:](http://vita.had.co.nz/papers/tidy-data.html)
-  + [informal and code-heavy version](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html)
-
-
-### Other references
-
-Karl Broman and Kara Woo, ["Data organization in spreadsheets"](https://github.com/kbroman/Paper_DataOrg) (github page with source manuscript) -- application of tidy principles to spreadsheets.
-
-* see also Karl Broman's tutorial, ["Data organization: organizing data in spreadsheets)
-
-
-Bruno Rodriguez, [Modern R with the tidyverse](https://b-rodrigues.github.io/modern_R/)
-
-
-Jesse Sadler, [Excel vs R: A Brief Introduction to R  (With examples using dplyr and ggplot](http://kbroman.org/dataorg/](https://www.jessesadler.com/post/excel-vs-r/) (2017-10-02)
-
-
-
-
-
-## Tidy Tools
-
-[The tidyverse style guide](http://style.tidyverse.org/functions.html)
-
-
-### tidyverse R packages
-
-[The tidyverse: ](http://tidyverse.org/)
-
-[The tidyverse R packages on github](https://github.com/hadley/tidyverse)
-
-#### `broom`
-
-
-#### `dplyr`
-
-`dplyr` now gets its own page, labelled [**Data Wrangling**](DataWrangling.md)
-
-
-#### `purrr`
-
-* [A purrr tutorial](https://github.com/Cascadia-R/purrr-tutorial) -- Cascadia-R, 2017-06-03
-
-* Charlotte Wickham, [purr tutorial](https://github.com/cwickham/purrr-tutorial) -- github
-
-### more about tidy data
-
-* Hadley Wickham & Garrett Grolemund, [_R for Data Science_](http://r4ds.had.co.nz/)
-
-* Hadley Wickham
-  + [Tidy data and tidy tools (video of presentation, December 2011)](https://vimeo.com/33727555)
-
-* Garrett Grolemund
-  + [Data Tidying](http://garrettgman.github.io/tidying/) (part of [Data Science with R](http://garrettgman.github.io/))
-
-* Chester Ismay and Ted Laderas, [A gRadual-intRoduction to the tidyverse](https://github.com/Cascadia-R/gRadual-intRoduction-tidyverse?utm_content=buffer98896&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer)  
-
-
-***
-
-## Categorical data
-
-Emily Robinson, DataCamp course, [Categorical Data in the Tidyverse](https://www.datacamp.com/courses/categorical-data-in-the-tidyverse)
-
-***
-
-## Tidy Text
-
-If  you're going to undertake text mining and natural language processing, your text (i.e. your data) needs to be tidy.  Fortunately, there's an R package for that: `tidytext`.
-
-* Julia Silge, [Term Frequency and tf-idf Using Tidy Data Principles](http://juliasilge.com/blog/Term-Frequency-tf-idf/), 2016-06-27
 
-(See the companion page on the topics of [text analysis and text mining](https://github.com/MonkmanMH/DataScienceResources/blob/master/TextAnalysis.md)).
 
 
 

diff --git a/04_data_science_tools.Rmd b/04_data_science_tools.Rmd
@@ -0,0 +1,113 @@
+# Using R {#using_r}
+
+
+
+## Introduction
+
+In addition to there being many resources available for using R to solve statistical and data science challenges, there are also many resources on how to maximize your effectiveness using R.
+
+This chapter compiles what I consider to be the essential texts; articles and blog posts will be at a minimum.
+
+
+## R as a programming environment
+
+Colin Gillespie and Robin Lovelace, _Efficient R Programming_ [@Gillespie_Lovelace_2017]
+
+Hadley Wickham, _Advanced R_ [@Wickham_advancedR]
+
+***
+
+## The tidyverse
+
+All too often, data are messy. There are rows with no contents, colour-coded cells, and inconsistent values.
+
+One important way that data can be cleaned is to ensure that the structure is tidy. What do we mean by tidy data?
+
+> There are three interrelated rules which make a dataset tidy:
+> * Each variable must have its own column.
+> * Each observation must have its own row.
+> * Each value must have its own cell.
+
+And 
+
+> Why ensure that your data is tidy? There are two main advantages:
+> 
+> 1. There’s a general advantage to picking one consistent way of storing data. If you have a consistent data structure, it’s easier to learn the tools that work with it because they have an underlying uniformity.
+>
+> 2. There’s a specific advantage to placing variables in columns because it allows R’s vectorised nature to shine. As you learned in mutate and summary functions, most built-in R functions work with vectors of values. That makes transforming tidy data feel particularly natural.
+
+(from [@Wickham_Grolemund2016])
+
+
+This won't solve things like inconsistent values and colour-coded cells, but it will solve some other messiness.
+
+For more about the principles of tidy data, see Hadley Wickham's article "Tidy data", in _The Journal of Statistical Software_ [@tidydata]
+
+  + [alternate link:](http://vita.had.co.nz/papers/tidy-data.html)
+  + [informal and code-heavy version](https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html)
+
+
+### Other tidyverse references
+
+Karl Broman and Kara Woo, ["Data organization in spreadsheets"](https://github.com/kbroman/Paper_DataOrg) (github page with source manuscript) -- application of tidy principles to spreadsheets.
+
+* see also Karl Broman's tutorial, ["Data organization: organizing data in spreadsheets)
+
+
+Bruno Rodriguez, [Modern R with the tidyverse](https://b-rodrigues.github.io/modern_R/)
+
+
+Jesse Sadler, [Excel vs R: A Brief Introduction to R  (With examples using dplyr and ggplot](http://kbroman.org/dataorg/](https://www.jessesadler.com/post/excel-vs-r/) (2017-10-02)
+
+
+#### Categorical data
+
+Emily Robinson, DataCamp course, [Categorical Data in the Tidyverse](https://www.datacamp.com/courses/categorical-data-in-the-tidyverse)
+
+
+#### Tidy text
+
+If  you're going to undertake text mining and natural language processing, your text (i.e. your data) needs to be tidy.  Fortunately, there's an R package for that: `tidytext`.
+
+See the companion chapter on the topics of [Text Analysis and Text Mining].
+
+
+
+
+### tidyverse R packages
+
+[The tidyverse: ](http://tidyverse.org/)
+
+[The tidyverse R packages on github](https://github.com/hadley/tidyverse)
+
+#### `broom`
+
+
+#### `dplyr`
+
+`dplyr` now gets its own page, labelled [**Data Wrangling**](DataWrangling.md)
+
+
+#### `purrr`
+
+* [A purrr tutorial](https://github.com/Cascadia-R/purrr-tutorial) -- Cascadia-R, 2017-06-03
+
+* Charlotte Wickham, [purr tutorial](https://github.com/cwickham/purrr-tutorial) -- github
+
+### more about tidy data
+
+* Hadley Wickham & Garrett Grolemund, [_R for Data Science_](http://r4ds.had.co.nz/)
+
+* Hadley Wickham
+  + [Tidy data and tidy tools (video of presentation, December 2011)](https://vimeo.com/33727555)
+
+* Garrett Grolemund
+  + [Data Tidying](http://garrettgman.github.io/tidying/) (part of [Data Science with R](http://garrettgman.github.io/))
+
+* Chester Ismay and Ted Laderas, [A gRadual-intRoduction to the tidyverse](https://github.com/Cascadia-R/gRadual-intRoduction-tidyverse?utm_content=buffer98896&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer)  
+
+
+
+
+
+-30-
diff --git a/book.bib b/book.bib
@@ -86,6 +86,15 @@ @Book{Gelman_etal_2014
 }
 
 
+@Book{Gillespie_Lovelace_2017,
+  title = {Efficient R Programming: A Practical Guide to Smarter Programming},
+  author = {Colin Gillespie and Robin Lovelace},
+  publisher = {O'Reilly},
+  year = {2017},
+  isbn = {ISBN 978-1-491-95078-4},
+  url = {https://csgillespie.github.io/efficientR/}
+}
+
 
 @Book{Graham_etal_2015,
   title = {Exploring Big Historical Data: The Historian's Macroscope},
@@ -266,8 +275,20 @@ @article{tidydata
    issn = {1548-7660},
    pages = {1--23},
    doi = {10.18637/jss.v059.i10},
-   url = {https://www.jstatsoft.org/index.php/jss/article/view/v059i10}
+   url = {https://www.jstatsoft.org/index.php/jss/article/view/v059i10},
    }
+
+
+@Book{Wickham_advancedR,
+  title = {Advanced R},
+  author = {Hadley Wickham},
+  publisher = {CRC Press},
+  year = {2015},
+  isbn = {ISBN 978-1-4665-8696-3},
+  url = {https://adv-r.hadley.nz/},
+}
+
+
    
 @Book{Wickham_Grolemund2016,
     author = {Hadley Wickham and