ModernDive 0.5.0

Highlights

"Data wrangling" chapter now comes before "Tidy data" chapter.
Improved explanations and examples of geom_histogram(), geom_boxplot(), and "tidy" data
Moved residual analysis from regression Chapters 6 & 7 to Chapter 11: Inference for regression
Reorganized Chap 8 on Sampling
All learning check solutions now in Appendix D
PDF build re-added (still a work-in-progress)

Changed title
- From: "Statistical Inference via Data Science in R"
- To: "Statistical Inference via Data Science: A moderndive into R and the tidyverse"
Chapter 2 - Getting Started:
- Added subsection 2.2.3 "Errors, warnings, and messages" by @andrewheiss
Chapter 3 - Data visualization:
- Added simpler introductory geom_histogram() and geom_boxplot() examples
- Started reducing the number of data wrangling previews included in this chapter, in particular joins.
- Cleaned up conclusion section
- Added cheatsheet
Switched order of "Chap 4 Tidy Data" and "Chap 5 Data Wrangling": Data Wrangling now comes first
Chapter 4 - Data wrangling:
- Added cheatsheet
Chapter 5 - Renamed to "Importing and tidy data"
- Reordered sections: importing then tidying
- Added fivethirtyeight::drinks example of "hitting the non-tidy wall", then using tidyr::gather()
- Made Guatemala democracy score a case study.
- Added discussion of what the tidyverse package is.
- Moved discussion on normal forms to Ch4: Data Wrangling - joins.
- Moved discussion on identification vs measurement variables to Ch2: Getting started with data.
Chapter 6 - Basic regression:
- Moved residual analysis to Chapter 11
Chapter 7 - Multiple regression:
- Moved residual analysis to Chapter 11
Chapter 8 - Sampling: Major refactoring of presentation/exposition; see below
Chapter 11 - Inference for regression:
- Moved residual analysis from Chapter 6 & 7 here
Moved all Learning Check solutions to Appendix D

Old Chapter 8 structure:

Introduction to sampling
a) Concepts related to sampling
b) Inference via sampling
Tactile sampling simulation
a) Using the shovel once
b) Using the shovel 33 times
Virtual sampling simulation
a) Using the shovel once
b) Using shovel 33 times
c) Using shovel 1000 times
d) Using different shovels
In real-life sampling: Polls
Conclusion
a) Central Limit Theorem
b) What’s to come?
c) Script of R code

New Chapter 8 structure:

Activity: Sampling from a bowl
a) Question: What proportion of this bowl is red?
b) Using shovel once
c) Using shovel 33 times
Computer simulation:
a) What is a simulation? We just did a "tactile" one by hand, now let's do one using the computer
b) Using shovel once
c) Using shovel 33 times
d) Using shovel 1000 times
e) Using different shovels
Goal: Study fluctuations due to sampling variation
a) You probably already knew: Bigger sample size means "better" guess.
b) Comparing shovels: Role of sample size
Framework: Sampling
a) Terminology for sampling (population, sample, point estimate, etc)
b) Statistical concepts: sampling distribution and standard error
c) Computer's random number generator
Interpretation:
a) Visual display of differences
Case study: Obama poll
Big picture:
a) Table of inferential scenarios: Add bowl and Obama poll (both p)
b) Why does this work? Theoretical result: CLT
c) There's a formula for that: SE formula that has sqrt(n) at the bottom
d) Appendix: Normal distribution discussion
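
For reference, the "SE formula that has sqrt(n) at the bottom" in item c) is presumably the standard error of a sample proportion, since both the bowl and the Obama poll scenarios estimate a proportion p. A sketch under that assumption, where \hat{p} (the sample proportion) and n (the sample size, i.e. the number of slots in the shovel) are notation introduced here rather than taken from these notes:

```latex
\mathrm{SE}_{\hat{p}} \;=\; \sqrt{\frac{p\,(1-p)}{n}} \;=\; \frac{\sqrt{p\,(1-p)}}{\sqrt{n}}
```

Because n appears under a square root in the denominator, quadrupling the sample size only halves the standard error, which is consistent with "Bigger sample size means 'better' guess" in the list above.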