ModernDive 0.5.0: Closing in on v1.0.0
ModernDive 0.5.0
Highlights
- "Data wrangling" chapter now comes before "Tidy data" chapter.
- Improved explanations and examples of `geom_histogram()`, `geom_boxplot()`, and "tidy" data
- Moving residual analysis from regression Chapters 6 & 7 to Chap 11: Inference for regression
- Reorganized Chap 8 on Sampling
- All learning check solutions now in Appendix D
- PDF build re-added (still a work-in-progress)
All content changes
- Changed title
  - From: "Statistical Inference via Data Science in R"
  - To: "Statistical Inference via Data Science: A moderndive into R and the tidyverse"
- Chapter 2 - Getting Started
  - Added subsection 2.2.3 "Errors, warnings, and messages" by @andrewheiss
- Chapter 3 - Data visualization:
  - Added simpler introductory `geom_histogram()` and `geom_boxplot()` examples
  - Started downweighting the amount of data wrangling previews included in this chapter, in particular `join`.
  - Cleaned up conclusion section
  - Added cheatsheet
- Switched order of "Chap 4 Tidy Data" and "Chap 5 Data Wrangling": Data Wrangling now comes first
- Chapter 4 - Data wrangling:
  - Added cheatsheet
- Chapter 5 - Renamed to "Importing and tidy data"
  - Reordered sections: importing then tidying
  - Added `fivethirtyeight::drinks` example of "hitting the non-tidy wall", then using `tidyr::gather()`
  - Made Guatemala democracy score a case study.
  - Added discussion on what the `tidyverse` package is.
  - Moved discussion on normal forms to Ch4: Data Wrangling - joins.
  - Moved discussion on identification vs measurement variables to Ch2: Getting started with data.
- Chapter 6 - Basic regression:
  - Moved residual analysis to Chapter 11
- Chapter 7 - Multiple regression:
  - Moved residual analysis to Chapter 11
- Chapter 8 - Sampling: Major refactoring of presentation/exposition; see below
- Chapter 11 - Inference for regression:
  - Moved residual analysis from Chapter 6 & 7 here
- Moved all Learning Check solutions to Appendix D
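For readers unfamiliar with the "non-tidy wall" mentioned under Chapter 5, here is a minimal sketch of what `tidyr::gather()` does to data shaped like `fivethirtyeight::drinks`. The data frame `drinks_wide`, its country labels, and its servings values are hypothetical placeholders made up for illustration; they are not taken from the package or from the book.

```r
library(tidyr)

# Toy stand-in for a few rows of fivethirtyeight::drinks.
# Country labels and servings values are illustrative placeholders only.
drinks_wide <- data.frame(
  country = c("Country A", "Country B"),
  beer    = c(10, 20),
  spirit  = c(30, 40),
  wine    = c(50, 60),
  stringsAsFactors = FALSE
)

# Non-tidy: the beverage type is spread across three column names.
# gather() stacks those columns into key/value pairs,
# giving one row per country-type combination.
drinks_tidy <- gather(drinks_wide, key = "type", value = "servings", -country)

drinks_tidy
```

The result has the three columns `country`, `type`, and `servings`, which is the shape most `ggplot2` and `dplyr` code expects. In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`, though it still works.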
Chapter 8 Sampling Refactoring
Old chapter structure:
- Introduction to sampling
  a) Concepts related to sampling
  b) Inference via sampling
- Tactile sampling simulation
  a) Using the shovel once
  b) Using the shovel 33 times
- Virtual sampling simulation
  a) Using the shovel once
  b) Using shovel 33 times
  c) Using shovel 1000 times
  d) Using different shovels
- In real-life sampling: Polls
- Conclusion
  a) Central Limit Theorem
  b) What’s to come?
  c) Script of R code
New chapter structure:
- Activity: Sampling from a bowl
  a) Question: What proportion of this bowl is red?
  b) Using shovel once
  c) Using shovel 33 times
- Computer simulation:
  a) What is a simulation? We just did a "tactile" one by hand, now let's do one using the computer
  b) Using shovel once
  c) Using shovel 33 times
  d) Using shovel 1000 times
  e) Using different shovels
- Goal: Study fluctuations due to sampling variation
  a) You probably already knew: Bigger sample size means "better" guess.
  b) Comparing shovels: Role of sample size
- Framework: Sampling
  a) Terminology for sampling (population, sample, point estimate, etc.)
  b) Statistical concepts: sampling distribution and standard error
  c) Computer's random number generator
- Interpretation:
  a) Visual display of differences
- Case study: Obama poll
- Big picture:
  a) Table of inferential scenarios: Add bowl and Obama poll (both p)
  b) Why does this work? Theoretical result: CLT
  c) There's a formula for that: SE formula that has sqrt(n) at the bottom
  d) Appendix: Normal distribution discussion
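To make the arc of the new structure concrete, the base R sketch below mimics "using different shovels 1000 times" and compares the simulated spread of the sample proportions with the usual standard error approximation for a proportion, sqrt(p(1 - p)/n). The bowl composition, the shovel sizes, the seed, and the helper `use_shovel()` are all assumptions chosen for illustration; the chapter itself may use different data and functions.

```r
set.seed(2019)  # arbitrary seed so the sketch is reproducible

# Hypothetical bowl: the ball counts are assumptions for illustration
bowl <- c(rep("red", 900), rep("white", 1500))
p <- mean(bowl == "red")  # true proportion of red balls: 0.375

# One use of a virtual shovel with n slots: sample without replacement
# and return the proportion of red balls in the sample
use_shovel <- function(n) mean(sample(bowl, size = n) == "red")

# Use shovels of three sizes 1000 times each and record the
# standard deviation of the resulting sample proportions
sizes <- c(25, 50, 100)
simulated_se <- sapply(sizes, function(n) sd(replicate(1000, use_shovel(n))))

# Textbook approximation: SE = sqrt(p * (1 - p) / n)
formula_se <- sqrt(p * (1 - p) / sizes)

data.frame(n = sizes, simulated_se, formula_se)
```

The simulated values should shrink as the shovel gets bigger and sit close to the formula values. They will not match exactly, both because of simulation noise and because the shovel samples without replacement from a finite bowl.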