Commit

Merge pull request #181 from moderndive/dev-to-release-v0.6.0
Dev to release v0.6.0
rudeboybert authored Aug 7, 2019
2 parents 0cbc34a + e25ebe0 commit e3a744a
Showing 411 changed files with 47,724 additions and 180 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -13,8 +13,10 @@ ModernDive\ Master.Rproj
*_bookdown_files
ismaykim.log
ismaykim.Rmd
ismaykim.tex
moderndive.log
moderndive.Rmd
moderndive.tex
bib/packages.bib
purl.Rout
docs/*
6 changes: 3 additions & 3 deletions .travis.yml
@@ -5,9 +5,9 @@ pandoc_version: 1.19.2.1
r_packages:
- devtools

r_github_packages:
- moderndive/moderndive
- tidymodels/infer
#r_github_packages:
# - moderndive/moderndive
# - tidymodels/infer

before_script:
- chmod +x ./_build.sh
177 changes: 22 additions & 155 deletions NEWS.md
@@ -1,161 +1,28 @@
# ModernDive 0.5.0.9000

## Major refactoring of inference chapters of book

**Old Chapter Structure**:

* Chapter 9 - Confidence Intervals
1. Bootstrapping
a) Data explanation
b) Exploratory data analysis
c) The Bootstrapping Process
2. The infer package for statistical inference
a) Specify variables
b) Generate replicates
c) Calculate summary statistics
d) Visualize the results
3. Now to confidence intervals
a) The percentile method
b) The standard error method
4. Comparing bootstrap and sampling distributions
5. Interpreting the confidence interval
6. Example: One proportion
a) Observed Statistic
b) Bootstrap distribution
c) Theory-based confidence intervals
7. Example: Comparing two proportions
a) Compute the point estimate
b) Bootstrap distribution
8. Conclusion
a) What’s to come?
b) Script of R code
* Chapter 10 - Hypothesis Testing
1. When inference is not needed
2. Basics of hypothesis testing
3. Criminal trial analogy
a) Two possible conclusions
4. Types of errors in hypothesis testing
a) Logic of hypothesis testing
5. Statistical significance
6. Hypothesis testing with infer
7. Example: Comparing two means
a) Randomization/permutation
b) Comparing action and romance movies
c) Sampling -> randomization
d) Data
e) Model of H0
f) Test statistic delta
g) Observed effect delta*
h) Simulated data
i) Distribution of delta under H0
j) The p-value
k) Corresponding confidence interval
l) Summary
8. Building theory-based methods using computation
a) Example: t-test for two independent samples
b) Conditions for t-test
9. Conclusion
a) Script of R code
* Chapter 11 - Inference for Regression
1. Simulation-based Inference for Regression
1. Bootstrapping for the regression slope
1. Inference for multiple regression
a) Refresher: Professor evaluations data
b) Refresher: Visualizations
c) Refresher: Regression tables
d) Script of R code


**New Chapter Structure**:

* Chapter 9 - Confidence Intervals
1. Activity: Working with a sample of pennies from the bank. Are they representative of all pennies in the US?
- a) Question: What do I do when I only have one sample?
- b) Resampling once (paper slips)
- c) Resampling 33 times
- d) Diagrams in Keynote
2. Computer simulation:
- a) What is resampling?
- b) Resampling once
- c) Resampling 33 times
- d) Resampling 1000 times
3. Goal: Generate an estimate that accounts for sampling variation
- a) Constructing a confidence interval: hide code to shade ci region and to get the actual values.
- b) Constructing a CI using percentile method
- c) Constructing a CI using SE method
4. Framework: Bootstrap resampling with replacement
- a) What dplyr verbs did we use?
- b) There is only one test framework
- c) the infer package: make sure to draw parallels between dplyr code and infer verbs
5. Interpretation:
- a) 95% speaks to reliability of the process, not about a particular interval. "We are 95% confident"
- b) What determines the width? Sample size, confidence levels (only int at population variance)
6. Case study: Comparing two proportions with Mythbusters data
7. Big picture:
- a) Does this even work? Comparing sampling and bootstrap distribution. Do this using balls.
- b) Table of inferential scenarios: Add pennies (mu) and Mythbusters (p1 - p2)
- c) Why does this work? Theoretical result: Efron. The empirical CDF converges to the population CDF. Bootstrap works for any point estimate
- d) There's a formula for that! Margin of error using critical values z. Talk about normal distributions.
* Chapter 10 - Hypothesis Testing
1. Activity: Shuffling resumes between male and female job applicants
- a) Question: Are men and women rated for jobs differently?
- b) Alternate universe: No difference
- c) What about sampling variation?
- d) What did we actually observe?
- e) How likely is this result?
- f) Diagrams in Keynote
2. Extension of previous framework/infer
- a) Revisit verb framework
- b) Permutation test resampling w/o replacement
- c) There is only one test framework
- d) Do activity via infer package
3. Goal: Choose between two possible truths while accounting for sampling variation
- a) Conducting a hypothesis test
- b) Null hypothesis that's assumed
- c) Null distribution of test statistics: An "alternate universe" distribution
- d) Observed test statistics
- e) Definition of p-value
4. Interpretation:
- a) Analogy of criminal justice system
- b) Types of errors: 2x2 table
- c) A yes/no-type decision: statistical significance via alpha
5. Case study: Comparing two means with action vs romance movie data
- Use the "There is Only One Test" framework here
6. Conclusion
- a) When is inference not needed: EDA can solve the problem.
- b) Problems with p-values: p-hacking, hard to understand, ASA statement
- c) Comparison with confidence intervals. HT yields binary decision, but CI's yield plausible range of estimates. This is statistical vs practical significance
- d) Table of inferential scenarios: Add action vs romance (mu1 - mu2)
- e) Why does this work? Theoretical result: Neyman-Pearson lemma (maybe)
- f) There's a formula for that! t-test. Draw a null distribution with t-distribution superimposed.
* Chapter 11 - Inference for Regression
1. Activity: Revisit simple linear regression
a) Question: Is there a significant relationship between teaching score and bty score above and beyond any evidence due to sampling variation?
b) Review exercise/re-run all code
c) Regression table
1. Computer simulation:
a) Permuting the relationship: to do a hypothesis test assuming independence of y & x.
b) Bootstrapping the rows: Having done HT, generate confidence interval.
1. Goal: Inferring about the population regression slope
1. Framework:
1. Interpretation:
a) "You don't have to do any of this! Values in table are given!" No simulations necessary!
b) Conditions for inference: residual and partial residual plots, assumption of independence.
1. Case study: Multiple regression example from Ch 7.
1. Big picture:
a) ANOVA = Regression with categorical variables
b) Table of inferential scenarios: Add (beta1)
c) Why does this work?
d) There's a formula for that! Fitted intercept and slope. SE of fitted intercept and slope: observe there is a sqrt(n) in denominator.

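The resampling steps outlined for Chapter 9 (resample with replacement many times, then build a confidence interval by the percentile or standard error method) can be sketched in base R. This is only an illustrative sketch, not the book's code: the `pennies_sample` vector is simulated here and is an assumption, as are the seed and the 1.96 multiplier for a 95% interval.

```r
# Hypothetical sample of 50 penny years, simulated for illustration only
set.seed(76)
pennies_sample <- sample(1960:2019, size = 50, replace = TRUE)

# Resample with replacement 1000 times, recording the mean of each resample
n_reps <- 1000
boot_means <- replicate(
  n_reps,
  mean(sample(pennies_sample, size = length(pennies_sample), replace = TRUE))
)

# Percentile method: the middle 95% of the bootstrap distribution
percentile_ci <- quantile(boot_means, probs = c(0.025, 0.975))

# Standard error method: point estimate +/- 1.96 bootstrap standard errors
x_bar <- mean(pennies_sample)
se_ci <- x_bar + c(-1, 1) * 1.96 * sd(boot_means)

percentile_ci
se_ci
```

The two intervals should be similar whenever the bootstrap distribution is roughly bell-shaped; the `infer` verbs mentioned in the outline wrap the same resample-then-summarize steps.
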
# ModernDive 0.6.0

## Done first pass of infer chapters

## All content changes
Completed major re-organization and clean-up of Chapters 9-11 using the `infer` package for "tidy and transparent" statistical inference.

* Chapter 9: Bootstrapping & confidence intervals
+ Tactile exercise of sampling 50 pennies from bank and resampling from this sample.
+ Added sections on
1. "Interpreting confidence intervals", in particular determinants of CI width.
1. "Theory-based confidence intervals" using formula for SE of p-hat, thereby bridging gap between simulation and theory-based methods.
* Chapter 10: Hypothesis testing
+ Added `promotions` example on gender discrimination in promotions at a bank. Data source: `openintro::gender.discrimination`
+ Added section on "Theory-based hypothesis tests" using t-test, thereby bridging gap between simulation and theory-based methods.
* Chapter 11: Inference for regression.
+ Discussion on LINE conditions for inference. In particular using `moderndive::get_regression_points()` wrapper function to `broom::augment()` so that novices can do their own residual analyses.

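As a rough illustration of the permutation approach behind the Chapter 10 `promotions` example, the sketch below shuffles group labels to build a null distribution in base R. The counts are made up for illustration and are not the `openintro::gender.discrimination` data; the helper `diff_in_props()` is a hypothetical name, not a function from `infer` or `moderndive`.

```r
# Hypothetical data for illustration only: 40 resumes, 20 per gender label
set.seed(2019)
gender   <- rep(c("male", "female"), each = 20)
decision <- c(rep(c("promoted", "not"), times = c(16, 4)),  # male resumes
              rep(c("promoted", "not"), times = c(11, 9)))  # female resumes

# Test statistic: difference in promotion proportions, male minus female
diff_in_props <- function(decision, gender) {
  mean(decision[gender == "male"] == "promoted") -
    mean(decision[gender == "female"] == "promoted")
}
obs_diff <- diff_in_props(decision, gender)

# Null distribution: shuffle the gender labels (resampling without replacement)
null_diffs <- replicate(1000, diff_in_props(decision, sample(gender)))

# One-sided p-value: share of shuffles at least as extreme as what was observed
p_value <- mean(null_diffs >= obs_diff)
p_value
```

Shuffling the labels simulates the "no difference" universe, so a small `p_value` suggests the observed difference would be unusual under that assumption.
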
## Other changes

* Chapter 7: Multiple regression
+ Added Section 7.3.1 on model selection: choosing between "interaction" and "parallel slopes" models
* Chapter 8: Sampling
+ Added Section 8.5.3 with more in-depth discussion of normal distribution
* Chapter 12: Renamed to "Tell your story with data"

* Chapter 6 - Basic regression:
+ Changed `skimr::skim()` outputs to be of type console.
+ Shortened simple linear regression EDA, in particular `geom_jitter()` and `geom_smooth(se = FALSE)`
+ Expanded on "least squares" criteria for "best" fitting line in 6.3.3


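The "least squares" criterion mentioned for Section 6.3.3 can be checked numerically: the line fitted by `lm()` has a smaller sum of squared residuals than nearby candidate lines. The sketch below uses simulated data and a hypothetical helper `ssr()`; neither comes from the book.

```r
# Hypothetical data for illustration only
set.seed(6)
x <- runif(30, min = 0, max = 10)
y <- 2 + 0.5 * x + rnorm(30)

# Sum of squared residuals for a candidate line y = b0 + b1 * x
ssr <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Least squares fit and its sum of squared residuals
fit <- lm(y ~ x)
b <- unname(coef(fit))
best_ssr <- ssr(b[1], b[2])

# Nudging either coefficient away from the fitted values increases the SSR
worse_intercept <- ssr(b[1] + 0.1, b[2]) > best_ssr
worse_slope     <- ssr(b[1], b[2] - 0.1) > best_ssr
c(worse_intercept, worse_slope)
```
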
***
36 changes: 14 additions & 22 deletions index.Rmd
@@ -2,7 +2,7 @@
title: "Statistical Inference via Data Science"
subtitle: "A moderndive into R and the tidyverse"
author: "Chester Ismay and Albert Y. Kim"
date: "`r format(Sys.time(), '%B %d, %Y')`"
date: "August 7, 2019"
site: bookdown::bookdown_site
documentclass: krantz
bibliography: [bib/books.bib, bib/packages.bib, bib/articles.bib]
@@ -28,16 +28,14 @@ favicon: "images/logos/favicons/favicon.ico"
\mainmatter

```{r set-options, include=FALSE}
# Trigger for travis-ci rebuild: tic
# Current version information: Date here should match the date in the YAML above.
# Remove .9000 tag and set date to release date when releasing
version <- "0.5.0.9000"
date <- format(Sys.time(), '%B %d, %Y')
version <- "0.6.0"
date <- "August 7, 2019"
# Latest release information:
latest_release_version <- "0.5.0"
latest_release_date <- "February 24, 2019"
latest_release_version <- "0.6.0"
latest_release_date <- "August 7, 2019"
# Set output options
if(knitr:::is_html_output())
@@ -60,7 +58,7 @@ needed_CRAN_pkgs <- c(
# Explicitly used packages:
"tidyverse", "rmarkdown", "knitr", "janitor", "skimr",
"infer",
# "moderndive",
"moderndive",
# Internally used packages:
"webshot", "mvtnorm", "remotes", "devtools", "dygraphs", "gridExtra",
@@ -79,15 +77,10 @@ if(!"patchwork" %in% installed.packages()){
remotes::install_github("thomasp85/patchwork")
}
if(!"moderndive" %in% installed.packages()){
# To be included until new version with gg_parallel_slopes() is on CRAN
remotes::install_github("moderndive/moderndive")
}
# Less than ideal, but have to call in a PR here to label the table
# Just used a screenshot instead
#remotes::install_github("rstudio/gt",
# ref="eff3be7384365a44459691e49b9b740420cd0851")
# if(!"moderndive" %in% installed.packages()){
# # To be included until new version with gg_parallel_slopes() is on CRAN
# remotes::install_github("moderndive/moderndive")
# }
# Check that phantomjs is installed to create screenshots of apps
if(is.null(webshot:::find_phantom()))
@@ -223,13 +216,11 @@ if(knitr::is_html_output()){
```
-->


<!-- include=FALSE for PDF -->

```{block, type='learncheck', include=FALSE, purl=FALSE}
<!--
```{block, type='learncheck', include = !knitr:::is_latex_output(), purl=FALSE}
**Please note that you are currently looking at the "development version" of ModernDive, which is a work in progress currently being edited and thus subject to frequent change. For the latest "released version" of ModernDive, which changes much less frequently, please visit [ModernDive.com](https://moderndive.com/).**
```

-->


**Help! I'm new to R and RStudio and I need to learn about them! However, I'm completely new to coding! What do I do?**
@@ -456,6 +447,7 @@ This book was written using RStudio's [bookdown](https://bookdown.org/) \index{b
+ Preview of development version is available at [https://moderndive.netlify.com/](https://moderndive.netlify.com/)
+ Source code: Available on ModernDive's [GitHub repository page](https://github.com/moderndive/moderndive_book)
* **Previous versions** Older versions that may be out of date:
+ [Version 0.5.0](previous_versions/v0.5.0/index.html) released on February 24, 2019 ([source code](https://github.com/moderndive/moderndive_book/releases/tag/v0.5.0))
+ [Version 0.4.0](previous_versions/v0.4.0/index.html) released on July 21, 2018 ([source code](https://github.com/moderndive/moderndive_book/releases/tag/v0.4.0))
+ [Version 0.3.0](previous_versions/v0.3.0/index.html) released on February 3, 2018 ([source code](https://github.com/moderndive/moderndive_book/releases/tag/v0.3.0))
+ [Version 0.2.0](previous_versions/v0.2.0/index.html) released on August 02, 2017 ([source code](https://github.com/moderndive/moderndive_book/releases/tag/v0.2.0))
Binary file modified moderndive.pdf
Binary file not shown.