Merge pull request #181 from moderndive/dev-to-release-v0.6.0

Dev to release v0.6.0
moderndive · Aug 7, 2019 · e3a744a
2 parents 0cbc34a + e25ebe0
commit e3a744a
Showing 411 changed files with 47,724 additions and 180 deletions.
diff --git a/.gitignore b/.gitignore
@@ -13,8 +13,10 @@ ModernDive\ Master.Rproj
 *_bookdown_files
 ismaykim.log
 ismaykim.Rmd
+ismaykim.tex
 moderndive.log
 moderndive.Rmd
+moderndive.tex
 bib/packages.bib
 purl.Rout
 docs/*

diff --git a/.travis.yml b/.travis.yml
@@ -5,9 +5,9 @@ pandoc_version: 1.19.2.1
 r_packages:
   - devtools
 
-r_github_packages:
-  - moderndive/moderndive
-  - tidymodels/infer
+#r_github_packages:
+#  - moderndive/moderndive
+#  - tidymodels/infer
 
 before_script:
   - chmod +x ./_build.sh

diff --git a/NEWS.md b/NEWS.md
@@ -1,161 +1,28 @@
-# ModernDive 0.5.0.9000
-
-## Major refactoring of inference chapters of book
-
-**Old Chapter Structure**:
-
-* Chapter 9 - Confidence Intervals
-    1. Bootstrapping
-        a) Data explanation
-        b) Exploratory data analysis
-        c) The Bootstrapping Process
-    2. The infer package for statistical inference
-        a) Specify variables
-        b) Generate replicates
-        c) Calculate summary statistics
-        d) Visualize the results
-    3. Now to confidence intervals
-        a) The percentile method
-        b) The standard error method
-    4. Comparing bootstrap and sampling distributions
-    5. Interpreting the confidence interval
-    6. Example: One proportion
-        a) Observed Statistic
-        b) Bootstrap distribution
-        c) Theory-based confidence intervals
-    7. Example: Comparing two proportions
-        a) Compute the point estimate
-        b) Bootstrap distribution
-    8. Conclusion
-        a) What’s to come?
-        b) Script of R code
-* Chapter 10 - Hypothesis Testing
-    1. When inference is not needed
-    2. Basics of hypothesis testing
-    3. Criminal trial analogy
-        a) Two possible conclusions
-    4. Types of errors in hypothesis testing
-        a) Logic of hypothesis testing
-    5. Statistical significance
-    6. Hypothesis testing with infer
-    7. Example: Comparing two means
-        a) Randomization/permutation
-        b) Comparing action and romance movies
-        c) Sampling -> randomization
-        d) Data
-        e) Model of H0
-        f) Test statistic delta
-        g) Observed effect delta*
-        h) Simulated data
-        i) Distribution of delta under H0
-        j) The p-value
-        k) Corresponding confidence interval
-        l) Summary
-    8. Building theory-based methods using computation
-        a) Example: t-test for two independent samples
-        b) Conditions for t-test
-    9. Conclusion
-        a) Script of R code
-* Chapter 11 - Inference for Regression
-    1. Simulation-based Inference for Regression
-    1. Bootstrapping for the regression slope
-    1. Inference for multiple regression
-        a) Refresher: Professor evaluations data
-        b) Refresher: Visualizations
-        c) Refresher: Regression tables
-        d) Script of R code
-
-
-**New Chapter Structure**:
-
-* Chapter 9 - Confidence Intervals
-1. Activity: Working with a sample of pennies from the bank. Are they representative of all pennies in the US?
-  - a) Question: What do I do when I only have one sample?
-  - b) Resampling once (paper slips)
-  - c) Resampling 33 times
-  - d) Diagrams in Keynote
-2. Computer simulation: 
-  - a) What is resampling?
-  - b) Resampling once
-  - c) Resampling 33 times
-  - d) Resampling 1000 times
-3. Goal: Generate an estimate that accounts for sampling variation
-  - a) Constructing a confidence interval: hide code to shade ci region and to get the actual values. 
-  - b) Constructing a CI using percentile method
-  - c) Constructing a CI using SE method
-4. Framework: Boostrap resampling with replacement
-  - a) What dplyr verbs did we use?
-  - b) There is only one test framework
-  - c) the infer package: make sure to draw parallels between dplyr code and infer verbs
-5. Interpretation: 
-  - a) 95% speaks to reliability of the process, not about an particular interval. "We are 95% confident"
-  - b) What determines the width? Sample size, confidence levels (only int at population variance)
-6. Case study: Comparing two proportions with Mythbusters data
-7. Big picture: 
-  - a) Does this even work? Comparing sampling and bootstrap distribution. Do this using balls. 
-  - b) Table of inferential scenarios: Add pennies (mu) and Mythbusters (p1 - p2)
-  - c) Why does this work? Theoretical result: Efron. The empirical CDF converges to the population CDF. Bootstrap works for any point estimate
-  - d) There's a formula for that! Margin of error using critical values z. Talk about normal distributions. 
-* Chapter 10 - Hypothesis Testing
-1. Activity: Shuffling resumes between male and female job applicants
-  - a) Question: Are men and women rated for jobs differently?
-  - b) Alternate universe: No difference
-  - c) What about sampling variation?
-  - d) What did we actually observe?
-  - e) How likely is this result?
-  - f) Diagrams in Keynote
-2. Extension of previous framework/infer
-  - a) Revisit verb framework
-  - b) Permutation test resampling w/o replacement
-  - c) There is only one test framework
-  - d) Do activity via infer package
-3. Goal: Choose between two possible truths while accounting for sampling variation
-  - a) Conducting a hypothesis test
-  - b) Null hypothesis that's assumed
-  - c) Null distribution of test statistics: A "alternate universe" distribution
-  - d) Observed test statistics
-  - e) Definition of p-value
-4. Interpretation: 
-  - a) Analogy of criminal justice system
-  - b) Types of errors: 2x2 table
-  - c) A yes/no-type decision: statistical significance via alpha
-5. Case study: Comparing two means with action vs romance movie data
-  - Use the "There is Only One Test" framework here
-6. Conclusion 
-  - a) When is inference not needed: EDA can solve the problem. 
-  - b) Problems with p-values: p-hacking, hard to understand, ASA statement
-  - c) Comparison with confidence intervals. HT yields binary decision, but CI's yield plausible range of estimates. This is statistical vs practical significance
-  - d) Table of inferential scenarios: Add action vs romance (mu1 - mu2)
-  - e) Why does this work? Theoretical result: Neyman-Pearson lemma (maybe)
-  - f) There's a formula for that! t-test. Draw a null distribution with t-distribution superimposed. 
-* Chapter 11 - Inference for Regression
-    1. Activity: Revisit simple linear regression
-        a) Question: Is there a significant relationship between teaching score and bty score above and beyond any evidence due to sampling variation.
-        b) Review exercise/re-run all code
-        c) Regression table
-    1. Computer simulation: 
-        a) Permuting the relationship: to do a hypothesis test assuming independence of y & x. 
-        a) Bootstraping the rows: Having done HT, generate confidence interval.
-    1. Goal: Inferring about the population regression slope
-    1. Framework: 
-    1. Interpretation:
-        a) "You don't have to do any of this! Values in table are given!" No simulations necessary!
-        b) Conditions for inference: residual and partial residual plots, assumption of indepdence. 
-    1. Case study: Multiple regression example from Ch 7.
-    1. Big picture: 
-        a) ANOVA = Regression with categorical variables
-        b) Table of inferential scenarios: Add (beta1)
-        c) Why does this work?
-        d) There's a formula for that! Fitted intercept and slope. SE of fitted intercept and slope: observe there is a sqrt(n) in denominator. 
-
+# ModernDive 0.6.0
 
+## Done first pass of infer chapters
 
-## All content changes
+Completed major re-organization and clean-up of Chapters 9-11 using the `infer` package for "tidy and transparent" statistical inference.
+
+* Chapter 9: Bootstrapping & confidence intervals
+    + Tactile exercise of sampling 50 pennies from bank and resampling from this sample.
+    + Added sections on
+        1. "Interpreting confidence intervals", in particular determinants of CI width.
+        1. "Theory-based confidence intervals" using formula for SE of p-hat, thereby bridging gap between simulation and theory-based methods.
+* Chapter 10: Hypothesis testing
+    + Added `promotions` example on gender discrimination in promotions at a bank. Data source: `openintro::gender.discrimination`
+    + Added section on "Theory-based hypothesis tests" using t-test, thereby bridging gap between simulation and theory-based methods.
+* Chapter 11: Inference for regression.
+    + Discussion on LINE conditions for inference. In particular using `moderndive::get_regression_points()` wrapper function to `broom::augment()` so that novices can do their own residual analyses.
+
+## Other changes
+
+* Chapter 7: Multiple regression
+    + Added Section 7.3.1 on model selection: choosing between "interaction" and "parallel slopes" models
+* Chapter 8: Sampling
+    + Added Section 8.5.3 with more in-depth discussion of normal distribution
+* Chapter 12: Renamed to "Tell your story with data"
 
-* Chapter 6 - Basic regression:
-    + Changed `skimr::skim()` outputs to be of type console.
-    + Shortened simple linear regression EDA, in particular `geom_jitter()` and `geom_smooth(se = FALSE)`
-    + Expanded on "least squares" criteria for "best" fitting line in 6.3.3
 
 
 ***
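
Reviewer note on the "Theory-based confidence intervals" entry in the NEWS.md hunk above: a minimal R sketch of how a formula-based 95% interval for a proportion might be compared with a bootstrap percentile interval. The sample (50 observations, 21 successes) and all variable names are hypothetical and not taken from the book, and base R is used here only to keep the sketch dependency-free; the book itself works through the `infer` package.

```r
# Sketch only: hypothetical data, not the book's example.
set.seed(76)

# Hypothetical sample of n = 50 binary outcomes with 21 successes
x <- c(rep(1, 21), rep(0, 29))
n <- length(x)
p_hat <- mean(x)

# Theory-based interval: SE of p-hat = sqrt(p_hat * (1 - p_hat) / n),
# combined with the normal critical value for 95% confidence
se_p_hat <- sqrt(p_hat * (1 - p_hat) / n)
ci_theory <- p_hat + c(-1, 1) * qnorm(0.975) * se_p_hat

# Simulation-based interval: resample the sample with replacement
# 1000 times and take the middle 95% of the resampled proportions
boot_props <- replicate(1000, mean(sample(x, size = n, replace = TRUE)))
ci_percentile <- quantile(boot_props, probs = c(0.025, 0.975))

print(ci_theory)
print(ci_percentile)
```

With a sample of this size, the two intervals should come out roughly similar in center and width, which is the link between simulation-based and theory-based methods that the entry describes.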

diff --git a/index.Rmd b/index.Rmd
@@ -2,7 +2,7 @@
 title: "Statistical Inference via Data Science"
 subtitle: "A moderndive into R and the tidyverse"
 author: "Chester Ismay and Albert Y. Kim"
-date: "`r format(Sys.time(), '%B %d, %Y')`"
+date: "August 7, 2019"
 site: bookdown::bookdown_site
 documentclass: krantz
 bibliography: [bib/books.bib, bib/packages.bib, bib/articles.bib]
@@ -28,16 +28,14 @@ favicon: "images/logos/favicons/favicon.ico"
 \mainmatter
 
 ```{r set-options, include=FALSE}
-# Trigger for travis-ci rebuild: tic
-
 # Current version information: Date here should match the date in the YAML above.
 # Remove .9000 tag and set date to release date when releasing
-version <- "0.5.0.9000"
-date <- format(Sys.time(), '%B %d, %Y')
+version <- "0.6.0"
+date <- "August 7, 2019"
 
 # Latest release information:
-latest_release_version <- "0.5.0"
-latest_release_date <- "February 24, 2019"
+latest_release_version <- "0.6.0"
+latest_release_date <- "August 7, 2019"
 
 # Set output options
 if(knitr:::is_html_output())
@@ -60,7 +58,7 @@ needed_CRAN_pkgs <- c(
   # Explicitly used packages:
   "tidyverse", "rmarkdown", "knitr", "janitor", "skimr",
   "infer", 
-#  "moderndive",
+  "moderndive",
   
   # Internally used packages:
   "webshot", "mvtnorm", "remotes", "devtools", "dygraphs", "gridExtra",
@@ -79,15 +77,10 @@ if(!"patchwork" %in% installed.packages()){
   remotes::install_github("thomasp85/patchwork")
 }
 
-if(!"moderndive" %in% installed.packages()){
-  # To be included until new version with gg_parallel_slopes() is on CRAN
-  remotes::install_github("moderndive/moderndive")
-}
-
-# Less than ideal, but have to call in a PR here to label the table
-# Just used a screenshot instead
-#remotes::install_github("rstudio/gt", 
-#                        ref="eff3be7384365a44459691e49b9b740420cd0851")
+# if(!"moderndive" %in% installed.packages()){
+#   # To be included until new version with gg_parallel_slopes() is on CRAN
+#   remotes::install_github("moderndive/moderndive")
+# }
 
 # Check that phantomjs is installed to create screenshots of apps
 if(is.null(webshot:::find_phantom()))
@@ -223,13 +216,11 @@ if(knitr::is_html_output()){
 ```
 -->
 
-
-<!-- include=FALSE for PDF -->
-
-```{block, type='learncheck', include=FALSE, purl=FALSE}
+<!--
+```{block, type='learncheck', include = !knitr:::is_latex_output(), purl=FALSE}
 **Please note that you are currently looking at the "development version" of ModernDive, which is a work in progress currently being edited and thus subject to frequent change. For the latest "released version" of ModernDive, which changes much less frequently, please visit [ModernDive.com](https://moderndive.com/).**
 ```
-
+-->
 
 
 **Help! I'm new to R and RStudio and I need to learn about them! However, I'm completely new to coding! What do I do?** 
@@ -456,6 +447,7 @@ This book was written using RStudio's [bookdown](https://bookdown.org/) \index{b
     + Preview of development version is available at [https://moderndive.netlify.com/](https://moderndive.netlify.com/)
     + Source code: Available on ModernDive's [GitHub repository page](https://github.com/moderndive/moderndive_book)
 * **Previous versions** Older versions that may be out of date:
+    + [Version 0.5.0](previous_versions/v0.5.0/index.html) released on February 24, 2019 ([source code](https://github.com/moderndive/moderndive_book/releases/tag/v0.5.0))
     + [Version 0.4.0](previous_versions/v0.4.0/index.html) released on July 21, 2018 ([source code](https://github.com/moderndive/moderndive_book/releases/tag/v0.4.0))
     + [Version 0.3.0](previous_versions/v0.3.0/index.html) released on February 3, 2018 ([source code](https://github.com/moderndive/moderndive_book/releases/tag/v0.3.0))
     + [Version 0.2.0](previous_versions/v0.2.0/index.html) released on August 02, 2017 ([source code](https://github.com/moderndive/moderndive_book/releases/tag/v0.2.0))

diff --git a/moderndive.pdf b/moderndive.pdf