From cd806d18d17b097df0fa9f12af07789cd2dfae04 Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 6 Sep 2023 12:54:54 +0200
Subject: [PATCH 01/13] Update index.Rmd
---
index.Rmd | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/index.Rmd b/index.Rmd
index 0d07b1b4..d90679c6 100755
--- a/index.Rmd
+++ b/index.Rmd
@@ -193,6 +193,12 @@ Of course, all of the commands presented above also work in interactive widgets
write_html(playground = T)
```
+```{r, eval = my_output == "latex", results='asis', echo=FALSE, purl=FALSE}
+cat('\\begin{center}
+\\textit{This interactive application is only available in the HTML version.}
+\\end{center}
+')
+```
From 6e3a5eca8e58c9ed85921564d9d84f84a6ed27c4 Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 6 Sep 2023 12:56:45 +0200
Subject: [PATCH 02/13] Update 02-ch2.Rmd
---
02-ch2.Rmd | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/02-ch2.Rmd b/02-ch2.Rmd
index 783f96cd..9f7899a8 100644
--- a/02-ch2.Rmd
+++ b/02-ch2.Rmd
@@ -54,7 +54,8 @@ probability <- rep(1/6, 6)
plot(probability,
xlab = "Outcomes",
ylab="Probability",
- main = "Probability Distribution")
+ main = "Probability Distribution",
+ pch=20)
```
For the cumulative probability distribution we need the cumulative probabilities, i.e., we need the cumulative sums of the vector `r ttcode("probability")`. These sums can be computed using `r ttcode("cumsum()")`.
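For the fair die from above, this amounts to the following quick check (the printed values are exact):

```r
# probabilities of a fair die and their cumulative sums
probability <- rep(1/6, 6)
cumsum(probability)
#> [1] 0.1666667 0.3333333 0.5000000 0.6666667 0.8333333 1.0000000
```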
@@ -67,7 +68,8 @@ cum_probability <- cumsum(probability)
plot(cum_probability,
xlab = "Outcomes",
ylab="Cumulative Probability",
- main = "Cumulative Probability Distribution")
+ main = "Cumulative Probability Distribution",
+ pch=20)
```
### Bernoulli Trials {-}
@@ -143,7 +145,8 @@ probability <- dbinom(x = k,
plot(x = k,
y = probability,
ylab="Probability",
- main = "Probability Distribution Function")
+ main = "Probability Distribution Function",
+ pch=20)
```
In a similar fashion we may plot the cumulative distribution function of $k$ by
@@ -159,7 +162,8 @@ prob <- pbinom(q = k,
plot(x = k,
y = prob,
ylab="Probability",
- main = "Cumulative Distribution Function")
+ main = "Cumulative Distribution Function",
+ pch=20)
```
### Expected Value, Mean and Variance {-}
From d770b55ae28daab9d26371c4b3d6dce55e94f068 Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 6 Sep 2023 13:07:12 +0200
Subject: [PATCH 03/13] Update 04-ch4.Rmd
---
04-ch4.Rmd | 42 +++++++++++++++++++-----------------------
1 file changed, 19 insertions(+), 23 deletions(-)
diff --git a/04-ch4.Rmd b/04-ch4.Rmd
index a8cc3b6a..cb8338a4 100644
--- a/04-ch4.Rmd
+++ b/04-ch4.Rmd
@@ -70,7 +70,7 @@ The following code reproduces Figure 4.1 from the textbook.
```{r , echo=TRUE, fig.align='center', cache=TRUE}
# create a scatterplot of the data
-plot(TestScore ~ STR,ylab="Test Score")
+plot(TestScore ~ STR,ylab="Test Score",pch=20)
# add the systematic relationship to the plot
abline(a = 713, b = -3)
@@ -744,7 +744,7 @@ curve(dnorm(x,
-2,
sqrt(var_b0)),
add = T,
- col = "darkred")
+ col = "darkred",lwd=2)
# plot histograms of beta_hat_1
hist(fit[, 2],
@@ -758,7 +758,7 @@ curve(dnorm(x,
3.5,
sqrt(var_b1)),
add = T,
- col = "darkred")
+ col = "darkred",lwd=2)
```
@@ -820,24 +820,7 @@ and
$$Cov(X,Y)=4.$$
Formally, this is written down as
-
-\begin{equation}
- \begin{pmatrix}
- X \\
- Y \\
- \end{pmatrix}
- \overset{i.i.d.}{\sim} \ \mathcal{N}
- \left[
- \begin{pmatrix}
- 5 \\
- 5 \\
- \end{pmatrix}, \
- \begin{pmatrix}
- 5 & 4 \\
- 4 & 5 \\
- \end{pmatrix}
- \right]. \tag{4.3}
-\end{equation}
+$$\begin{pmatrix} X \\ Y \end{pmatrix}\overset{i.i.d.}{\sim} \ \mathcal{N}\left[\begin{pmatrix} 5 \\ 5 \end{pmatrix}, \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix} \right].\tag{4.3} $$
To carry out the random sampling, we make use of the function `r ttcode("mvrnorm()")` from the package `r ttcode("MASS")` [@R-MASS] which allows us to draw random samples from multivariate normal distributions, see `?mvrnorm`. Next, we use `r ttcode("subset()")` to split the sample into two subsets such that the first set, `r ttcode("set1")`, consists of observations that fulfill the condition $\lvert X - \overline{X} \rvert > 1$ and the second set, `r ttcode("set2")`, includes the remainder of the sample. We then plot both sets and use different colors to distinguish the observations.
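A minimal sketch of this sampling step (the seed and sample size here are illustrative choices):

```r
library(MASS)

set.seed(123)

# draw 100 observations from the bivariate normal distribution in (4.3)
bvndata <- mvrnorm(100,
                   mu = c(5, 5),
                   Sigma = cbind(c(5, 4), c(4, 5)))
bvndata <- as.data.frame(bvndata)
colnames(bvndata) <- c("X", "Y")

# split the sample: set1 fulfills |X - mean(X)| > 1, set2 is the remainder
set1 <- subset(bvndata, abs(X - mean(X)) > 1)
set2 <- subset(bvndata, abs(X - mean(X)) <= 1)
```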
@@ -871,6 +854,12 @@ plot(set1,
points(set2,
col = "steelblue",
pch = 19)
+legend("topleft",
+ legend = c("Set1",
+ "Set2"),
+ cex = 1,
+ pch = 19,
+ col = c("black","steelblue"))
```
@@ -887,8 +876,15 @@ plot(set1, xlab = "X", ylab = "Y", pch = 19)
points(set2, col = "steelblue", pch = 19)
# add both lines to the plot
-abline(lm.set1, col = "green")
-abline(lm.set2, col = "red")
+abline(lm.set1, col = "black",lwd=2)
+abline(lm.set2, col = "steelblue",lwd=2)
+legend("bottomright",
+ legend = c("Set1",
+ "Set2"),
+ cex = 1,
+ lwd=2,
+ col = c("black","steelblue"))
+
```
From 3ab46f857bb4cbb4dfbecfd602f908a100765fa9 Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 18 Oct 2023 12:04:33 +0200
Subject: [PATCH 04/13] Update 05-ch5.Rmd
---
05-ch5.Rmd | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/05-ch5.Rmd b/05-ch5.Rmd
index d76552c5..eb3a6cd3 100644
--- a/05-ch5.Rmd
+++ b/05-ch5.Rmd
@@ -1,4 +1,4 @@
-# Hypothesis Tests and Confidence Intervals in the Simple Linear Regression Model {#htaciitslrm}
+# Hypothesis Tests and Confidence Intervals in the SLR Model {#htaciitslrm}
This chapter continues our treatment of the simple linear regression model. The following subsections discuss how we may use our knowledge about the sampling distribution of the OLS estimator in order to make statements regarding its uncertainty.
@@ -1312,8 +1312,8 @@ if (my_output=="html") {
In the simple regression model, the covariance matrix of the coefficient estimators is denoted
-\\begin{equation}
-\\text{Var}
+
+$$\\text{Var}
\\begin{pmatrix}
\\hat\\beta_0 \\\\
\\hat\\beta_1
@@ -1321,8 +1321,8 @@ In the simple regression model, the covariance matrix of the coefficient estimat
\\begin{pmatrix}
\\text{Var}(\\hat\\beta_0) & \\text{Cov}(\\hat\\beta_0,\\hat\\beta_1) \\\\
\\text{Cov}(\\hat\\beta_0,\\hat\\beta_1) & \\text{Var}(\\hat\\beta_1)
-\\end{pmatrix}
-\\end{equation}
+\\end{pmatrix}$$
+
The function vcovHC can be used to obtain estimates of this matrix for a model object of interest.
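For instance, with `r ttcode("vcovHC()")` from the `r ttcode("sandwich")` package (the model object below is a hypothetical stand-in):

```r
library(sandwich)

# hypothetical stand-in model: regress stopping distance on speed
mod <- lm(dist ~ speed, data = cars)

# heteroskedasticity-robust estimate of the covariance matrix
vcovHC(mod, type = "HC1")
```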
@@ -1448,4 +1448,3 @@ The function DGP_OLS() and the estimated variance est_var_OLS
')}
```
-
From d2af4e0d6e2e584a3c31e8c5998eb398441aead7 Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 18 Oct 2023 12:05:50 +0200
Subject: [PATCH 05/13] Update 06-ch6.Rmd
---
06-ch6.Rmd | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/06-ch6.Rmd b/06-ch6.Rmd
index d8d17520..5fa33a96 100644
--- a/06-ch6.Rmd
+++ b/06-ch6.Rmd
@@ -635,7 +635,8 @@ persp(kde,
phi = 30,
xlab = "beta_1",
ylab = "beta_2",
- zlab = "Est. Density")
+ zlab = "Est. Density",
+ main = "2D Kernel Density Estimate")
```
From the plot above we can see that the density estimate has some similarity to a bivariate normal distribution (see Chapter \@ref(pt)) though it is not very pretty and probably a little rough. Furthermore, there is a correlation between the estimates such that $\rho\neq0$ in (2.1). Also, the distribution's shape deviates from the symmetric bell shape of the bivariate standard normal distribution and has an elliptical surface area instead.
@@ -816,5 +817,3 @@ Next, access the $\\bar{R}^2$ of this model.
')
```
-
-
From 3f4e364e1b6caaa2cdfdfe7b2b2caa9b76d8a354 Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 18 Oct 2023 12:07:01 +0200
Subject: [PATCH 06/13] Update 07-ch7.Rmd
---
07-ch7.Rmd | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/07-ch7.Rmd b/07-ch7.Rmd
index 7d596d29..d92c953d 100644
--- a/07-ch7.Rmd
+++ b/07-ch7.Rmd
@@ -1,4 +1,4 @@
-# Hypothesis Tests and Confidence Intervals in Multiple Regression {#htaciimr}
+# Hypothesis Tests and Confidence Intervals in MR Models {#htaciimr}
This chapter discusses methods that allow us to quantify the sampling uncertainty in the OLS estimator of the coefficients in multiple regression models. The basis for this is formed by hypothesis tests and confidence intervals which, just as in the simple linear regression model, can be computed using basic `r ttcode("R")` functions. We will also tackle the issue of testing joint hypotheses on these coefficients.
From e55cbfefe3bf15c867b93fdf3d682ec746c24dbe Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 18 Oct 2023 12:07:54 +0200
Subject: [PATCH 07/13] Update 08-ch8.Rmd
---
08-ch8.Rmd | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/08-ch8.Rmd b/08-ch8.Rmd
index 36fa24d3..e5fc7d0f 100644
--- a/08-ch8.Rmd
+++ b/08-ch8.Rmd
@@ -97,7 +97,8 @@ lines(x = CASchools$income[order_id],
y = fitted(quadratic_model)[order_id],
col = "red",
lwd = 2)
-legend("bottomright",legend=c("Linear Line","Quadratic Line"),lwd=2,col=c("green","red"))
+legend("bottomright",legend=c("Linear Line","Quadratic Line"),
+ lwd=2,col=c("green","red"))
```
@@ -352,6 +353,7 @@ lines(CASchools$income[order_id],
fitted(LinearLog_model)[order_id],
col = "red",
lwd = 2)
+legend("bottomright",legend = "Linear-log line",lwd = 2,col ="red")
```
@@ -1260,23 +1262,26 @@ The considered model specifications are:
```{r, tidy=TRUE}
# estimate all models
-TestScore_mod1 <- lm(score ~ size + english + lunch, data = CASchools)
+TestScore_mod1 <- lm(score ~ size + english + lunch,
+ data = CASchools)
-TestScore_mod2 <- lm(score ~ size + english + lunch + log(income), data = CASchools)
+TestScore_mod2 <- lm(score ~ size + english + lunch + log(income),
+ data = CASchools)
-TestScore_mod3 <- lm(score ~ size + HiEL + HiEL:size, data = CASchools)
+TestScore_mod3 <- lm(score ~ size + HiEL + HiEL:size,
+ data = CASchools)
TestScore_mod4 <- lm(score ~ size + HiEL + HiEL:size + lunch + log(income),
- data = CASchools)
+ data = CASchools)
-TestScore_mod5 <- lm(score ~ size + I(size^2) + I(size^3) + HiEL + lunch +
-                       log(income),data = CASchools)
+TestScore_mod5 <- lm(score ~ size + I(size^2) + I(size^3) + HiEL + lunch
+                     + log(income), data = CASchools)
-TestScore_mod6 <- lm(score ~ size + I(size^2) + I(size^3) + HiEL + HiEL:size +
-                       HiEL:I(size^2) + HiEL:I(size^3) + lunch + log(income), data = CASchools)
+TestScore_mod6 <- lm(score ~ size + I(size^2) + I(size^3) + HiEL + HiEL:size
+                     + HiEL:I(size^2) + HiEL:I(size^3) + lunch + log(income), data = CASchools)
-TestScore_mod7 <- lm(score ~ size + I(size^2) + I(size^3) + english + lunch +
-                       log(income), data = CASchools)
+TestScore_mod7 <- lm(score ~ size + I(size^2) + I(size^3) + english
+                     + lunch + log(income), data = CASchools)
```
We may use `r ttcode("summary()")` to assess the models' fit. Using `r ttcode("stargazer()")` we may also obtain a tabular representation of all regression outputs, which is more convenient for comparing the models.
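One possible call looks as follows (the reported statistics are an illustrative selection):

```r
library(stargazer)

# text-based comparison table of all seven model specifications
stargazer(TestScore_mod1, TestScore_mod2, TestScore_mod3, TestScore_mod4,
          TestScore_mod5, TestScore_mod6, TestScore_mod7,
          type = "text",
          keep.stat = c("n", "rsq", "adj.rsq"))
```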
From 89d6e842726ebb8722e3be6c83b0442bf8e527a7 Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 18 Oct 2023 12:09:35 +0200
Subject: [PATCH 08/13] Update 09-ch9.Rmd
---
09-ch9.Rmd | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/09-ch9.Rmd b/09-ch9.Rmd
index 915a972a..dab71b60 100644
--- a/09-ch9.Rmd
+++ b/09-ch9.Rmd
@@ -208,6 +208,12 @@ plot(X, Y,
abline(ms_mod,
col = "red",
lwd = 2)
+legend("bottomright",
+ bg = "transparent",
+ cex = 0.8,
+ lwd = 2,
+ col ="red",
+ legend = "Linear Regression Line")
```
@@ -480,7 +486,8 @@ legend("topleft",
bg = "transparent",
cex = 0.8,
col = c("darkgreen", "black", "purple"),
- legend = c("Population", "Full sample",expression(paste("Obs.with ", X <= 45))))
+       legend = c("Population", "Full sample", expression(paste("Obs. with ",
+                                                                X <= 45))))
```
@@ -537,7 +544,8 @@ legend("bottomright",
bg = "transparent",
cex = 0.8,
col = c("darkgreen", "black", "purple"),
- legend = c("Population", "Full sample",expression(paste(X <= 55,"&",Y >= 100))))
+ legend = c("Population", "Full sample",expression(paste(X <= 55,"&",
+ Y >= 100))))
```
From 4b0a19efa55a40a8a5a56157d39e5fc1443891a5 Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 18 Oct 2023 12:14:58 +0200
Subject: [PATCH 09/13] Update 11-ch11.Rmd
---
11-ch11.Rmd | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/11-ch11.Rmd b/11-ch11.Rmd
index 4642e0fe..447ef5e2 100644
--- a/11-ch11.Rmd
+++ b/11-ch11.Rmd
@@ -392,7 +392,7 @@ The subsequent code chunk reproduces Figure 11.3 of the book.
# plot data
plot(x = HMDA$pirat,
y = HMDA$deny,
- main = "Probit and Logit Models Model of the Probability of Denial, Given P/I Ratio",
+ main = "Probit and Logit Models of the Probability of Denial, Given P/I Ratio",
xlab = "P/I ratio",
ylab = "Deny",
pch = 20,
From 4102b86b32f30dd5ef5ff3c727638023667a567b Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 18 Oct 2023 12:16:21 +0200
Subject: [PATCH 10/13] Update 13-ch13.Rmd
---
13-ch13.Rmd | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/13-ch13.Rmd b/13-ch13.Rmd
index d6db3d66..7aefc2a5 100644
--- a/13-ch13.Rmd
+++ b/13-ch13.Rmd
@@ -790,8 +790,8 @@ plot(d$W, d$Y,
# add a dashed vertical line at cutoff
abline(v = 0, lty = 2)
#add legend
-legend("topleft",pch=20,col=c("steelblue","darkred"),legend=c("Do not receive treatment ",
-"Receive treatment"))
+legend("topleft",pch=20,col=c("steelblue","darkred"),
+ legend=c("Do not receive treatment","Receive treatment"))
```
From 41b7207498a852d74843fa7fba0b13c576325461 Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 18 Oct 2023 12:17:21 +0200
Subject: [PATCH 11/13] Update 14-ch14.Rmd
---
14-ch14.Rmd | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/14-ch14.Rmd b/14-ch14.Rmd
index 2701875d..f1306ad6 100644
--- a/14-ch14.Rmd
+++ b/14-ch14.Rmd
@@ -1665,15 +1665,16 @@ plot(Fstatsseries,
col = "steelblue",
ylab = "F-Statistic",
xlab = "Break Date",
- main = "Testing for a Break in GDP ADL(2,2) Regression at Different Dates")
+ main = "Testing for a Break in GDP ADL(2,2) Regression at Different Dates",
+ cex.main=0.8)
# dashed horizontal lines for critical values and QLR statistic
abline(h = 4.71, lty = 2)
abline(h = 6.02, lty = 2)
segments(0, QLR, 1980.75, QLR, col = "darkred")
-text(2010, 6.2, "1% Critical Value")
-text(2010, 4.9, "5% Critical Value")
-text(1980.75, QLR+0.2, "QLR Statistic")
+text(2010, 6.2, "1% Critical Value",cex=0.8)
+text(2010, 4.9, "5% Critical Value",cex=0.8)
+text(1980.75, QLR+0.2, "QLR Statistic",cex=0.8)
```
From 014cc3f030692dd232234ac7591d55261cfb5e69 Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Wed, 18 Oct 2023 12:18:04 +0200
Subject: [PATCH 12/13] Update 15-ch15.Rmd
---
15-ch15.Rmd | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/15-ch15.Rmd b/15-ch15.Rmd
index c7e0cb75..cb4d9314 100644
--- a/15-ch15.Rmd
+++ b/15-ch15.Rmd
@@ -703,7 +703,8 @@ plot(0:18, point_estimates[-1],
ylim = c(-0.4, 1.6),
xlab = "Lag",
ylab = "Cumulative dynamic multiplier",
- main = "Cumulative Dynamic Effect of FDD on Orange Juice Price")
+ main = "Cumulative Dynamic Effect of FDD on Orange Juice Price",
+ cex.main=0.8)
# add dashed line at 0
abline(h = 0, lty = 2)
From 6347c179f62d2cba0ecff87e4cea619ad0f2a79a Mon Sep 17 00:00:00 2001
From: Ocalak <96614838+Ocalak@users.noreply.github.com>
Date: Tue, 7 Nov 2023 16:29:29 +0100
Subject: [PATCH 13/13] Add files via upload
---
02-ch2.Rmd | 14 +++++++-------
03-ch3.Rmd | 12 ++++++------
04-ch4.Rmd | 2 +-
06-ch6.Rmd | 2 +-
07-ch7.Rmd | 6 +++---
08-ch8.Rmd | 10 +++++-----
14-ch14.Rmd | 2 +-
index.Rmd | 2 +-
8 files changed, 25 insertions(+), 25 deletions(-)
diff --git a/02-ch2.Rmd b/02-ch2.Rmd
index 9f7899a8..b092892a 100644
--- a/02-ch2.Rmd
+++ b/02-ch2.Rmd
@@ -3,9 +3,9 @@
This chapter reviews some basic concepts of probability theory and demonstrates how they can be
applied in `r ttcode("R")`.
-Most of the statistical functionalities in base `r ttcode("R")` are collected in the `r ttcode("stats")` package. It provides simple functions which compute descriptive measures and facilitate computations involving a variety of probability distributions. It also contains more sophisticated routines that, enable the user to estimate a large number of models based on the same data or help to conduct extensive simulation studies. `r ttcode("stats")` is part of the base distribution of `r ttcode("R")`, meaning that it is installed by default so there is no need to run `install.packages("stats")` or `library("stats")`. Simply execute `library(help = "stats")` in the console to view the documentation and a complete list of all functions gathered in `r ttcode("stats")`. For most packages a documentation that can be viewed within *RStudio* is available. Documentations can be invoked using the `r ttcode("?")` operator, e.g., upon execution of `?stats` the documentation of the `r ttcode("stats")` package is shown in the help tab of the bottom-right pane.
+Most of the statistical functionalities in base `r ttcode("R")` are collected in the `r ttcode("stats")` package. It provides simple functions which compute descriptive measures and facilitate computations involving a variety of probability distributions. It also contains more sophisticated routines that enable the user to estimate a large number of models based on the same data or help to conduct extensive simulation studies. `r ttcode("stats")` is part of the base distribution of `r ttcode("R")`, meaning that it is installed by default so there is no need to run `install.packages("stats")` or `library("stats")`. Simply execute `library(help = "stats")` in the console to view the documentation and a complete list of all functions gathered in `r ttcode("stats")`. For most packages, documentation that can be viewed within *RStudio* is available. Documentation can be invoked using the `r ttcode("?")` operator, for example, executing `?stats` shows the documentation of the `r ttcode("stats")` package in the help tab of the bottom-right pane.
-In what follows, our focus is on (some of) the probability distributions that are handled by `r ttcode("R")` and show how to use the relevant functions to solve simple problems. Thereby, we refresh some core concepts of probability theory. Among other things, you will learn how to draw random numbers, how to compute densities, probabilities, quantiles and alike. As we shall see, it is very convenient to rely on these routines.
+In what follows, we focus on (some of) the probability distributions that are handled by `r ttcode("R")` and show how to use the relevant functions to solve simple problems. Along the way, we review some core concepts of probability theory. Among other things, you will learn how to draw random numbers, how to compute densities, probabilities, quantiles and the like. As we shall see, it is very convenient to do these computations in `r ttcode("R")`.
## Random Variables and Probability Distributions
@@ -32,7 +32,7 @@ events, e.g., 'the observed outcome lies between $2$ and $5$'.
A basic function to draw random samples from a specified set of elements is the function `r ttcode("sample()")`, see `?sample`. We can use it to simulate the random outcome of a dice roll. Let's roll the dice!
```{r, echo = T, eval = T, message = F, warning = F}
-sample(1:6, 1)
+sample(1:6, size=1)
```
The probability distribution of a discrete random variable is the list of all possible values of the variable and their probabilities which sum to $1$. The cumulative probability distribution function gives the probability that the random variable is less than or equal to a particular value.
@@ -403,7 +403,7 @@ g <- function(x) x * f(x)
h <- function(x) x^2 * f(x)
```
-Next, we use `r ttcode("integrate()")` and set lower and upper limits of integration to $1$ and $\infty$ using arguments `r ttcode("lower")` and `r ttcode("upper")`. By default, `r ttcode("integrate()")` prints the result along with an estimate of the approximation error to the console. However, the outcome is not a numeric value one can readily do further calculation with. In order to get only a numeric value of the integral, we need to use the `r ttcode("\\$")` operator in conjunction with `r ttcode("value")`. The `r ttcode("\\$")` operator is used to extract elements by name from an object of type `r ttcode("list")`.
+Next, we use `r ttcode("integrate()")` and set lower and upper limits of integration to $1$ and $\infty$ using arguments `r ttcode("lower")` and `r ttcode("upper")`. By default, `r ttcode("integrate()")` prints the result along with an estimate of the approximation error to the console. However, the outcome is not a numeric value one can readily do further calculation with. In order to get only a numeric value of the integral, we need to use the `r ttcode("$")` operator in conjunction with `r ttcode("value")`. The `r ttcode("$")` operator is used to extract elements by name from an object of type `r ttcode("list")`.
```{r, echo = T, eval = T, message = F, warning = F}
# compute area under the density curve
@@ -442,7 +442,7 @@ Thus, for the normal distribution we have the `r ttcode("R")` functions `r ttcod
### The Normal Distribution {-}
-The probably most important probability distribution considered here is the normal
+Perhaps the most important probability distribution considered here is the normal
distribution. This is not least due to the special role of the standard normal distribution and the Central Limit Theorem which is to be treated shortly. Normal distributions are symmetric and bell-shaped. A normal distribution is characterized by its mean $\mu$ and its standard deviation $\sigma$, concisely expressed by
$\mathcal{N}(\mu,\sigma^2)$. The normal distribution has the PDF
@@ -811,7 +811,7 @@ To clarify the basic idea of random sampling, let us jump back to the dice rolli
Suppose we are rolling the dice $n$ times. This means we are interested in the outcomes of the random draws $Y_i, \ i=1,...,n$, which are characterized by the same distribution. Since these outcomes are selected randomly, they are *random variables* themselves and their realizations will differ each time we draw a sample, i.e., each time we roll the dice $n$ times. Furthermore, each observation is randomly drawn from the same population, that is, the numbers from $1$ to $6$, and their individual distribution is the same. Hence $Y_1,\dots,Y_n$ are identically distributed.
-Moreover, we know that the value of any of the $Y_i$ does not provide any information on the remainder of the outcomes In our example, rolling a six as the first observation in our sample does not alter the distributions of $Y_2,\dots,Y_n$: all numbers are equally likely to occur. This means that all $Y_i$ are also independently distributed. Thus $Y_1,\dots,Y_n$ are independently and identically distributed (*i.i.d.*).
+Moreover, we know that the value of any of the $Y_i$ does not provide any information on the remainder of the outcomes. In our example, rolling a six as the first observation in our sample does not alter the distributions of $Y_2,\dots,Y_n$: all numbers are equally likely to occur. This means that all $Y_i$ are also independently distributed. Thus $Y_1,\dots,Y_n$ are independently and identically distributed (*i.i.d.*).
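In `r ttcode("R")`, such an i.i.d. sample of $n$ dice rolls is drawn by sampling with replacement, e.g., for $n = 10$:

```r
# draw an i.i.d. sample of 10 dice rolls
sample(1:6, size = 10, replace = TRUE)
```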
The dice example uses this most simple sampling scheme. That is why it is called *simple random sampling*. This concept is summarized in Key Concept 2.5.
```{r, eval = my_output == "html", results='asis', echo=FALSE, purl=FALSE}
@@ -1208,7 +1208,7 @@ In `r ttcode("R")`, realize this as follows:
3. Next, we combine two `r ttcode("for()")` loops to simulate the data and plot the distributions. The inner loop generates $10000$ random samples, each consisting of `r ttcode("n")` observations that are drawn from the Bernoulli distribution, and computes the standardized averages. The outer loop executes the inner loop for the different sample sizes `r ttcode("n")` and produces a plot for each iteration.
-```{r, echo = T, eval = T, message = F, warning = F, cache=T, fig.align='center'}
+```{r, echo = T, eval = T, message = F, warning = F, cache=T, fig.align='center',fig.width=10, fig.height=10}
# subdivide the plot panel into a 2-by-2 array
par(mfrow = c(2, 2))
diff --git a/03-ch3.Rmd b/03-ch3.Rmd
index 3435fa34..7c126e42 100644
--- a/03-ch3.Rmd
+++ b/03-ch3.Rmd
@@ -234,7 +234,7 @@ First, *all* sampling distributions (represented by the solid lines) are centere
Next, have a look at the spread of the sampling distributions. Several things are noteworthy:
-- The sampling distribution of $Y_1$ (green curve) tracks the density of the $\mathcal{N}(10,1)$ distribution (black dashed line) pretty closely. In fact, the sampling distribution of $Y_1$ is the $\mathcal{N}(10,1)$ distribution. This is less surprising if you keep in mind that the $Y_1$ estimator does nothing but reporting an observation that is randomly selected from a population with $\mathcal{N}(10,1)$ distribution. Hence, $Y_1 \sim \mathcal{N}(10,1)$. Note that this result does not depend on the sample size $n$: the sampling distribution of $Y_1$ *is always* the population distribution, no matter how large the sample is. $Y_1$ is a good a estimate of $\mu_Y$, but we can do better.
+- The sampling distribution of $Y_1$ (green curve) tracks the density of the $\mathcal{N}(10,1)$ distribution (black dashed line) pretty closely. In fact, the sampling distribution of $Y_1$ is the $\mathcal{N}(10,1)$ distribution. This is less surprising if you keep in mind that the $Y_1$ estimator does nothing but reporting an observation that is randomly selected from a population with $\mathcal{N}(10,1)$ distribution. Hence, $Y_1 \sim \mathcal{N}(10,1)$. Note that this result does not depend on the sample size $n$: the sampling distribution of $Y_1$ *is always* the population distribution, no matter how large the sample is. $Y_1$ is a good estimate of $\mu_Y$, but we can do better.
- Both sampling distributions of $\overline{Y}$ show less dispersion than the sampling distribution of $Y_1$. This means that $\overline{Y}$ has a lower variance than $Y_1$. In view of Key Concepts 3.2 and 3.3, we find that $\overline{Y}$ is a more efficient estimator than $Y_1$. In fact, this holds for all $n>1$.
@@ -427,9 +427,9 @@ curve(dnorm(x),
axis(1,
at = c(-1.5, 0, 1.5),
padj = 0.75,
- labels = c(expression(-frac(bar(Y)^"act"~-~bar(mu)[Y,0], sigma[bar(Y)])),
+ labels = c(expression(-frac(bar(Y)^"act"~-~bar(mu)["Y,0"], sigma[bar(Y)])),
0,
- expression(frac(bar(Y)^"act"~-~bar(mu)[Y,0], sigma[bar(Y)]))))
+ expression(frac(bar(Y)^"act"~-~bar(mu)["Y,0"], sigma[bar(Y)]))))
# shade p-value/2 region in left tail
polygon(x = c(-6, seq(-6, -1.5, 0.01), -1.5),
@@ -612,7 +612,7 @@ tstatistic <- (samplemean_act - mean_h0) / SE_samplemean
tstatistic
```
-Using `r ttcode("R")` we can illustrate that if $\mu_{Y,0}$ equals the true value, that is, if the null hypothesis is true, \@ref(eq:tstat) is approximately $\mathcal{N}(0,1)$ distributed when $n$ is large.
+Using `r ttcode("R")` we can illustrate that if $\mu_{Y,0}$ equals the true value, that is, if the null hypothesis is true, \@ref(eq:tstat) is approximately $\mathcal{N}(0,1)$ distributed when $n$ is large.
```{r}
# prepare empty vector for t-statistics
@@ -929,7 +929,7 @@ $$ p\text{-value} = 2.2\cdot 10^{-16} \ll 0.05. $$
## Comparing Means from Different Populations {#cmfdp}
-Suppose you are interested in the means of two different populations, denote them $\mu_1$ and $\mu_2$. More specifically, you are interested whether these population means are different from each other and plan to use a hypothesis test to verify this on the basis of independent sample data from both populations. A suitable pair of hypotheses is
+Suppose you are interested in the means of two different populations, denote them $\mu_1$ and $\mu_2$. More specifically, you are interested in whether these population means are different from each other and plan to use a hypothesis test to verify this on the basis of independent sample data from both populations. A suitable pair of hypotheses is
\begin{equation}
H_0: \mu_1 - \mu_2 = d_0 \ \ \text{vs.} \ \ H_1: \mu_1 - \mu_2 \neq d_0 (\#eq:hypmeans)
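For $d_0 = 0$, such a test can be sketched with simulated data (all numbers below are made up for illustration):

```r
set.seed(1)

# two independent samples with population means 10 and 10.5
sample_1 <- rnorm(100, mean = 10)
sample_2 <- rnorm(100, mean = 10.5)

# test H0: mu_1 - mu_2 = 0 against the two-sided alternative
t.test(sample_1, sample_2, mu = 0)
```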
@@ -1107,7 +1107,7 @@ The estimates indicate that $X$ and $Y$ are moderately correlated.
The next code chunk uses the function `r ttcode("mvrnorm()")` from package `r ttcode("MASS")` [@R-MASS] to generate bivariate sample data with different degrees of correlation.
-```{r, fig.align='center'}
+```{r, fig.align='center',fig.width=8, fig.height=8}
library(MASS)
# set random seed
diff --git a/04-ch4.Rmd b/04-ch4.Rmd
index cb8338a4..ceb9d529 100644
--- a/04-ch4.Rmd
+++ b/04-ch4.Rmd
@@ -770,7 +770,7 @@ A further result implied by Key Concept 4.4 is that both estimators are consiste
Let us look at the distributions of $\hat{\beta}_1$. The idea here is to add an additional call of `r ttcode("for()")` to the code. This is done in order to loop over the vector of sample sizes `r ttcode("n")`. For each of the sample sizes we carry out the same simulation as before but plot a density estimate for the outcomes of each iteration over `r ttcode("n")`. Notice that we have to change `r ttcode("n")` to `r ttcode("n[j]")` in the inner loop to ensure that the `r ttcode("j")`$^{th}$ element of `r ttcode("n")` is used. In the simulation, we use sample sizes of $100, 250, 1000$ and $3000$. Consequently, we have a total of four distinct simulations using different sample sizes.
-```{r, fig.align='center', cache=T}
+```{r, fig.align='center', cache=T,fig.width=8, fig.height=8}
# set seed for reproducibility
set.seed(1)
diff --git a/06-ch6.Rmd b/06-ch6.Rmd
index 5fa33a96..d66f1ec2 100644
--- a/06-ch6.Rmd
+++ b/06-ch6.Rmd
@@ -13,7 +13,7 @@ library(MASS)
## Omitted Variable Bias
-The previous analysis of the relationship between test score and class size discussed in Chapters \@ref(lrwor) and \@ref(htaciitslrm) has a major flaw: we ignored other determinants of the dependent variable (test score) that correlate with the regressor (class size). Remember that influences on the dependent variable which are not captured by the model are collected in the error term, which we so far assumed to be uncorrelated with the regressor. However, this assumption is violated if we exclude determinants of the dependent variable which vary with the regressor. This might induce an estimation bias, i.e., the mean of the OLS estimator's sampling distribution is no longer equal to the true mean. In our example we therefore wrongly estimate the causal effect on test scores of a unit change in the student-teacher ratio, on average. This issue is called *omitted variable bias* (OVB) and is summarized by Key Concept 6.1.
+The previous analysis of the relationship between test score and class size discussed in Chapters \@ref(lrwor) and \@ref(htaciitslrm) has a major flaw: we ignored other determinants of the dependent variable (test score) that correlate with the regressor (class size). Remember that influences on the dependent variable which are not captured by the model are collected in the error term, which we so far assumed to be uncorrelated with the regressor. However, this assumption is violated if we exclude determinants of the dependent variable which vary with the regressor. This might induce an estimation bias, i.e., the mean of the OLS estimator's sampling distribution is no longer equal to the true mean. In our example we therefore wrongly estimate the causal effect on test scores of a unit change in the student-teacher ratio, on average. This issue is called *omitted variable bias* (OVB) and is summarized by Key Concept 6.1.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
diff --git a/07-ch7.Rmd b/07-ch7.Rmd
index d92c953d..7d19dcef 100644
--- a/07-ch7.Rmd
+++ b/07-ch7.Rmd
@@ -116,7 +116,7 @@ confint(model, level = 0.9)
The output now reports the desired $90\%$ confidence intervals for all coefficients.
-A disadvantage of `r ttcode("confint()")` is that it does not use robust standard errors to compute the confidence interval. For large-sample confidence intervals, this is quickly done manually as follows.
+One drawback of using `r ttcode("confint()")` is that it doesn't use robust standard errors to compute the confidence interval. For large-sample confidence intervals that account for robust standard errors, this can easily be done manually using the following approach.
```{r, warning=F, message=F}
# compute robust standard errors
@@ -310,7 +310,7 @@ Omitted variable bias is the bias in the OLS estimator that arises when regresso
')
```
-We now discuss an example were we face a potential omitted variable bias in a multiple regression model:
+We will now discuss an example where we may encounter potential omitted variable bias in a multiple regression model:
Consider again the estimated regression equation
@@ -436,7 +436,7 @@ There is no unambiguous way to proceed when deciding which variable to use. In a
For a start, we plot student characteristics against test scores.
-```{r}
+```{r,fig.width=8, fig.height=6}
# set up arrangement of plots
m <- rbind(c(1, 2), c(3, 0))
graphics::layout(mat = m)
diff --git a/08-ch8.Rmd b/08-ch8.Rmd
index e5fc7d0f..39062bff 100644
--- a/08-ch8.Rmd
+++ b/08-ch8.Rmd
@@ -797,7 +797,7 @@ $$ Y_i = \\beta_0 + \\beta_1 X_i + \\beta_2 (X_i \\times D_i) + u_i $$
The following code chunk demonstrates how to replicate the results shown in Figure 8.8 of the book using artificial data.
-```{r, fig.align='center'}
+```{r, fig.align='center',fig.width=8, fig.height=8}
# generate artificial data
set.seed(1)
@@ -815,7 +815,7 @@ plot(X, log(Y),
pch = 20,
col = "steelblue",
main = "Different Intercepts, Same Slope",
- cex.main=0.9)
+ cex.main=1.2)
mod1_coef <- lm(log(Y) ~ X + D)$coefficients
@@ -832,7 +832,7 @@ plot(X, log(Y),
pch = 20,
col = "steelblue",
main = "Different Intercepts, Different Slopes",
- cex.main=0.9)
+ cex.main=1.2)
mod2_coef <- lm(log(Y) ~ X + D + X:D)$coefficients
@@ -849,7 +849,7 @@ plot(X, log(Y),
pch = 20,
col = "steelblue",
main = "Same Intercept, Different Slopes",
- cex.main=0.9)
+ cex.main=1.2)
mod3_coef <- lm(log(Y) ~ X + X:D)$coefficients
@@ -1163,7 +1163,7 @@ stargazer(Journals_mod1, Journals_mod2, Journals_mod3, Journals_mod4,
The subsequent code chunk reproduces Figure 8.9 of the book.
-```{r}
+```{r,fig.width=10, fig.height=8}
# divide plotting area
m <- rbind(c(1, 2), c(3, 0))
graphics::layout(m)
diff --git a/14-ch14.Rmd b/14-ch14.Rmd
index f1306ad6..4b0d35cf 100644
--- a/14-ch14.Rmd
+++ b/14-ch14.Rmd
@@ -276,7 +276,7 @@ NYSESW <- xts(Delt(NYSESW))
```
-```{r, fig.align='center'}
+```{r, fig.align='center',fig.height=6}
# divide plotting area into 2x2 matrix
par(mfrow = c(2, 2))
diff --git a/index.Rmd b/index.Rmd
index d90679c6..77a3aee0 100755
--- a/index.Rmd
+++ b/index.Rmd
@@ -149,7 +149,7 @@ and click on the button labeled *Run* in the top right corner of the editor. By
#### Vectors {-}
-`r ttcode("R")` is of course more sophisticated than that. We can work with variables or, more generally, objects. Objects are defined by using the assignment operator `r ttcode("<-")`. To create a variable named `r ttcode("x")` which contains the value `r ttcode("10")` type `x <- 10` and click the button *Run* yet again. The new variable should have appeared in the environment pane on the top right. The console however did not show any results, because our line of code did not contain any call that creates output. When you now type `x` in the console and hit return, you ask `r ttcode("R")` to show you the value of `r ttcode("x")` and the corresponding value should be printed in the console.
+In `r ttcode("R")`, you can work with variables or, more generally, objects. Objects are defined using the assignment operator `r ttcode("<-")`. For example, to create a variable named `r ttcode("x")` with the value `r ttcode("10")`, type `x <- 10` and then click the *Run* button. The new variable should appear in the environment pane on the top right. However, the console won't display any results because this line of code doesn't produce any visible output. If you want to see the value of `r ttcode("x")`, simply type `x` in the console and press *Enter*; `r ttcode("R")` will then display the corresponding value in the console.
`r ttcode("x")` is a scalar, a vector of length $1$. You can easily create longer vectors by using the function `r ttcode("c()")` (*c* is for "concatenate" or "combine"). To create a vector `r ttcode("y")` containing the numbers $1$ to $5$ and print it, do the following.