-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathLaw of Large Numbers.Rmd
104 lines (93 loc) · 4.6 KB
/
Law of Large Numbers.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
---
title: "Homework 1: Law of Large Numbers"
author: "Jiawen Qi (jiq10)"
date: "Wednesday, September 14, 2016"
output: word_document
---
To show the effect of sample size, we'll show that the larger the sample the more representative the mean is: if the sample is a random sample from the whole population. If the sample is biased (some items have a higher probability of being included in the sample than other items), then increase the size of the sample does not improve the accuracy of the mean.
```{r}
set.seed(12345) # set the seed so we always get the same numbers
# remove this to get random numbers each time
unif<- runif(1000000) # get one million uniform random numbers, min = 0, max = 1
hist(unif) # show the uniform curve
summary(unif) # some descriptives
```
* First of all, we'll do an unbiased sample with 20, 300, and 4000 observations in the sample.
* The function, sample, has up to 4 parameters:
+ a) the first parameter is the population (unif in our case)
+ b) the second parameter is size (size=number)
+ c) the third parameter is replacement or not (replace=TRUE)
+ d) and the fourth paramter is prob=c() which can change the
probability of selecting a particular value.
* We'll use ubsamp=sample(unif,n,replace=TRUE) meaning
+ a) ubsamp = un-biased sample (just a variable name)
+ b) unif = our population a vector of 1,000,000 uniform random numbers
+ c) replace=TRUE = sample with replacement
```{r}
popmean=mean(unif) # expected to be .5
numSamples=1000 # num of samples - not size of sample
ubsamp=c(1:numSamples) # create the vector (array)
for (n in c(1:3)){ # 3 different sample sizes
for (repeats in c(1:numSamples)){ # numSamples instances of each size
number=(n+1)* (10^n) # size=20, 300, and 4,000
ubsamp[repeats]=mean(sample(unif,size=number,replace=TRUE))
# fill in a vector with 1,000 entries
# each of which is the mean of sample
}
line=paste0("sample size is ", number, " and actual sample mean is " ,
mean(ubsamp))
# edit text and numbers together in
# one line
print (line) # and print it out.
}
```
Now, we'll do the same thing with a biased sample. To make a biased sample, we'll make each number that is between .6 and .7 equal to a missing value.
```{r}
popmean=mean(unif) # expected to be .5, but just close in a random sample
numSamples=1000
ubsamp=c(1:numSamples)
for (n in c(1:3)){ # 3 different sample sizes
for (repeats in c(1:numSamples)){ # numSamples instances of each size
number=(n+1)* (10^n) # size=20, 300, and 4,000
thisSample=c(1:number)
thisSample=sample(unif,size=number,replace=TRUE)
for (thisOne in c(1:number)){
if (thisSample[thisOne] >= 0.6 & thisSample[thisOne] <= 0.7) thisSample[thisOne]=NA
}
ubsamp[repeats]=mean(thisSample,na.rm=TRUE)
# fill in a vector with 1,000 entries
# each of which is the mean of sample
}
line=paste0("sample size is ", number, " and actual sample mean is " ,
(mean(ubsamp)))
# edit text and numbers together in
# one line
print (line) # and print it out.
}
```
Now, let's look at what happens to the standard deviation when the sample size increases. Remember that the population is uniformly distributed with an expected mean of .5 and expected standard deviation of zero.
```{r}
mean(unif)
sd(unif)
```
So, we'll again use sample sizes of 20, 300, and 4000 and instead of looking at the sample mean we'll look at the standard deviation of the mean of the samples.
```{r}
hist(sample(unif,size=1000),col="blue")
for (i in c(1:3)){
number=(10 ^ i) * (i + 1)
samplevector=c(1:1000)
for (j in c(1:1000)){
samplevector[j]=mean(sample(unif,size=number,replace=TRUE))
}
if (i==1) {
hist(samplevector,add=T,col="red")
}
if (i==2) {
hist(samplevector,add=T,col="green")
}
if (i==3) {
hist(samplevector,add=T,col="Black")
}
print (paste0("sample size is ", number, " the sd is ", sd(samplevector), " and the mean is ", mean(samplevector)))
}
```