forked from englianhu/binary.com-interview-question
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbinary-Q1L-EN-R.Rmd
256 lines (205 loc) · 11 KB
/
binary-Q1L-EN-R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
---
title: "<img src='www/binary-logo-resize.jpg' width='240'>"
subtitle: "[binary.com](https://github.com/englianhu/binary.com-interview-question) Interview Question I - Lasso, Elastic-Net and Ridge Regression"
author: "[®γσ, Lian Hu](https://englianhu.github.io/) <img src='www/RYO.jpg' width='24'> <img src='www/RYU.jpg' width='24'> <img src='www/ENG.jpg' width='24'>®"
date: "`r lubridate::today('Asia/Tokyo')`"
output:
html_document:
number_sections: yes
toc: yes
toc_depth: 4
toc_float:
collapsed: yes
smooth_scroll: yes
code_folding: hide
---
```{r warning=FALSE}
suppressPackageStartupMessages(library('BBmisc'))
#'@ suppressPackageStartupMessages(library('rmsfuns'))
pkgs <- c('knitr', 'kableExtra', 'tint', 'devtools', 'lubridate', 'data.table', 'feather', 'quantmod', 'tidyquant', 'tidyr', 'timetk', 'plyr', 'stringr', 'magrittr', 'dplyr', 'tidyverse', 'memoise', 'htmltools', 'formattable', 'zoo', 'forecast', 'glmnet')
suppressAll(lib(pkgs))
#'@ load_pkg(pkgs)
funs <- c('glmPrice.R')
l_ply(funs, function(x) source(paste0('./function/', x)))
.cl = FALSE
options(warn = -1)#, digits.secs = 6)
rm(pkgs, funs)
```
# Introduction
Due to I don't know if my initial models in these studies are effective or not. Today, I am trying to recall the `Lasso`, `Elastic-Net` and also `Ridge` models and compare with ARIMA model. That was the initial models I built in order to predict the stock price of [`LAD`](https://finance.yahoo.com/quote/LAD/) since few years ago.
[Glmnet Vignette](https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html) provides examples for statistical modelling while [Stock Prediction with R glmnet and tm packages](https://www.kaggle.com/captcalculator/stock-prediction-with-r-glmnet-and-tm-packages) applied `Lasso` regression for stock price prediction.
- [热门数据挖掘模型应用入门(一): LASSO 回归](https://cosx.org/2016/10/data-mining-1-lasso/)
![*Source :* [*shinyApp*](https://beta.rstudioconnect.com/content/2367/)](www/20170113_104005.gif)
You are feel free to browse over [*shinyApp*](https://beta.rstudioconnect.com/content/2367/).
# Data
```{r warning=FALSE}
mbase <- readRDS('data/LAD_Full.rds') %>%
tk_tbl
mbase %>%
kable(caption = 'OHLCVA of LAD') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
scroll_box(width = '100%', height = '400px')
```
# Modelling
## Lasso, Elastic-Net and Ridge
Below I tried to model Lasso, Elastic-Net and also Ridge models.
```{r warning=FALSE, eval=FALSE}
# ----------- eval=FALSE --------------
#'@ pr <- glmPrice(mbase, fordate = xts::last(mbase$Date)+days(1))
timeID <- mbase$index
timeID0 <- timeID %>% xts::first() %m+% years(1)
timeID %<>% .[. > timeID0]
fit <- list()
prd <- list()
prc <- vector('list', length(timeID))
for(d in (1:length(timeID))) {
for(i in 0:10) {
smp <- mbase %>% dplyr::filter(index <= timeID[d])
x <- as.matrix(smp[c('LAD.Open', 'LAD.High', 'LAD.Low')])
y <- as.matrix(smp['LAD.Close'])
#'@ y <- as.matrix(lead(smp$LAD.Close))
fit[[i+1]] <- cv.glmnet(x = x, y = y, alpha = i/10,
type.measure = 'mse', maxit = 100)
prd[[i+1]] <- predict(fit[[i+1]], newx = tail(x, 1), n.ahead = 1,
type='link', s='lambda.1se')
}
prc[[d]] <- prd %>% cbind %>%
data.frame(index = tail(smp$index, 1), alpha = 0:10) %>%
.[, 1:3] %>%
tbl_df %>%
unnest %>%
dplyr::rename(Price = '.')
}
```
```{r warning=FALSE}
dr <- dir('C:/Users/scibr/Documents/GitHub/englianhu/data (binary.com-interview-question excludes fx subfolder)', pattern = '[0-9]{8}')
md <- llply(dr, function(x) {
pth <- paste0('C:/Users/scibr/Documents/GitHub/englianhu/data (binary.com-interview-question excludes fx subfolder)/', x, '/fitgaum.best.rds')
readRDS(pth) %>%
head(1)
})
names(md) <- dr
#'@ md$`20160201`[[1]]$yhat[[1]] %>% tail
```
## Auto ARIMA
[Transforming subsets of data in R with by, ddply and data.table](https://magesblog.com/post/2012-06-12-transforming-subsets-of-data-in-r-with/) compares the efficiency of `transform` dataset.
Here I try to use `auto.arima` to compare with the models states at previous section.
```{r warning=FALSE, message=FALSE}
tmID <- mbase$index
tmID %<>% .[. > ymd('2015-01-02')]
at.ar <- llply(tmID, function(dt) {
smp <- mbase[c('index', 'LAD.Close')] %>%
dplyr::filter((index < dt) & (index >= dt %m-% years(1)))
dat <- smp %>%
tk_xts %>%
auto.arima %>%
forecast(h=1)
data.frame(index = xts::last(smp$index), Price.T1 = dat)
}) %>%
bind_rows %>%
tbl_df
```
# MSE
Here I compare the Lasso/Elastic-net/Ridge models with Auto.Arima model.
```{r warning=FALSE}
dr <- dir('C:/Users/scibr/Documents/GitHub/englianhu/data (binary.com-interview-question excludes fx subfolder)', pattern = '[0-9]{8}')
comp <- llply(dr, function(x) {
pth <- paste0('C:/Users/scibr/Documents/GitHub/englianhu/data (binary.com-interview-question excludes fx subfolder)/', x, '/fitgaum.mse1.rds')
readRDS(pth) %>%
head(1) %>%
dplyr::rename(Submodel = model, Model = .id)
})
names(comp) <- dr
comp %<>%
ldply %>%
tbl_df %>%
dplyr::rename(index = .id, MSE.1 = mse) %>%
mutate(index = ymd(index),
Model = factor(Model),
Submodel = factor(Submodel))
comp %>%
kable(caption = 'MSE of daily Opened and Closed Transaction Orders') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive')) %>%
scroll_box(width = '100%', height = '400px')
comp2 <- join_all(list(mbase[c('index', 'LAD.Close')],
at.ar[c('index', 'Price.T1.Point.Forecast')],
comp)) %>%
na.omit %>%
tbl_df
## MSE
comp2 %>% mutate(
MSE.AR = mean((Price.T1.Point.Forecast - LAD.Close)^2),
MSE.Lasso = mean(MSE.1)) %>%
dplyr::select(MSE.AR, MSE.Lasso) %>%
unique %>%
kable(caption = 'MSE of daily Opened and Closed Transaction Orders') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive'))
```
# Conclusion
From above table, we know that `r comp2 %>% mutate(
MSE.AR = mean((Price.T1.Point.Forecast - LAD.Close)^2),
MSE.Lasso = mean(MSE.1)) %>%
dplyr::select(MSE.AR, MSE.Lasso) %>%
unique %>% unlist %>% which.min %>% names` model is better model.
```{r option, echo = FALSE}
## Set options back to original options
options(warn = 0)
```
# Appendix
## Documenting File Creation
It's useful to record some information about how your file was created.
- File creation date: 2018-10-20
- File latest updated date: `r today('Asia/Tokyo')`
- `r R.version.string`
- R version (short form): `r getRversion()`
- [**rmarkdown** package](https://github.com/rstudio/rmarkdown) version: `r packageVersion('rmarkdown')`
- File version: 1.0.1
- Author Profile: [®γσ, Eng Lian Hu](https://beta.rstudioconnect.com/content/4352/)
- GitHub: [Source Code](https://github.com/englianhu/binary.com-interview-question)
- Additional session information:
```{r info, echo=FALSE, warning=FALSE, results='asis'}
suppressMessages(require('dplyr', quietly = TRUE))
suppressMessages(require('formattable', quietly = TRUE))
suppressMessages(require('knitr', quietly = TRUE))
suppressMessages(require('kableExtra', quietly = TRUE))
sys1 <- devtools::session_info()$platform %>%
unlist %>% data.frame(Category = names(.), session_info = .)
rownames(sys1) <- NULL
#sys1 %<>% rbind(., data.frame(
# Category = 'Current time',
# session_info = paste(as.character(lubridate::now('Asia/Tokyo')), 'JST'))) %>%
# dplyr::filter(Category != 'os')
sys2 <- data.frame(Sys.info()) %>% mutate(Category = rownames(.)) %>% .[2:1]
names(sys2)[2] <- c('Sys.info')
rownames(sys2) <- NULL
if (nrow(sys1) == 7 & nrow(sys2) == 8) {
sys1 %<>% rbind(., data.frame(
Category = 'Current time',
session_info = paste(as.character(lubridate::now('Asia/Tokyo')), 'JST')))
} else {
sys2 %<>% rbind(., data.frame(
Category = 'Current time',
Sys.info = paste(as.character(lubridate::now('Asia/Tokyo')), 'JST')))
}
cbind(sys1, sys2) %>%
kable(caption = 'Additional session information:') %>%
kable_styling(bootstrap_options = c('striped', 'hover', 'condensed', 'responsive'))
rm(sys1, sys2)
```
## Reference
01. [**Stock Market Forecasting Using LASSO Linear Regression Model** *by Sanjiban Sekhar Roy, Dishant Mital, Avik Basu, Ajith Abraham (2015)*](https://raw.githubusercontent.com/englianhu/binary.com-interview-question/master/reference/Stock%20Market%20Forecasting%20Using%20LASSO%20Linear%20Regression%20Model.pdf)<img src='www/hot.jpg' width='20'>
02. [**Using LASSO from lars (or glmnet) package in R for variable selection** *by Juancentro (2014)*](http://stats.stackexchange.com/questions/58531/using-lasso-from-lars-or-glmnet-package-in-r-for-variable-selection?answertab=votes#tab-top)
03. [**Difference between glmnet() and cv.glmnet() in R?** *by Amrita Sawant (2015)*](https://stackoverflow.com/questions/29311323/difference-between-glmnet-and-cv-glmnet-in-r?answertab=votes#tab-top)
04. [**Testing Kelly Criterion and Optimal f in R** *by Roy Wei (2012)*](https://alphaism.wordpress.com/2012/04/13/testing-kelly-criterion-and-optimal-f-in-r) <img src='www/hot.jpg' width='20'>
05. [**Portfolio Optimization and Monte Carlo Simulation** *by Magnus Erik Hvass Pedersen (2014)*](https://raw.githubusercontent.com/scibrokes/kelly-criterion/master/references/Portfolio%20Optimization%20and%20Monte%20Carlo%20Simulation.pdf) <img src='www/hot.jpg' width='20'>
06. [**Glmnet Vignette** *by Trevor Hastie and Junyang Qian (2014)*](https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html)
07. [**lasso怎么用算法实现?** *by shuaihuang (2010)*](https://d.cosx.org/d/101533-101533/5)
08. [**The Sparse Matrix and {glmnet}** *by Manuel Amunategui (2014)*](http://amunategui.github.io/sparse-matrix-glmnet/)
09. [**Regularization and Variable Selection via the Elastic Net** *by Hui Zou and Trevor Hastie*](https://raw.githubusercontent.com/englianhu/binary.com-interview-question/master/reference/Regularization%20and%20Variable%20Selection%20via%20the%20Elastic%20Net.pdf)
10. [LASSO, Ridge, and Elastic Net](http://www4.stat.ncsu.edu/~post/josh/LASSO_Ridge_Elastic_Net_-_Examples.html) <img src='www/hot.jpg' width='20'>
11. [**热门数据挖掘模型应用入门(一): LASSO回归** *by 侯澄钧 (2016)*](https://cosx.org/2016/10/data-mining-1-lasso)
12. [The Lasso Page](http://statweb.stanford.edu/~tibs/lasso.html)
13. [Shrinkage Estimation of Linear Regression Models with GARCH Errors](Shrinkag%20Estimation%20of%20Linear%20Regression%20Models%20with%20GARCH%20Errors.pdf)
14. [Iteratively Reweighted Adaptive Lasso for Conditional Heteroscedastic Time Series with Applications to AR-ARCH Type Processes](Iteratively%20Reweighted%20Adaptive%20Lasso%20for%20Conditional%20Heteroscedastic%20Time%20Series with%20Applications%20to%20AR-ARCH%20Type%20Processes.pdf)
---
<span style='color:RoyalBlue'>**Powered by - Copyright® Intellectual Property Rights of [<img src='www/scb_logo.jpg' width='64'>®](http://www.scibrokes.com)個人の経営企業**</span>