Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BDS/examples/stocks.R - Dropped Records #21

Open
joshualeond opened this issue Jan 1, 2022 · 0 comments
Open

BDS/examples/stocks.R - Dropped Records #21

joshualeond opened this issue Jan 1, 2022 · 0 comments

Comments

@joshualeond
Copy link

First off, thanks for your work and I'm excited for the second edition!

I have been reproducing some examples in the book with the Julia language and came across something that threw me for a loop in the introduction. This is the first time I'd seen the response variable as a matrix with the lm function. When the regression is done with a matrix as the response variable the lm documentation notes:

If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix.

This all made sense but I was getting different coefficients than in the stocks.R script provided in this repo. It turns out that lm will drop records in the response matrix if any of the variables have missing values. Since the GOOGL ticker has missing values from 2010-01-01 until 2014-03-01 it drops those records for all the other tickers as well before fitting the models.

So the original plot of coefficients was this:
2022-01-01_some-data

If all the available data is used then the plot becomes:
2022-01-01_all-data

This changes the interpretation of the plot slightly for most stocks but Facebook (FB) is likely notable.

Here's the code I used to create the plots:

library(tidyverse)

# get all stocks data 
url_stocks <- "https://raw.githubusercontent.com/TaddyLab/BDS/master/examples/stocks.csv"

stocks <- read.csv(url_stocks)
stocks$RET <- as.numeric(as.character(stocks$RET))
stocks$date <- as.Date(as.character(stocks$date), format="%Y%m%d")
stocks <- stocks %>% filter(TICKER!="" & RET!="")
dups <- which(duplicated(stocks[,c("TICKER","date")]))
stocks <- stocks[-dups,]

stocks$month <- paste(format(stocks$date, "%Y-%m"),"-01",sep="")
stocks$month <- as.Date(stocks$month)

agg <- function(r) prod(1+r, na.rm=TRUE) - 1
mnthly <- stocks %>%
  group_by(TICKER, month) %>%
  summarize(RET = agg(RET), SNP = agg(sprtrn))

RET <- as.data.frame(mnthly[,-4]) %>% spread(TICKER, RET)
SNP <- as.data.frame(mnthly[,c("month","SNP")])
SNP <- SNP[match(unique(SNP$month),SNP$month),]

RET <- RET %>% select(-MPET)

# get three-month U.S. treasury bills data
url_tbill <- "https://raw.githubusercontent.com/TaddyLab/BDS/master/examples/tbills.csv"
tbills <- read.csv(url_tbill)
tbills$date <- as.Date(tbills$date)

# get big company market cap data
url_bigs <- "https://raw.githubusercontent.com/TaddyLab/BDS/master/examples/bigstocks.csv"
bigs <- read.csv(url_bigs, header = FALSE, as.is = TRUE)
exr <- (as.matrix(RET[,bigs[,1]]) - tbills[,2])
mkt <- (SNP[,2] - tbills[,2])

# regression models from book
capm <- lm(exr ~ mkt)
(ab <- t(coef(capm))[,2:1])

ab <- ab[-9,]

par(mai=c(.8,.8,0,0), xpd=FALSE)
plot(ab, type="n", bty="n", xlab="beta", ylab="alpha")
abline(v=1, lty=2, col=8)
abline(h=0, lty=2, col=8)
text(ab, labels=rownames(ab), cex=bigs[,2]/350, col="navy")

# create regression per variable
exrdf <- as.data.frame(exr)
exrdf <- mutate(exrdf, mkt = mkt)

allmods <- exrdf %>% 
  pivot_longer(-mkt, names_to = "ticker", values_to = "exr") %>% 
  group_by(ticker) %>% 
  nest() %>% 
  mutate(
    regmods = map(data, ~ lm(exr ~ mkt, data = .)),
    coefs = map(regmods, broom::tidy)
    ) %>%
  unnest(coefs) %>% 
  select(ticker, term, estimate) %>% 
  pivot_wider(names_from = term, values_from = estimate) %>% 
  filter(ticker != "WMT") %>% 
  ungroup() %>% 
  select(ticker, mkt, `(Intercept)`) %>% 
  column_to_rownames("ticker")

# plot new results
par(mai=c(.8,.8,0,0), xpd=FALSE)
plot(allmods, type="n", bty="n", xlab="beta", ylab="alpha")
abline(v=1, lty=2, col=8)
abline(h=0, lty=2, col=8)
text(allmods, labels=rownames(allmods), cex=bigs[,2]/350, col="navy") 
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant