---
title: "How to build a shock using lagged industry shares (Bartik)"
author: "Erik Loualiche"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
#output:
# md_document:
# variant: markdown_github
vignette: >
%\VignetteIndexEntry{Build a shock using lagged industry shares (Bartik)}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
The goal here is to use lagged industry shares at the regional level, together with aggregate changes in an output variable, to generate cross-regional variation.
We work through the example of employment, which is easily done with the [*County Business Patterns (CBP)*](./cbp.Rmd) data.
There are essentially two steps:

1. Download and clean the data.
2. Estimate the shares and aggregate changes to construct the shock.
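
In formula form (a standard way to write a Bartik shock; the notation here is ours, not from the package), the shock for region $c$ at time $t$ weights industry growth rates $g_{i,t}$ by the region's lagged industry employment shares:

$$
B_{c,t} = \sum_i s_{i,c,t-1} \, g_{i,t},
\qquad
s_{i,c,t-1} = \frac{E_{i,c,t-1}}{\sum_j E_{j,c,t-1}}
$$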
We are going to use the following libraries:
```r
library(data.table)
library(stringr)
library(Hmisc)
library(statar)
library(entrydatar)
```
### 1. Downloading the data
The CBP uses SIC codes from 1986 to 1997 and NAICS codes from 1998 to 2016. This forces us to break the aggregation of the dataset into two parts.
#### SIC Code level
First we create a small function that downloads the data for a given year and processes it, keeping only the variables we are interested in:
```r
read_cbp_sic <- function(year_target){

  # download the data from the Census at the county level
  dt1 <- download_all_cbp(year_target, year_target, aggregation_level = "county")

  # keep only rows with a valid sic code
  dt1 <- dt1[ !is.na(as.numeric(sic)) ]

  # impute employment from the size-class flag where the count is suppressed
  dt1[ empflag == "A", emp := 10 ]
  dt1[ empflag == "B", emp := 60 ]
  dt1[ empflag == "C", emp := 175 ]
  dt1[ empflag == "E", emp := 375 ]
  dt1[ empflag == "F", emp := 750 ]
  dt1[ empflag == "G", emp := 1750 ]
  dt1[ empflag == "H", emp := 3750 ]
  dt1[ empflag == "I", emp := 7500 ]
  dt1[ empflag == "J", emp := 17500 ]
  dt1[ empflag == "K", emp := 37500 ]
  dt1[ empflag == "L", emp := 75000 ]
  dt1[ empflag == "M", emp := 100000 ]

  # build the county fips code, aggregate, and clean up
  dt1[, fips := paste0(fipstate, fipscty) ]
  dt1 <- dt1[, .(emp = sum(emp, na.rm = TRUE)), by = .(fips, sic) ][ order(sic, fips) ]
  dt1[, fipsemp := sum(emp, na.rm = TRUE), by = fips ]
  dt1[, date_y := year_target ]

  return(dt1)
}
```
Then we download the data for every year for which we have SIC codes:
```r
dt_emp_sic <- data.table()
for (year_iter in seq(1986, 1997)){
  dt_emp_sic <- rbind(dt_emp_sic, read_cbp_sic(year_iter))
}
dt_emp_sic[]
```
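
Growing a `data.table` inside a loop copies it on every iteration; `data.table::rbindlist` with `lapply` appends all years in one pass. A minimal sketch of the same pattern on a toy stand-in for the download function (`make_year` below is hypothetical, since the real function fetches data from the Census):

```r
library(data.table)

# hypothetical stand-in for read_cbp_sic: returns one year of fake data
make_year <- function(year_target) {
  data.table(date_y = year_target, emp = year_target * 10L)
}

# bind all years at once instead of rbind-ing inside a loop
dt_all <- rbindlist(lapply(seq(1986, 1988), make_year))
dt_all[]
```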
#### NAICS Code level
Similarly, we create a small function that downloads the data for a given year and processes it, keeping only the variables we are interested in:
```r
read_cbp_naics <- function(year_target){

  # download the data from the Census at the county level
  dt1 <- download_all_cbp(year_target, year_target, aggregation_level = "county")

  # clean the data and only keep 4-digit naics codes
  dt1[, naics := gsub("\\D", "", naics) ]
  dt1 <- dt1[ str_length(naics) == 4 ]

  # impute employment from the size-class flag where the count is suppressed
  dt1[ empflag == "A", emp := 10 ]
  dt1[ empflag == "B", emp := 60 ]
  dt1[ empflag == "C", emp := 175 ]
  dt1[ empflag == "E", emp := 375 ]
  dt1[ empflag == "F", emp := 750 ]
  dt1[ empflag == "G", emp := 1750 ]
  dt1[ empflag == "H", emp := 3750 ]
  dt1[ empflag == "I", emp := 7500 ]
  dt1[ empflag == "J", emp := 17500 ]
  dt1[ empflag == "K", emp := 37500 ]
  dt1[ empflag == "L", emp := 75000 ]
  dt1[ empflag == "M", emp := 100000 ]

  # build the county fips code, aggregate, and clean up
  dt1[, fips := paste0(fipstate, fipscty) ]
  dt1 <- dt1[, .(emp = sum(emp, na.rm = TRUE)), by = .(fips, naics) ][ order(naics, fips) ]
  dt1[, fipsemp := sum(emp, na.rm = TRUE), by = fips ]
  dt1[, date_y := year_target ]

  return(dt1)
}
```
Then we download the data for every year for which we have NAICS codes:
```r
dt_emp_naics <- data.table()
for (year_iter in seq(1998, 2016)){
  dt_emp_naics <- rbind(dt_emp_naics, read_cbp_naics(year_iter))
}
dt_emp_naics[]
```
### Create the cross-regional variation
First we compute each industry's share of county employment, along with its one-year lag:
```r
dt_emp_sic[, share_ind_cty := emp / fipsemp ]
dt_emp_sic[, l_share_ind_cty := tlag(share_ind_cty, 1, time = date_y), by = .(fips, sic) ]
dt_emp_naics[, share_ind_cty := emp / fipsemp ]
dt_emp_naics[, l_share_ind_cty := tlag(share_ind_cty, 1, time = date_y), by = .(fips, naics) ]
```
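
`statar::tlag` lags a variable along actual calendar time rather than row order, so a gap in the panel yields an `NA` instead of silently picking up the previous row. A minimal sketch on toy data (the hypothetical county `"01001"` is only for illustration):

```r
library(data.table)
library(statar)

dt <- data.table(fips = "01001", date_y = c(1990, 1991, 1993), x = c(1, 2, 3))
dt[, l_x := tlag(x, 1, time = date_y), by = fips]
dt[]
# 1991 picks up the 1990 value; 1993 gets NA because 1992 is missing
```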
Then we create a leave-out employment variable, netting the industry's own employment out of the county total, and estimate its growth:
```r
dt_emp_sic[, fipsemp_clean := fipsemp - emp ]
dt_emp_sic[, d_fipsemp := log(fipsemp_clean / tlag(fipsemp_clean, 1, time = date_y)), by = .(fips, sic) ]
dt_emp_naics[, fipsemp_clean := fipsemp - emp ]
dt_emp_naics[, d_fipsemp := log(fipsemp_clean / tlag(fipsemp_clean, 1, time = date_y) ), by = .(fips, naics) ]
```
Finally, we weight the change in employment by the lagged local industry shares from above:
```r
dt_emp_sic[, .(d_emp = wtd.mean(d_fipsemp, l_share_ind_cty, na.rm = T)), by = .(date_y, fips) ]
dt_emp_naics[, .(d_emp = wtd.mean(d_fipsemp, l_share_ind_cty, na.rm = T)), by = .(date_y, fips) ]
```
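
`Hmisc::wtd.mean` takes the weights as its second argument, so here the lagged shares act as weights on the growth rates. A quick check on toy numbers:

```r
library(Hmisc)

# growth rates of 2% and -1%, with lagged shares 0.3 and 0.1 as weights
wtd.mean(c(0.02, -0.01), c(0.3, 0.1))
# (0.3 * 0.02 + 0.1 * -0.01) / 0.4 = 0.0125
```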
To obtain the whole time series, we simply append the two panels:
```r
dt_bartik <-
rbind(dt_emp_sic[, .(d_emp = wtd.mean(d_fipsemp, l_share_ind_cty, na.rm = T)), by = .(date_y, fips) ],
dt_emp_naics[, .(d_emp = wtd.mean(d_fipsemp, l_share_ind_cty, na.rm = T)), by = .(date_y, fips) ])
dt_bartik[]
```
---------------------------
(c) Erik Loualiche