FixedEffectModel: A Python Package for Linear Model with High Dimensional Fixed Effects.

FixedEffectModel is a Python package designed and built by the Kuaishou DA ecology group. It estimates a class of linear models for panel data, that is, data that combine time-series and cross-sectional observations.

Main Features

  • Linear model
  • Linear model with high dimensional fixed effects
  • Difference-in-difference model with parallel checking plot
  • Instrumental variable model
  • Robust/White standard error
  • Multi-way cluster standard error
  • Instrumental variable model tests, including the weak IV test (Cragg-Donald statistic + Stock and Yogo critical values), the over-identification test (Sargan/Basmann test), and the endogeneity test (Durbin test)

For the instrumental variable model, we currently provide only the two-stage least squares estimator and report the second-stage regression result. Our next release will include the GMM method and robust standard errors based on GMM.

Installation

Install this package directly from PyPI

$ pip install FixedEffectModel

Getting started

This simple case study is designed to get you up and running quickly with FixedEffectModel. It walks through the steps needed.

Loading modules and functions

After installing FixedEffectModel and its dependencies, we load a few modules and functions:

import numpy as np
import pandas as pd


from fixedeffect.iv import iv2sls, ivgmm, ivtest
from fixedeffect.fe import fixedeffect, did, getfe
from fixedeffect.utils.panel_dgp import gen_data

gen_data is the function we use to simulate data.

Data

We use a simulated dataset with 100 cross-sectional units and 10 time units.

N = 100
T = 10
beta = [-3,1,2,3,4]
ate = 1
exp_date = 5
df = gen_data(N, T, beta, ate, exp_date)

In the simulated dataset above, "beta" holds the true coefficients, "ate" is the true treatment effect, and "exp_date" is the start date of the experiment.

Model fit and summary

Instrumental variables estimation

We include two functions, "iv2sls" and "ivgmm", for instrumental variable regression.

iv2sls

This function returns two-stage least squares estimation results. Define y as the dependent variable, x_1 as the exogenous variable, x_2 as the endogenous variable, and x_3 and x_4 as the instrumental variables; id and time are the cross-sectional ID and the time ID. An IV two-way fixed effect model is estimated by two-stage least squares using:

formula = 'y ~ x_1|id+time|0|(x_2~x_3+x_4)'
model_iv2sls = iv2sls(data_df = df,
                      formula = formula)
result = model_iv2sls.fit()
result.summary()

or

exog_x = ['x_1']
endog_x = ['x_2']
iv = ['x_3','x_4']
y = ['y']

model_iv2sls = iv2sls(data_df = df,
                      dependent = y,
                      exog_x = exog_x,
                      endog_x = endog_x,
                      category = ['id','time'],
                      iv = iv)

result = model_iv2sls.fit()
result.summary()

The two grammars above yield identical results. We provide specification tests for IV models:

ivtest(result)

Three tests are included: the weak IV test (Cragg-Donald statistic + Stock and Yogo critical values), the over-identification test (Sargan/Basmann test), and the endogeneity test (Durbin test).
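To make the two stages concrete, here is a minimal sketch in plain Python with one endogenous regressor and one instrument. It is not the package's implementation: the toy data, the helper `ols`, and the variable names are assumptions chosen so that the true coefficient is 2 and the confounder `u` is uncorrelated with the instrument `z`.

```python
def mean(values):
    return sum(values) / len(values)

def ols(x, y):
    """Return (intercept, slope) of a simple least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Toy data (assumed): z is the instrument, u an unobserved confounder
# that is uncorrelated with z by construction.
z = [1, 2, 3, 4, 1, 2, 3, 4]
u = [1, 1, 1, 1, -1, -1, -1, -1]
x = [zi + ui for zi, ui in zip(z, u)]          # endogenous regressor
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]  # true coefficient on x is 2

# Stage 1: project the endogenous regressor on the instrument.
a1, b1 = ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress y on the fitted values from stage 1.
_, beta_2sls = ols(x_hat, y)

# Naive OLS for comparison; it picks up the confounder and is biased.
_, beta_ols = ols(x, y)

print(round(beta_2sls, 6), round(beta_ols, 6))
```

The second-stage slope recovers the true coefficient, while the naive OLS slope does not. Note that standard errors taken from a literal second-stage regression are not valid 2SLS standard errors; the sketch only illustrates the point estimate.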

ivgmm

This function returns one-step GMM estimation results. With the same variable definitions, estimation is achieved by:

formula = 'y ~ x_1|id+time|0|(x_2~x_3+x_4)'

model_ivgmm = ivgmm(data_df = df,
                    formula = formula)
result = model_ivgmm.fit()
result.summary()

or

exog_x = ['x_1']
endog_x = ['x_2']
iv = ['x_3','x_4']
y = ['y']

model_ivgmm = ivgmm(data_df = df,
                      dependent = y,
                      exog_x = exog_x,
                      endog_x = endog_x,
                      category = ['id','time'],
                      iv = iv)

result = model_ivgmm.fit()
result.summary()
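Comparing the formula string with the keyword form suggests a four-part pattern: 'dependent ~ exogenous | fixed effects | cluster | (endogenous ~ instruments)', with 0 marking an empty slot. The sketch below illustrates that mapping. `parse_formula` and `split_terms` are hypothetical helpers written for this guide, not functions of the package, and the package's own parser may behave differently.

```python
def split_terms(part):
    """Turn 'a+b' into ['a', 'b']; '0' or an empty slot means no variables."""
    part = part.strip()
    if part in ("", "0"):
        return []
    return [term.strip() for term in part.split("+")]

def parse_formula(formula):
    """Hypothetical helper: map a four-part formula to keyword-style lists."""
    dependent, rhs = formula.split("~", 1)
    exog, category, cluster, iv_part = [p.strip() for p in rhs.split("|")]
    endog_x, iv = [], []
    if iv_part != "0":
        # The last slot looks like '(x_2~x_3+x_4)': endogenous ~ instruments.
        endog, instruments = iv_part.strip("()").split("~")
        endog_x, iv = split_terms(endog), split_terms(instruments)
    return {
        "dependent": split_terms(dependent),
        "exog_x": split_terms(exog),
        "category": split_terms(category),
        "cluster": split_terms(cluster),
        "endog_x": endog_x,
        "iv": iv,
    }

parsed = parse_formula("y ~ x_1|id+time|0|(x_2~x_3+x_4)")
print(parsed)
```

Under this reading, the parsed lists match the keyword arguments used in the second grammar: `exog_x = ['x_1']`, `category = ['id', 'time']`, `endog_x = ['x_2']`, and `iv = ['x_3', 'x_4']`.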

Fixed Effect Model

This function returns fixed effect model estimation results. Define y as the dependent variable and x_1 as the independent variable; id and time are the cross-sectional ID and the time ID. The following code estimates a two-way fixed effect model with two-way clustered standard errors:

formula = 'y ~ x_1|id+time|id+time|0'

model_fe = fixedeffect(data_df = df,
                       formula = formula,
                       no_print=True)
result = model_fe.fit()
result.summary()

or

exog_x = ['x_1']
y = ['y']
category = ['id','time']
cluster = ['id','time']


model_fe = fixedeffect(data_df = df,
                      dependent = y,
                      exog_x = exog_x,
                      category = category,
                      cluster = cluster)

result = model_fe.fit()
result.summary()
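For intuition on what a two-way fixed effect estimate does, the sketch below applies the within transformation to a small balanced panel in plain Python. The toy data and the helper `demean_two_way` are assumptions for illustration. On a balanced panel a single pass of demeaning is enough; this is not necessarily how the package handles high dimensional or unbalanced fixed effects.

```python
# Balanced toy panel (assumed): rows are units (id), columns are periods (time).
x = [[1, 2, 4],
     [2, 5, 3],
     [0, 1, 5]]
unit_effect = [10, -5, 3]
time_effect = [0, 7, -2]

# y_it = unit_effect_i + time_effect_t + 2 * x_it, so the true slope is 2.
y = [[unit_effect[i] + time_effect[t] + 2 * x[i][t] for t in range(3)]
     for i in range(3)]

def demean_two_way(m):
    """Subtract unit and time means, then add back the grand mean."""
    n, t = len(m), len(m[0])
    row_mean = [sum(row) / t for row in m]
    col_mean = [sum(m[i][j] for i in range(n)) / n for j in range(t)]
    grand = sum(row_mean) / n
    return [[m[i][j] - row_mean[i] - col_mean[j] + grand for j in range(t)]
            for i in range(n)]

x_tilde = demean_two_way(x)
y_tilde = demean_two_way(y)

# OLS slope on the demeaned data (no intercept needed after demeaning).
num = sum(xv * yv for xr, yr in zip(x_tilde, y_tilde) for xv, yv in zip(xr, yr))
den = sum(xv ** 2 for xr in x_tilde for xv in xr)
beta_fe = num / den

print(round(beta_fe, 6))
```

Demeaning removes both the unit and the time effects, so the slope on the transformed data equals the true coefficient even though the effects are never estimated explicitly.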

Difference in Difference

DID is simply a specific type of fixed effect model. We provide a DID function to simplify the estimation process. The regular DID estimation is achieved using the following command:

formula = 'y ~ 0|0|0|0'

model_did = did(data_df = df,
                formula = formula,
                treatment = ['treatment'],
                csid = ['id'],
                tsid = ['time'],
                exp_date = 2)
result = model_did.fit()
result.summary()

"exp_date" is the first date of the experiment, and "treatment" is the column name of the treatment variable. This command estimates the regular DID model, with the group effect at the treatment-group level (group_effect = 'treatment', the default).
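The logic of a regular DID estimate can be sketched as a difference of differences in group means. The example below uses plain Python and made-up records. It assumes that observations with time >= exp_date count as the post period, and it is an illustration of the idea rather than the package's estimator.

```python
# Each record (assumed toy data): (treated indicator, time period, outcome).
records = [
    (0, 1, 1.0), (0, 1, 3.0),   # control group, before the experiment
    (0, 2, 4.0), (0, 2, 6.0),   # control group, after
    (1, 1, 2.0), (1, 1, 4.0),   # treated group, before
    (1, 2, 7.0), (1, 2, 9.0),   # treated group, after
]
exp_date = 2  # first period of the experiment; time >= exp_date is "post"

def cell_mean(treated, post):
    """Average outcome for one (group, period) cell."""
    ys = [y for d, t, y in records if d == treated and (t >= exp_date) == post]
    return sum(ys) / len(ys)

# Change in the treated group minus change in the control group.
did_estimate = ((cell_mean(1, True) - cell_mean(1, False))
                - (cell_mean(0, True) - cell_mean(0, False)))

print(did_estimate)
```

In the two-group, two-period case this difference in means coincides with the coefficient on the treatment-by-post interaction in the corresponding regression.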

We also provide DID with individual effect:

formula = 'y ~ 0|0|0|0'

model_did = did(data_df = df,
                formula = formula,
                treatment = ['treatment'],
                group_effect='individual',
                csid = ['id'],
                tsid = ['time'],
                exp_date = 2)
result = model_did.fit()
result.summary()

The command above estimates the DID model with individual effects (group_effect = 'individual').
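For orientation, the textbook forms of the two DID specifications are sketched below. The notation is an assumption introduced here: D_i is the treatment indicator, Post_t equals 1 when t is on or after exp_date, alpha_i and lambda_t are unit and time effects, and delta is the treatment effect. Whether the package uses exactly these terms (for example, time effects instead of a single post dummy) is not confirmed by this guide.

```latex
\begin{align}
% Regular DID with a treatment-group effect (textbook form)
y_{it} &= \beta_0 + \beta_1 D_i + \beta_2 \mathrm{Post}_t
          + \delta \,(D_i \times \mathrm{Post}_t) + \varepsilon_{it} \\
% DID with individual effects: unit effects replace the group dummy
y_{it} &= \alpha_i + \lambda_t
          + \delta \,(D_i \times \mathrm{Post}_t) + \varepsilon_{it}
\end{align}
```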

Main Functions

Currently, you can call the following main functions:

| Function name | Description | Usage |
| --- | --- | --- |
| fixedeffect | define class for fixed effect estimation | fixedeffect (data_df = None, dependent = None, exog_x = None, category = None, cluster = None, formula = None, robust = False, noint = False, c_method = 'cgm', psdef = True) |
| iv2sls | define class for 2SLS estimation | iv2sls (data_df = None, dependent = None, exog_x = None, endog_x = None, iv = None, category = None, cluster = None, formula = None, robust = False, noint = False) |
| ivgmm | define class for GMM estimation | ivgmm (data_df = None, dependent = None, exog_x = None, endog_x = None, iv = None, category = None, cluster = None, formula = None, robust = False, noint = False) |
| did | define class for DID estimation | did (data_df = None, dependent = None, exog_x = None, treatment = None, csid = None, tsid = None, exp_date = None, group_effect = 'treatment', cluster = None, formula = None, robust = False, noint = False, c_method = 'cgm', psdef = True) |
| model.fit | fit pre-defined models | result = model.fit() |
| result.summary | print a summary of the result object | result.summary() |
| fit_multi_model | fit multiple models | models = [model, model_did, model_iv2sls], fit_multi_model (models) |
| getfe | get fixed effects | getfe (result) |
| ivtest | get IV post-estimation test results | ivtest (result) |

fixedeffect

Provide results for a fixed effect model:

model = fixedeffect (data_df = None, dependent = None, exog_x = None, category = None, cluster = None, formula = None, robust = False, noint = False, c_method = 'cgm', psdef = True)

| Input parameters | Type | Description |
| --- | --- | --- |
| data_df | pandas dataframe | Dataframe with relevant data. |
| dependent | list | List object of dependent variables |
| exog_x | list | List object of independent variables |
| category | list, default [] | List object of category variables, i.e., fixed effects |
| cluster | list, default [] | List object of cluster variables, i.e., the cluster level of your standard errors |
| formula | string, default None | Formula used to parse grammar. |
| robust | bool, default False | Whether or not to calculate the df-adjusted White standard error (HC1) |
| noint | bool, default False | If True, do not generate an intercept |
| c_method | str, default 'cgm' | Method to calculate multi-cluster standard errors. Possible choices are 'cgm' and 'cgm2'. |
| psdef | bool, default True | If True, replace negative eigenvalues of the variance matrix with 0 (only for multi-way cluster variance) |

Returns a result object with the following attributes:

| Attribute | Description |
| --- | --- |
| params | Estimated coefficients |
| df | Degrees of freedom |
| bse | Standard errors |
| variance_matrix | Variance-covariance matrix of the coefficients |
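The "robust" option refers to the df-adjusted White standard error (HC1). As a rough illustration of what that adjustment means, the sketch below computes the classical and the HC1 variance of the slope in a simple regression using plain Python. The data are made up, and the degrees-of-freedom count k = 2 (intercept and slope) is an assumption; the package may count absorbed fixed effects differently.

```python
import math

# Toy data (assumed) for a simple regression of y on x with an intercept.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.0, 4.0, 3.0]
n, k = len(x), 2  # k counts the intercept and the slope

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Classical variance of the slope: s^2 / Sxx.
var_classical = sum(e ** 2 for e in resid) / (n - k) / sxx

# White (HC0) sandwich for the slope, then the HC1 scaling n / (n - k).
var_hc0 = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
var_hc1 = var_hc0 * n / (n - k)

print(round(math.sqrt(var_classical), 4), round(math.sqrt(var_hc1), 4))
```

The two standard errors differ because the sandwich form weights each squared residual by how far its x value lies from the mean instead of assuming a common error variance.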

iv2sls/ivgmm

model = iv2sls (data_df = None, dependent = None, exog_x = None, endog_x = None, iv = None, category = None, cluster = None, formula = None, robust = False, noint = False)

model = ivgmm (data_df = None, dependent = None, exog_x = None, endog_x = None, iv = None, category = None, cluster = None, formula = None, robust = False, noint = False)

| Input parameters | Type | Description |
| --- | --- | --- |
| data_df | pandas dataframe | Dataframe with relevant data. |
| dependent | list | List object of dependent variables |
| exog_x | list | List object of exogenous variables |
| endog_x | list | List object of endogenous variables |
| iv | list | List object of instrumental variables |
| category | list, default [] | List object of category variables, i.e., fixed effects |
| formula | string, default None | Formula used to parse grammar. |
| robust | bool, default False | Whether or not to calculate the df-adjusted White standard error (HC1) |
| noint | bool, default False | If True, do not generate an intercept |

Returns the same result object as fixedeffect.

We also provide a two-step GMM estimator if you set the option "gmm2=True". Each estimator is available with three kinds of variance-covariance matrix:

  • "ivgmm": the one-step GMM estimator, with one of the following variance-covariance matrices:

    • Unadjusted.
    • Heteroskedasticity robust.
    • Cluster.
  • "ivgmm" with "gmm2=True": the two-step GMM estimator, with one of the following variance-covariance matrices:

    • Unadjusted.
    • Heteroskedasticity robust, using the diagonal matrix generated from the residuals of the two-step GMM.
    • Cluster.
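As a point of reference only, the textbook expressions for linear IV-GMM are sketched below. The notation is an assumption introduced here and may not match the package's internal definitions: X stacks all regressors (exogenous and endogenous), Z stacks all instruments (exogenous regressors plus excluded instruments), W is the weighting matrix, and the residuals û come from the relevant estimation step. Degrees-of-freedom corrections are omitted.

```latex
\begin{align}
\hat\beta_{\mathrm{GMM}} &= (X'ZWZ'X)^{-1} X'ZWZ'y \\
\text{one-step: } W &= (Z'Z)^{-1}, \qquad
\text{two-step: } W = \hat S^{-1} \\
\widehat{\mathrm{Var}}(\hat\beta) &= (X'ZWZ'X)^{-1}\, X'ZW \hat S W Z'X \,(X'ZWZ'X)^{-1} \\
\hat S_{\text{unadjusted}} &= \hat\sigma^2 Z'Z \\
\hat S_{\text{robust}} &= Z'\hat\Omega Z, \qquad \hat\Omega = \mathrm{diag}(\hat u_i^2) \\
\hat S_{\text{cluster}} &= \sum_{g} Z_g' \hat u_g \hat u_g' Z_g
\end{align}
```

When W equals the inverse of Ŝ, as in the two-step estimator, the sandwich collapses to (X'Z Ŝ⁻¹ Z'X)⁻¹.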

DID

model = did (data_df = None, dependent = None, exog_x = None, treatment = None, csid = None, tsid = None, exp_date = None, group_effect = 'treatment', cluster = None, formula = None, robust = False, noint = False, c_method = 'cgm', psdef = True)

| Input parameters | Type | Description |
| --- | --- | --- |
| data_df | pandas dataframe | Dataframe with relevant data. |
| dependent | list | List object of dependent variables |
| exog_x | list | List object of independent variables |
| treatment | list | List object of treatment variables |
| csid | list | List object of cross-sectional ID variables |
| tsid | list | List object of time variables |
| exp_date | string | Experiment start date |
| group_effect | string, default 'treatment' | Either 'treatment' or 'individual' |
| cluster | list, default [] | List object of cluster variables, i.e., the cluster level of your standard errors |
| formula | string, default None | Formula used to parse grammar. |
| robust | bool, default False | Whether or not to calculate the df-adjusted White standard error (HC1) |
| noint | bool, default False | If True, do not generate an intercept |
| c_method | str, default 'cgm' | Method to calculate multi-cluster standard errors. Possible choices are 'cgm' and 'cgm2'. |
| psdef | bool, default True | If True, replace negative eigenvalues of the variance matrix with 0 (only for multi-way cluster variance) |

Returns the same result object as fixedeffect.

fit_multi_model

This function fits several models on one dataframe and collects their results. When analyzing large and complicated data, we usually have several candidate model specifications; this function makes it easy to compare their results side by side.

| Input parameters | Type | Description |
| --- | --- | --- |
| data_df | pandas dataframe | Dataframe with relevant data |
| models | list, default [] | List of models |
| table_header | str, default None | Title of the summary table |

Returns a summary table comparing the results of the different models.

getfe

This function is used to obtain the estimated fixed effects.

| Input parameters | Type | Description |
| --- | --- | --- |
| result | object | Output object of the fixedeffect function |
| epsilon | double, default 1e-8 | Tolerance for projection |
| normalize | bool, default False | Whether or not to normalize fixed effects. |
| category_input | list, default [] | List of category variables for which to calculate fixed effects. |

Returns a summary table of the fixed effect estimates and their standard errors.
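In the simplest one-way case, a fixed effect can be recovered after estimation as the group mean of y minus the slope times the group mean of x. The plain-Python sketch below illustrates this on made-up data. It is an assumption-laden illustration of the idea only; getfe itself handles multiple category variables and uses a projection tolerance (epsilon), so its procedure is presumably more involved.

```python
# Toy one-way panel (assumed): each group has its own intercept, common slope 2.
groups = {
    "a": {"x": [1.0, 2.0, 3.0], "y": [12.0, 14.0, 16.0]},  # y = 10 + 2x
    "b": {"x": [0.0, 4.0, 2.0], "y": [-5.0, 3.0, -1.0]},   # y = -5 + 2x
}

def mean(values):
    return sum(values) / len(values)

# Within estimator: demean x and y inside each group, then pool.
num = den = 0.0
for g in groups.values():
    x_bar, y_bar = mean(g["x"]), mean(g["y"])
    num += sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(g["x"], g["y"]))
    den += sum((xi - x_bar) ** 2 for xi in g["x"])
beta = num / den

# Recover each fixed effect as the group mean of y - beta * x.
fixed_effects = {name: mean(g["y"]) - beta * mean(g["x"])
                 for name, g in groups.items()}

print(beta, fixed_effects)
```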

ivtest

This function is used to obtain IV test results.

| Input parameters | Type | Description |
| --- | --- | --- |
| result | object | Output object of the ivgmm/iv2sls function |

Returns a table of IV test results.

Example

# requires the FixedEffectModel package (pip install FixedEffectModel)
import numpy as np
import pandas as pd
from fixedeffect.iv import iv2sls, ivgmm, ivtest
from fixedeffect.fe import fixedeffect, did, getfe
from fixedeffect.utils.panel_dgp import gen_data

N = 100
T = 10
beta = [-3,1,2,3,4]
ate = 1
exp_date = 5

#generate sample data
df = gen_data(N, T, beta, ate, exp_date)

#------------------------------#
#define instrumental variable model
# iv2sls 
formula = 'y ~ x_1|id+time|0|(x_2~x_3+x_4)'
model_iv2sls = iv2sls(data_df = df,
                      formula = formula)
result = model_iv2sls.fit()
result.summary()

# ivgmm 
formula = 'y ~ x_1|id|0|(x_2~x_3+x_4)'

model_ivgmm = ivgmm(data_df = df,
                    formula = formula)
result = model_ivgmm.fit()
result.summary()

# obtain iv test results
ivtest(result)

#------------------------------#

#define fixed effect model
exog_x = ['x_1']
y = ['y']
category = ['id','time']
cluster = ['id','time']


model_fe = fixedeffect(data_df = df,
                      dependent = y,
                      exog_x = exog_x,
                      category = category,
                      cluster = cluster)

result = model_fe.fit()
result.summary()

#obtain fixed effect 
getfe(result)

#------------------------------#
#define DID model
formula = 'y ~ 0|0|0|0'

model_did = did(data_df = df,
                formula = formula,
                treatment = ['treatment'],
                csid = ['id'],
                tsid = ['time'],
                exp_date=2)
result = model_did.fit()
result.summary()

Requirements

  • Python 3.6+
  • pandas and its dependencies (NumPy, etc.)
  • SciPy and its dependencies
  • statsmodels and its dependencies
  • networkx

Citation

If you use FixedEffectModel in your research, please cite us as follows:

Kuaishou DA Ecology. FixedEffectModel: A Python Package for Linear Model with High Dimensional Fixed Effects. https://github.com/ksecology/FixedEffectModel, 2020. Version 0.x

BibTex:

@misc{FixedEffectModel,
  author={Kuaishou DA Ecology},
  title={{FixedEffectModel}: {A Python Package for Linear Model with High Dimensional Fixed Effects}},
  howpublished={https://github.com/ksecology/FixedEffectModel},
  note={Version 0.x},
  year={2020}
}

Feedback

This package welcomes feedback. If you have any additional questions or comments, please contact da_ecology@kuaishou.com.

Reference

[1] Simen Gaure (2019). lfe: Linear Group Fixed Effects. R package, version 2.8-5.1. URL: https://www.rdocumentation.org/packages/lfe/versions/2.8-5.1

[2] A Colin Cameron and Douglas L Miller. A practitioner’s guide to cluster-robust inference. Journal of human resources, 50(2):317–372, 2015.

[3] Simen Gaure. OLS with multiple high dimensional category variables. Computational Statistics & Data Analysis, 66:8–18, 2013.

[4] Douglas L Miller, A Colin Cameron, and Jonah Gelbach. Robust inference with multi-way clustering. Technical report, Working Paper, 2009.

[5] Jeffrey M Wooldridge. Econometric analysis of cross section and panel data. MIT press, 2010.