Commit

Update to StatsModels 0.7 (#220)
* Update FixedEffectModels.jl

* Update FixedEffectModels.jl

* Update FixedEffectModels.jl

* Update Project.toml

* update

* Update Project.toml

* Update FixedEffectModels.jl

* update benchmarks

* Update partial_out.jl

* Update README.md

* update stata too

* Update README.md

* Update README.md

* Update Project.toml

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* better printing

* update version to 1.9.0

* update tests

* Update FixedEffectModel.jl

* use snoopcompile

* precompile

* Update FixedEffectModel.jl

* Update FixedEffectModel.jl

* Update runtests.jl

* Update formula.jl

* Update fit.jl

* update to Julia 1.6
matthieugomez authored Mar 14, 2023
1 parent cfdef87 commit 18e17d3
Showing 21 changed files with 1,237 additions and 1,257 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -15,7 +15,7 @@ jobs:
fail-fast: false
matrix:
version:
- '1.3' # Replace this with the minimum Julia version that your package supports. E.g. if your package requires Julia 1.5 or higher, change this to '1.5'.
- '1.6' # Replace this with the minimum Julia version that your package supports. E.g. if your package requires Julia 1.5 or higher, change this to '1.5'.
- '1' # Leave this line unchanged. '1' will automatically expand to the latest stable 1.x release of Julia.
os:
- ubuntu-latest
12 changes: 7 additions & 5 deletions Project.toml
@@ -1,6 +1,6 @@
name = "FixedEffectModels"
uuid = "9d5cd8c9-2029-5cab-9928-427838db53e3"
version = "1.8.1"
version = "1.9.0"

[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
@@ -15,15 +15,17 @@ StatsFuns = "4c63d2b9-4356-54db-8cca-17b64c39e42c"
StatsModels = "3eaba693-59b7-5ba5-a881-562e759f1c8d"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
Vcov = "ec2bfdc2-55df-4fc9-b9ae-4958c2cf2486"
SnoopPrecompile = "66db9d55-30c0-4569-8b51-7e840670fc0c"

[compat]
DataFrames = "0.21, 0.22, 1.0"
DataFrames = "0.21, 0.22, 1"
FixedEffects = "2"
Reexport = "0.1, 0.2, 1.0"
Reexport = "0.1, 0.2, 1"
SnoopPrecompile = "1"
StatsAPI = "1"
StatsBase = "0.33"
StatsFuns = "0.9, 1"
StatsModels = "0.6"
StatsModels = "0.7"
Tables = "1"
Vcov = "0.7"
julia = "1.3"
julia = "1.6"
15 changes: 6 additions & 9 deletions README.md
@@ -6,12 +6,10 @@ This package estimates linear models with high dimensional categorical variables
The package is registered in the [`General`](https://github.com/JuliaRegistries/General) registry and so can be installed at the REPL with `] add FixedEffectModels`.

## Benchmarks
The objective of the package is similar to the Stata command [`reghdfe`](https://github.com/sergiocorreia/reghdfe) and the R function [`felm`](https://cran.r-project.org/web/packages/lfe/lfe.pdf). The package tends to be much faster than these two options.

![benchmark](http://www.matthieugomez.com/files/fixedeffectmodels_benchmark.png)
The objective of the package is similar to the Stata command [`reghdfe`](https://github.com/sergiocorreia/reghdfe) and the R packages [`lfe`](https://cran.r-project.org/web/packages/lfe/lfe.pdf) and [`fixest`](https://lrberge.github.io/fixest/). The package is much faster than `reghdfe` or `lfe`. It also tends to be a bit faster than the more recent `fixest` (depending on the exact command). For complicated models, `FixedEffectModels` can also run on Nvidia GPUs for even faster performances (see below)


Performances are roughly similar to the newer R function [`feols`](https://cran.r-project.org/web/packages/fixest/fixest.pdf). The main difference is that `FixedEffectModels` can also run the demeaning operation on a GPU (with `method = :gpu`).
![benchmark](http://www.matthieugomez.com/files/fixedeffectmodels_benchmark.png)

## Syntax

@@ -99,14 +97,13 @@ You may use [RegressionTables.jl](https://github.com/jmboehm/RegressionTables.jl

## Performances


### MultiThreads
`FixedEffectModels` is multi-threaded. Use the option `nthreads` to select the number of threads to use in the estimation (defaults to `Threads.nthreads()`). That being said, multithreading does not usually make a big difference.
`FixedEffectModels` is multi-threaded. Use the option `nthreads` to select the number of threads to use in the estimation (defaults to `Threads.nthreads()`).

### GPU
The package has support for GPUs (Nvidia) (thanks to Paul Schrimpf). This can make the package an order of magnitude faster for complicated problems.
### Nvidia GPU
The package has support for Nvidia GPUs (thanks to Paul Schrimpf). This can make the package an order of magnitude faster for complicated problems.

To use GPU, run `using CUDA` before `using FixedEffectModels`. Then, estimate a model with `method = :gpu`. For maximum speed, set the floating point precision to `Float32` with `double_precision = false`.
If you have a Nvidia GPU, run `using CUDA` before `using FixedEffectModels`. Then, estimate a model with `method = :gpu`. For maximum speed, set the floating point precision to `Float32` with `double_precision = false`.

```julia
using CUDA, FixedEffectModels
10 changes: 0 additions & 10 deletions benchmark/.sublime2Terminal.jl

This file was deleted.

2 changes: 1 addition & 1 deletion benchmark/benchmark.csv
@@ -1 +1 @@
Order,Command,Julia,R,Stata
1,simple,0.601445,1.843,1.2
2,1 hd fe,1.624446 ,14.831,15.51
3,2 hd fe,3.639817,10.626,49.38
4, 1 cluster,1.462648,9.255,11.15
5, 2 cluster,7.187382,96.958,118.67
Order,Command,FixedEffectModels.jl (Julia),fixest (R),lfe (R),reghdfe (Stata)
1,simple,0.35,0.317,1.843, 0.61
2,1 hd fe,0.463 ,0.704 ,14.831, 4.64
3,2 hd fe,1.00,1.297 ,10.626, 22.99
4, 1 cluster se,0.38058,0.700 ,9.255, 8.28
5, 2 clusters se,0.765,1.803,96.958, 70.44
27 changes: 15 additions & 12 deletions benchmark/benchmark.jl
@@ -1,5 +1,6 @@
using DataFrames, FixedEffectModels, Random, CategoricalArrays

using DataFrames, Random, CategoricalArrays
@time using FixedEffectModels
# 13s precompiling
# Very simple setup
N = 10000000
K = 100
@@ -11,17 +12,19 @@ y= 3 .* x1 .+ 5 .* x2 .+ cos.(id1) .+ cos.(id2).^2 .+ randn(N)
df = DataFrame(id1 = id1, id2 = id2, x1 = x1, x2 = x2, y = y)
# first time
@time reg(df, @formula(y ~ x1 + x2))
# 14s
# 3.5s
@time reg(df, @formula(y ~ x1 + x2))
# 0.582029 seconds (852 allocations: 535.311 MiB, 18.28% gc time)
# 0.497374 seconds (450 allocations: 691.441 MiB, 33.18% gc time)
@time reg(df, @formula(y ~ x1 + x2), Vcov.cluster(:id2))
# 1.898018 seconds (7.10 M allocations: 1.220 GiB, 8.20% gc time, 4.46% compilation time)
@time reg(df, @formula(y ~ x1 + x2), Vcov.cluster(:id2))
# 0.621690 seconds (693 allocations: 768.945 MiB, 7.69% gc time)
# 0.605172 seconds (591 allocations: 768.939 MiB, 42.38% gc time)
@time reg(df, @formula(y ~ x1 + x2 + fe(id1)))
# 1.143941 seconds (245.39 k allocations: 942.937 MiB, 12.93% gc time, 14.99% compilation time)
# 0.893835 seconds (1.03 k allocations: 929.130 MiB, 54.19% gc time)
@time reg(df, @formula(y ~ x1 + x2 + fe(id1)), Vcov.cluster(:id1))
# 1.242207 seconds (245.73 k allocations: 1022.348 MiB, 9.48% gc time, 14.10% compilation time)
# 1.015078 seconds (1.18 k allocations: 1008.532 MiB, 56.50% gc time)
@time reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)))
# 2.255812 seconds (351.74 k allocations: 1.076 GiB, 3.98% gc time, 12.93% compilation time)
# 1.835464 seconds (4.02 k allocations: 1.057 GiB, 35.59% gc time)

# More complicated setup
N = 800000 # number of observations
@@ -34,7 +37,7 @@ x2 = cos.(id1) + sin.(id2) + randn(N)
y= 3 .* x1 .+ 5 .* x2 .+ cos.(id1) .+ cos.(id2).^2 .+ randn(N)
df = DataFrame(id1 = id1, id2 = id2, x1 = x1, x2 = x2, y = y)
@time reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)))
# 3.048292 seconds (422.51 k allocations: 114.317 MiB, 6.86% compilation time)
# 2.504294 seconds (75.83 k allocations: 95.525 MiB, 0.23% gc time)


+# fixest
@@ -48,8 +51,8 @@ X1 = rand(n)
ln_y = 3 .* X1 .+ rand(n)
df = DataFrame(X1 = X1, ln_y = ln_y, id1 = id1, id2 = id2, id3 = id3)
@time reg(df, @formula(ln_y ~ X1 + fe(id1)), Vcov.cluster(:id1))
# 0.869512 seconds (234.23 k allocations: 828.818 MiB, 18.95% compilation time)
# 0.543996 seconds (873 allocations: 815.677 MiB, 34.15% gc time)
@time reg(df, @formula(ln_y ~ X1 + fe(id1) + fe(id2)), Vcov.cluster(:id1))
# 2.192262 seconds (300.08 k allocations: 985.534 MiB, 4.61% gc time, 9.42% compilation time)
# 1.301908 seconds (3.03 k allocations: 968.729 MiB, 25.84% gc time)
@time reg(df, @formula(ln_y ~ X1 + fe(id1) + fe(id2) + fe(id3)), Vcov.cluster(:id1))
# 2.700051 seconds (406.80 k allocations: 1.117 GiB, 3.56% gc time, 10.41% compilation time)
# 1.658832 seconds (4.17 k allocations: 1.095 GiB, 29.78% gc time)
76 changes: 53 additions & 23 deletions benchmark/benchmark.md
@@ -3,31 +3,61 @@

Code to reproduce this graph:

Julia
FixedEffectModels.jl v1.9.0 (Julia 1.9)
```julia
using DataFrames, FixedEffectModels
using DataFrames, CategoricalArrays, FixedEffectModels
N = 10000000
K = 100
id1 = rand(1:(N/K), N)
id2 = rand(1:K, N)
x1 = randn(N)
x2 = randn(N)
y= 3 .* x1 .+ 2 .* x2 .+ sin.(id1) .+ cos.(id2).^2 .+ randn(N)
df = DataFrame(id1 = categorical(id1), id2 = categorical(id2), x1 = x1, x2 = x2, w = w, y = y)
df = DataFrame(id1 = categorical(id1), id2 = categorical(id2), x1 = x1, x2 = x2, y = y)
@time reg(df, @formula(y ~ x1 + x2))
#0.601445 seconds (1.05 k allocations: 535.311 MiB, 31.95% gc time)
# 0.338749 seconds (450 allocations: 691.441 MiB, 2.30% gc time)
@time reg(df, @formula(y ~ x1 + x2 + fe(id1)))
# 1.624446 seconds (1.21 k allocations: 734.353 MiB, 17.27% gc time)
# 0.463058 seconds (1.00 k allocations: 929.129 MiB, 13.31% gc time)
@time reg(df, @formula(y ~ x1 + x2 + fe(id1) + fe(id2)))
# 3.639817 seconds (1.84 k allocations: 999.675 MiB, 11.25% gc time)
# 1.006031 seconds (3.22 k allocations: 1.057 GiB, 1.68% gc time)
@time reg(df, @formula(y ~ x1 + x2), Vcov.cluster(:id1))
# 1.462648 seconds (499.30 k allocations: 690.102 MiB, 15.92% gc time)
@time reg(df, @formula(y ~ x1 + x2, Vcov.cluster(:id1, :id2)))
# 7.187382 seconds (7.02 M allocations: 2.753 GiB, 24.19% gc time)
# 0.380562 seconds (580 allocations: 771.606 MiB, 3.07% gc time)
@time reg(df, @formula(y ~ x1 + x2), Vcov.cluster(:id1, :id2))
#0.765847 seconds (719 allocations: 1.128 GiB, 2.01% gc time)
````


R (lfe package)
fixest v0.8.4 (R 4.2.2)
```R
library(fixest)
N = 10000000
K = 100
df = data.frame(
id1 = as.factor(sample(N/K, N, replace = TRUE)),
id2 = as.factor(sample(K, N, replace = TRUE)),
x1 = runif(N),
x2 = runif(N)
)
df[, "y"] = 3 * df[, "x1"] + 2 * df[, "x2"] + sin(as.numeric(df[, "id1"])) + cos(as.numeric(df[, "id2"])) + runif(N)
system.time(feols(y ~ x1 + x2, df))
#> user system elapsed
#> 0.280 0.036 0.317
system.time(feols(y ~ x1 + x2|id1, df))
#> user system elapsed
#> 0.616 0.089 0.704
system.time(feols(y ~ x1 + x2|id1 + id2, df))
#> user system elapsed
#> 1.181 0.120 1.297
system.time(feols(y ~ x1 + x2, cluster = "id1", df))
#> user system elapsed
#> 0.630 0.071 0.700
system.time(feols(y ~ x1 + x2, cluster = c("id1", "id2"), df))
#> user system elapsed
#> 1.570 0.197 1.803
```


lfe v2.8-8 (R 4.2.2)
```R
library(lfe)
N = 10000000
@@ -42,22 +72,22 @@ Code to reproduce this graph:
system.time(felm(y ~ x1 + x2, df))
#> user system elapsed
#> 1.843 0.476 2.323
#> 1.137 0.232 1.596
system.time(felm(y ~ x1 + x2|id1, df))
#> user system elapsed
#> 14.831 1.342 15.993
#> 7.08 0.41 7.46
system.time(felm(y ~ x1 + x2|id1 + id2, df))
#> user system elapsed
#> 10.626 1.358 10.336
#> 4.832 0.370 4.615
system.time(felm(y ~ x1 + x2|0|0|id1, df))
#> user system elapsed
#> 9.255 0.843 10.110
#> 3.712 0.287 3.996
system.time(felm(y ~ x1 + x2|0|0|id1 + id2, df))
#> user system elapsed
#> 96.958 1.474 99.113
```
#> 59.119 0.889 59.946
Stata (reghdfe version 5.2.9 06aug2018)
reghdfe version 5.6.8 03mar2019 (Stata 16.1)
```
clear all
local N = 10000000
@@ -72,13 +102,13 @@ Code to reproduce this graph:
set rmsg on
reg y x1 x2
#> r; t=1.20
areg y x1 x2, a(id1)
#>r; t=15.51
#> r; t=0.61
reghdfe y x1 x2, a(id1)
#>r; t=4.64
reghdfe y x1 x2, a(id1 id2)
#> r; t=49.38
#> r; t==22.99
reg y x1 x2, cl(id1)
#> r; t=11.15
#> r; t=8.28
ivreg2 y x1 x2, cluster(id1 id2)
#> r; t=118.67
#> r; t=70.44
````
Binary file added benchmark/fixedeffectmodels_benchmark.png
2 changes: 1 addition & 1 deletion benchmark/result.jl
@@ -1 +1 @@
using DataFrames, CSV, Gadfly
df = CSV.read("/Users/Matthieu/Dropbox/Github/FixedEffectModels.jl/benchmark/benchmark.csv")
df.R = df.R ./ df.Julia
df.Stata = df.Stata ./ df.Julia
df.Julia = df.Julia ./ df.Julia
mdf = melt(df[!, [:Command, :Julia, :R, :Stata]], :Command)
mdf = rename(mdf, :variable => :Language)
p = plot(mdf, x = "Language", y = "value", color = "Command", Guide.ylabel("Time (Ratio to Julia)"), Guide.xlabel("Model"), Guide.yticks(ticks= [1, 5, 10, 15]))
draw(PNG("/Users/Matthieu/Dropbox/Github/FixedEffectModels.jl/benchmark/fixedeffectmodels_benchmark.png", 8inch, 5inch, dpi=300), p)
using DataFrames, CSV, Gadfly
df = CSV.read("/Users/matthieugomez/Dropbox/Github/FixedEffectModels.jl/benchmark/benchmark.csv", DataFrame)
df."fixest (R)" = df."fixest (R)" ./ df."FixedEffectModels.jl (Julia)"
df."lfe (R)" = df."lfe (R)" ./ df."FixedEffectModels.jl (Julia)"
df."reghdfe (Stata)" = df."reghdfe (Stata)" ./ df."FixedEffectModels.jl (Julia)"
df."FixedEffectModels.jl (Julia)" = df."FixedEffectModels.jl (Julia)" ./ df."FixedEffectModels.jl (Julia)"
mdf = stack(df, Not([:Command, :Order]))
mdf = rename(mdf, :variable => :Language)
p = plot(mdf, x = "Command", y = "value", color = "Language", Guide.ylabel("Time (Ratio to Julia)"), Guide.xlabel("Command"), Scale.y_log10)
draw(PNG("/Users/matthieugomez/Dropbox/Github/FixedEffectModels.jl/benchmark/fixedeffectmodels_benchmark.png", 8inch, 5inch, dpi=300), p)
Binary file removed benchmark/result.png
Binary file not shown.
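
Note on the benchmark figure: the updated `benchmark/result.jl` divides every timing column by the FixedEffectModels.jl column before plotting, so the chart shows ratios to Julia rather than raw seconds. The following is a minimal sketch of that normalization using only Base Julia, not the script from the commit. The timings are taken from the updated `benchmark/benchmark.csv`, read under the assumption that each row index was fused onto the end of the preceding value when the file was rendered as a single line; the names `commands`, `timings`, `baseline`, and `ratios` are illustrative.

```julia
# Timings in seconds, as read from the updated benchmark/benchmark.csv.
# Assumption: the one-line CSV splits into five rows (Order 1 to 5).
commands = ["simple", "1 hd fe", "2 hd fe", "1 cluster se", "2 clusters se"]
timings = Dict(
    "FixedEffectModels.jl (Julia)" => [0.35, 0.463, 1.00, 0.38058, 0.765],
    "fixest (R)"                   => [0.317, 0.704, 1.297, 0.700, 1.803],
    "lfe (R)"                      => [1.843, 14.831, 10.626, 9.255, 96.958],
    "reghdfe (Stata)"              => [0.61, 4.64, 22.99, 8.28, 70.44],
)

# Divide each column elementwise by the Julia column,
# so the Julia column becomes 1.0 by construction.
baseline = timings["FixedEffectModels.jl (Julia)"]
ratios = Dict(name => times ./ baseline for (name, times) in timings)

# Print one line per benchmark command, with packages in alphabetical order.
for (i, command) in enumerate(commands)
    cells = ("$(name): $(round(ratios[name][i], digits = 2))" for name in sort(collect(keys(ratios))))
    println(rpad(command, 15), join(cells, "  "))
end
```

A ratio above 1 means the other package took longer than FixedEffectModels.jl on that command; the commit's script plots these ratios on a log scale (`Scale.y_log10`), which suits values that span two orders of magnitude.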

2 comments on commit 18e17d3

@matthieugomez

@JuliaRegistrator register()

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/79632

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the GitHub interface, or via:

git tag -a v1.9.0 -m "<description of version>" 18e17d31eaea9c0a36409fec08b0a313055e421e
git push origin v1.9.0
