Missing exception handling for infinite / missing predictions #136

Open
MalteKurz opened this issue Oct 26, 2021 · 2 comments

Comments

@MalteKurz
Member

There is no exception handling in place in case a learner produces infinite or missing predictions. The estimates then silently become NAs, without any warning or exception.

See for example:

library(DoubleML)

g = function(x) {
  res = sin(x)^2
  return(res)
}

m = function(x, nu = 0, gamma = 1) {
  xx = sinh(gamma) / (cosh(gamma) - cos(x - nu))
  res = 0.5 / pi * xx
  return(res)
}

# Data generating process for an interactive IV regression model (IIVM):
# k covariates X, a binary instrument z, a binary treatment d and outcome y.
dgp1_irmiv = function(theta, N, k) {
  
  b = 1 / (1:k)
  sigma = clusterGeneration::genPositiveDefMat(k, "unifcorrmat")$Sigma
  
  X = mvtnorm::rmvnorm(N, sigma = sigma)
  G = g(as.vector(X %*% b))
  M = m(as.vector(X %*% b))
  
  # binary instrument
  pr_z = 1 / (1 + exp(-(1) * X[, 1] * b[5] + X[, 2] * b[2] + rnorm(N)))
  z = rbinom(N, 1, pr_z)
  
  # binary treatment, affected by the instrument z and an unobserved confounder U
  U = rnorm(N)
  pr = 1 / (1 + exp(-(1) * (0.5 * z + X[, 1] * (-0.5) + X[, 2] * 0.25 - 0.5 * U + rnorm(N))))
  d = rbinom(N, 1, pr)
  err = rnorm(N)
  
  y = theta * d + G + 4 * U + err
  
  data = data.frame(y, d, z, X)
  
  return(data)
}

set.seed(1282)
df = dgp1_irmiv(0.5, 1000, 20)
Xnames = names(df)[names(df) %in% c("y", "d", "z") == FALSE]
dml_data = double_ml_data_from_data_frame(df,
                                          y_col = "y",
                                          d_cols = "d", x_cols = Xnames, z_col = "z")

ml_g = mlr3::lrn("regr.rpart", cp = 0.01, minsplit = 20)
ml_m = mlr3::lrn("classif.rpart", cp = 0.01, minsplit = 20)
ml_r = mlr3::lrn("classif.rpart", cp = 0.01, minsplit = 20)

set.seed(3141)
double_mliivm_obj = DoubleMLIIVM$new(
  data = dml_data,
  n_folds = 5,
  ml_g = ml_g,
  ml_m = ml_m,
  ml_r = ml_r,
  dml_procedure = "dml2",
  trimming_threshold = 0,  # no trimming of the estimated propensity scores
  score = "LATE")
double_mliivm_obj$fit()
print(double_mliivm_obj$coef)
print(double_mliivm_obj$se)

It gets even more confusing if one then calls the method bootstrap(). This results in the exception

double_mliivm_obj$bootstrap()
Error in double_mliivm_obj$bootstrap(): Apply fit() before bootstrap().

which does not point to the root cause, and the advice to apply fit() obviously does not fix the issue, since fit() has already been called.

I propose to implement a check for finite predictions similar to the check in the Python package: https://github.com/DoubleML/doubleml-for-py/blob/b3cbdb572fce435c18ec67ca323645900fc901b5/doubleml/_utils.py#L204-L208
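
For illustration, a minimal sketch of such a check in R could look as follows (the helper name assert_finite_predictions and the exact wording of the error message are assumptions, not part of the current DoubleML API):

# Hypothetical helper mirroring the finite-prediction check in the Python package:
# stop with an informative error if a learner returns NA, NaN, Inf or -Inf predictions.
assert_finite_predictions = function(preds, learner_name) {
  if (!all(is.finite(preds))) {
    stop(paste0("Predictions from learner '", learner_name,
                "' are not finite."))
  }
  invisible(preds)
}

Such a check could be applied to the cross-fitted predictions of every nuisance learner directly after prediction, before they enter the score function.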

@MalteKurz
Member Author

The actual root cause in the example above is not a non-finite prediction but a propensity score estimate of exactly 1.

@MalteKurz
Member Author

Estimated probabilities / propensity scores may need special attention, i.e., a check that they are (strictly) in the interval (0,1). See also: DoubleML/doubleml-for-py#129
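
A corresponding sketch for the propensity score check could look like this (again with a hypothetical helper name; whether to error, warn, or trim in this case is a separate design decision):

# Hypothetical helper: error if estimated propensity scores are not strictly
# inside (0, 1), since values of exactly 0 or 1 make the IPW-type score terms
# explode or turn into NaN.
check_propensity_scores = function(preds, learner_name) {
  if (!all(is.finite(preds)) || any(preds <= 0) || any(preds >= 1)) {
    stop(paste0("Propensity score predictions from learner '", learner_name,
                "' are not strictly inside the interval (0, 1)."))
  }
  invisible(preds)
}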

MalteKurz added a commit that referenced this issue Oct 26, 2021