Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DALEX on custom stacked models #73

Open
leungi opened this issue Jun 22, 2019 · 2 comments
Open

DALEX on custom stacked models #73

leungi opened this issue Jun 22, 2019 · 2 comments
Assignees

Comments

@leungi
Copy link

leungi commented Jun 22, 2019

Pseudo-reprex below to illustrate workflow.

There are 2 stages of stacking, and below is abbreviated to final stage.

# input data for prediction; this data are themselves a result of stacked model
df <- tibble::tribble(
  ~x1, ~x2, ~true,           ~pred1,           ~pred2,           ~pred3,           ~pred4,           ~pred5,
  "0016",   1, 11255, 9782.06546666667, 8226.73783726366, 8423.53411898339, 7663.85714285714, 7778.32234611454,
  "0016",   2, 10155, 9917.16225000001,  7390.2726470072, 7548.50621212894, 6011.57142857143,  7020.0197927677,
  "0016",   3,  9905, 8365.66048333333, 4748.35733132711, 4897.40398331136, 5625.14285714286, 5197.59820269678,
  "0026",   1,  9569, 10542.7790333333, 12448.8281473898, 12982.2853847065, 9529.42857142857, 9913.60100542533,
  "0026",   2, 15004,      12332.88455, 13118.3179554928, 13490.4519001908, 9449.14285714286, 9782.48187764126,
  "0027",   1,  4623, 6228.92556666668, 7901.02224985066,  8072.3059097473, 7663.85714285714,  7564.7019858157,
  "0027",   2,  3666, 3902.33416666666, 5351.58779239503, 5501.55032427708, 5757.85714285714, 5791.90612060224,
  "0027",   3,  2046, 3730.91108333333, 5405.90164588071, 5431.22100425988,             5700, 5574.85787520228,
  "0345",   1,  7848, 7911.66811666667, 7332.14726332333, 7535.03388134704, 8428.85714285714, 7504.20919309283,
  "0345",   2,  5594,        6249.8431, 5302.09068924222, 5602.24650648537,             6253, 5936.17306199591,
  "0348",   1,  6118,        5888.9112,  6782.1549012783, 6983.85792156352, 7145.28571428571, 6996.64665890851,
  "0348",   2,  4115,        4655.3621, 4061.92478416692,  4339.3944039624, 5379.71428571429, 5201.36079952954,
  "0348",   3,  3792, 4703.56786666666, 4862.77758785772, 4886.36623749198, 5413.85714285714,  5316.2047603152,
  "1000",   1,  9982,        8894.2428, 8950.05680053561, 8724.27457157357, 7643.14285714286, 8427.52273508174,
  "1000",   2,  4218, 5103.73553333333, 6755.30317981863, 6492.15505744351,             7836, 6900.52725335413,
  "1022",   1,  9021, 8966.84941666667, 8921.14926298024, 8514.45660876879, 8590.57142857143, 8566.07119574923,
  "1022",   2, 11692, 10205.8180333333, 8895.88440879051, 8417.59814231434, 8185.85714285714, 8225.60579235643,
  "1022",   3,  9420, 9664.82173333334, 9422.99681882565, 8835.71873031759, 7853.57142857143, 8126.76078652109,
  "1022",   4,  6850, 7419.07043333333, 8995.48869657391, 8194.63910112673, 7604.14285714286, 7815.14405713875,
  "1022",   5,  6850, 7419.07043333333,  8817.8438463534, 7883.22080414475, 6846.14285714286, 7515.84608489043
)

# model list for stacking
md <- list(rf,
           pca,
           svm,
           enet)

model_pred_stack <- function(df, md) {
  
  # iterate over list of models in md, and average prediction
  temp <- 0
  for (i in 1:length(md)) {
    temp <- temp + predict(md[[i]], df)
  }
  temp <- temp / length(md)
  return(temp)
}

model_pred <- model_pred_stack(df = df, md)

# with DALEX, have to loop over list of models one by one, which doesn't reflect intention of stacking; otherwise, modelDown will complain
explain_stacked <- explain(
  md[[1]],
  data = df,
  y = df$true,
  label = "stacked"
)

modelDown(explain_stacked,
          device = "svg",
          output_folder = "output_data/modelDown_stacked")

# passing list of models; modelDown fails to generate diagnostics, aside from data description
explain_stacked <- explain(
  md,
  data = df,
  y = df$true,
  label = "stacked"
)

modelDown(explain_stacked,
          device = "svg",
          output_folder = "output_data/modelDown_stacked")
@kromash
Copy link
Collaborator

kromash commented Jun 28, 2019

Thank you for provided example. I will look into it.

@kromash kromash self-assigned this Jun 28, 2019
@leungi
Copy link
Author

leungi commented Jun 28, 2019

Appreciate the prompt response; your team's work (starting from DALEX) is a game changer 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants