
Computation of target encoding with only 1 positive example #275

Closed
joaquinvanschoren opened this issue Oct 21, 2020 · 4 comments

Comments

@joaquinvanschoren

Expected Behavior

I would expect the target encoding to follow the definition in the paper:
https://dl.acm.org/doi/10.1145/507533.507538

Actual Behavior

There is an extra line of code that, after the computation of the encoding, sets the encoded value equal to the prior when there is only one positive example (one example of a certain category with the positive class).

This is the line:

smoothing[stats['count'] == 1] = prior

Where does this come from? I can't seem to find this anywhere in the paper.
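For context, here is roughly how I understand the computation: a small self-contained sketch with approximate variable names and made-up numbers, not the exact library source.

```python
import numpy as np
import pandas as pd

# Hypothetical per-category stats and default parameters, purely for illustration
stats = pd.DataFrame({'count': [1, 3, 10], 'mean': [1.0, 0.67, 0.4]})
prior = 0.5            # global target mean
min_samples_leaf = 1   # default k
f = 1.0                # default smoothing factor

# Shrinkage weight lambda(n) = 1 / (1 + exp(-(n - k) / f)) from the paper,
# then the blend between the prior and the per-category target mean
smoove = 1 / (1 + np.exp(-(stats['count'] - min_samples_leaf) / f))
smoothing = prior * (1 - smoove) + stats['mean'] * smoove

# The line in question: categories observed only once are reset to the prior
smoothing[stats['count'] == 1] = prior
print(smoothing)
```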

Thanks!

@PaulWestenthanner
Collaborator

Hi @joaquinvanschoren
This is indeed not in the paper. I don't know exactly why this line is there and can only speculate, but I think it makes sense:
If this line were not included, then with the default configuration (min_samples_leaf=1) the encoding of a label that occurs only once would be the average of the prior and the target value. That feels like over-weighting the target value, which leads to over-fitting in models trained with the encoded feature.
With hindsight I think it might be better not to have this line but to use a higher default value for min_samples_leaf.
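To make that concrete, here is the default case worked through (illustrative numbers, not library code):

```python
import numpy as np

# A category seen exactly once, with the defaults min_samples_leaf=1 and smoothing=1.0
count, min_samples_leaf, f = 1, 1, 1.0
prior, target_value = 0.1, 1.0   # e.g. a rare category whose single row is positive

weight = 1 / (1 + np.exp(-(count - min_samples_leaf) / f))   # = 0.5
encoding = prior * (1 - weight) + target_value * weight
print(weight, encoding)   # ~0.5, ~0.55 -> halfway between the prior and the single target value
```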

Another explanation that just crossed my mind: the documentation for the min_samples_leaf parameter says "minimum samples to take category average into account", which is also not quite correct. The paper says the parameter determines half of the minimal sample size for which we completely "trust" the estimate. Maybe the original author confused the two. @charliec443 can you shed some light on this?
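For the parameter semantics, a quick illustration of the weight the paper's formula assigns for a few sample sizes (my own sketch):

```python
import numpy as np

def weight(n, k, f=1.0):
    # Shrinkage weight lambda(n) = 1 / (1 + exp(-(n - k) / f)) from the paper
    return 1 / (1 + np.exp(-(n - k) / f))

# With k = min_samples_leaf = 1, a single observation already gets weight 0.5:
# k marks the point where the category mean and the prior are trusted equally,
# not a hard minimum below which the category mean is ignored.
for n in (1, 2, 5, 20):
    print(n, round(weight(n, k=1), 3))
```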

@glevv
Contributor

glevv commented Feb 16, 2022

This kind of makes sense, but it is not consistent with the paper.
With n = k, the smoothing value should be equal to 0.5, which makes the result for this group equal to 0.5 * group_mean + 0.5 * global_mean. There is no need to treat the case with one value in the group differently, I guess.

@joaquinvanschoren
Author

joaquinvanschoren commented Feb 17, 2022

Agree that not including the special case and updating the default value makes more sense; it would also be easier to know what to expect. I am using the code to explain target encoding to students, and the unexpected result makes that a lot harder :)

@PaulWestenthanner
Collaborator

I'm closing this as the discussion has moved to #327, and a future warning that the defaults will change was introduced by b072ab0.
