
Computation of target encoding with only 1 positive example #275

Closed
joaquinvanschoren opened this issue Oct 21, 2020 · 4 comments

Comments

@joaquinvanschoren

Expected Behavior

I would expect the target encoding to follow the definition in the paper:
https://dl.acm.org/doi/10.1145/507533.507538

Actual Behavior

There is an extra line of code that, after the computation of the encoding, sets the encoded value equal to the prior when there is only one positive example (one example of a certain category with the positive class).

This is the line:

smoothing[stats['count'] == 1] = prior

Where does this come from? I can't seem to find this anywhere in the paper.
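For context, here is roughly how I understand the computation: a small self-contained sketch with approximate variable names and made-up numbers, not the exact library source.

```python
import numpy as np
import pandas as pd

# Hypothetical per-category stats and default parameters, purely for illustration
stats = pd.DataFrame({'count': [1, 3, 10], 'mean': [1.0, 0.67, 0.4]})
prior = 0.5            # global target mean
min_samples_leaf = 1   # default k
f = 1.0                # default smoothing factor

# Shrinkage weight lambda(n) = 1 / (1 + exp(-(n - k) / f)) from the paper,
# then the blend between the prior and the per-category target mean
smoove = 1 / (1 + np.exp(-(stats['count'] - min_samples_leaf) / f))
smoothing = prior * (1 - smoove) + stats['mean'] * smoove

# The line in question: categories observed only once are reset to the prior
smoothing[stats['count'] == 1] = prior
print(smoothing)
```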

Thanks!

@PaulWestenthanner
Collaborator

Hi @joaquinvanschoren
This is indeed not in the paper. I don't know exactly why this line is there and can only speculate, but I think it makes sense:
If this line were not included, then with the default configuration (min_samples_leaf=1) the encoding of a label that occurs only once would be the average of the prior and the target value. That feels like over-weighting the target value, which leads to over-fitting in models trained with the encoded feature.
With hindsight I think it might be better not to have this line but to use a higher default value for min_samples_leaf.
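To make that concrete, here is the default case worked through (illustrative numbers, not library code):

```python
import numpy as np

# A category seen exactly once, with the defaults min_samples_leaf=1 and smoothing=1.0
count, min_samples_leaf, f = 1, 1, 1.0
prior, target_value = 0.1, 1.0   # e.g. a rare category whose single row is positive

weight = 1 / (1 + np.exp(-(count - min_samples_leaf) / f))   # = 0.5
encoding = prior * (1 - weight) + target_value * weight
print(weight, encoding)   # ~0.5, ~0.55 -> halfway between the prior and the single target value
```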

Another explanation that just crossed my mind: the documentation for the min_samples_leaf parameter says "minimum samples to take category average into account", which is also not quite correct. The paper says the parameter determines half of the minimal sample size for which we completely "trust" the estimate. Maybe the original author confused the two. @charliec443 can you shed some light on this?
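For the parameter semantics, a quick illustration of the weight the paper's formula assigns for a few sample sizes (my own sketch):

```python
import numpy as np

def weight(n, k, f=1.0):
    # Shrinkage weight lambda(n) = 1 / (1 + exp(-(n - k) / f)) from the paper
    return 1 / (1 + np.exp(-(n - k) / f))

# With k = min_samples_leaf = 1, a single observation already gets weight 0.5:
# k marks the point where the category mean and the prior are trusted equally,
# not a hard minimum below which the category mean is ignored.
for n in (1, 2, 5, 20):
    print(n, round(weight(n, k=1), 3))
```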

@glevv
Contributor

glevv commented Feb 16, 2022

This kind of makes sense, but it is not consistent with the paper.
With n = k, the smoothing value should be equal to 0.5, which makes the result for this group equal to 0.5 * group_mean + 0.5 * global_mean. There is no need to treat the case with one value in the group differently, I guess.

@joaquinvanschoren
Author

joaquinvanschoren commented Feb 17, 2022

Agree that not including the special case and updating the default value makes more sense; it would also be easier to know what to expect. I am using the code to explain target encoding to students, and the unexpected result makes that a lot harder :)

@PaulWestenthanner
Collaborator

I'm closing this as the discussion has moved to #327, and a future warning that the defaults will change was introduced by b072ab0.
