DogEncode provides methods for encoding categorical features quickly and easily. From simple approaches such as dummy variables to Owen Zhang's leave-one-out encoding, well known on Kaggle, this R package aims to cover them all.
# install.packages("devtools")
devtools::install_github("germayneng/DogEncode")
Currently, DogEncode covers the following encoding methods. I will be glad to add more as I learn more :)
- One-hot encoding (dummy variables)
# feed in a data frame of categorical variables; the variables must be of class character
# mode = "auto" drops one dummy automatically
# mode = "manual" generates all dummies
one_hot_encoding(df, mode = "manual") # you can bind the resulting data frame to your features
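For readers who want to see what the two modes amount to, here is a minimal base R sketch of one-hot encoding. It is not DogEncode's implementation: the data frame `toy` and the variables `full` and `reduced` are hypothetical names made up for illustration, and mapping them to `mode = "manual"` and `mode = "auto"` is an assumption based on the comments above.

```r
# Hypothetical toy data, not part of DogEncode
toy <- data.frame(colour = c("red", "green", "blue", "green"),
                  stringsAsFactors = FALSE)

# All dummies (assumed to resemble mode = "manual"):
# removing the intercept keeps one column per level
full <- model.matrix(~ colour - 1, data = toy)

# One dummy dropped (assumed to resemble mode = "auto"):
# keep the intercept in the formula, then discard the intercept column
reduced <- model.matrix(~ colour, data = toy)[, -1, drop = FALSE]

colnames(full)     # one column per level of colour
colnames(reduced)  # the first level (alphabetically) acts as the reference
```

Dropping one dummy avoids perfectly collinear columns in linear models, while tree-based models are usually fine with the full set.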
- Label encoding, aka colhot encoding
# feed in a data frame of categorical variables; the variables must be of class character
# if you have both train and test, pass both so that the labels are consistent between them
# otherwise, set the argument test = NULL
# example with a test set
end <- label_encoding(train = temp, test = temp_test)
Names <- end[[1]] # encoded for train
Names_test <- end[[2]] # encoded for test
# if there is no test
end <- label_encoding(train = temp, test = NULL)
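The reason for passing train and test together can be sketched in base R: if the integer codes are derived from the union of both sets, the same category always gets the same code. This is only an illustration under that assumption, not the package's code; `train_toy`, `test_toy`, and `lvls` are hypothetical names.

```r
# Hypothetical toy data, not part of DogEncode
train_toy <- data.frame(city = c("paris", "tokyo", "lima"),
                        stringsAsFactors = FALSE)
test_toy  <- data.frame(city = c("tokyo", "oslo"),
                        stringsAsFactors = FALSE)

# Build one shared set of levels from both data sets
lvls <- sort(unique(c(train_toy$city, test_toy$city)))

# Encode each set against the shared levels
train_codes <- as.integer(factor(train_toy$city, levels = lvls))
test_codes  <- as.integer(factor(test_toy$city, levels = lvls))

# "tokyo" receives the same code in both sets
train_codes[train_toy$city == "tokyo"] == test_codes[test_toy$city == "tokyo"]
```

Encoding each set separately would give "tokyo" different codes in train and test, which silently corrupts the feature.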
- Count encoding aka counthot encoding
# feed in a single column, e.g. train$cat
# example
example <- data.frame(
id = c(rep("a", 5), rep("b", 3), rep("c", 2), "d"),
resp = c(1, 0, 0, NA, 0, 1, 0, 0, 1, 0, 1))
count_encoding(example$id, mode = "normal")
# for log scale
count_encoding(example$id, mode = "log")
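As a rough illustration of the idea, count encoding replaces each category with the number of times it occurs. The base R sketch below is not DogEncode's implementation; `id`, `count_enc`, and `log_enc` are hypothetical names, and using the natural log for the log-scale variant is an assumption (the package may use a different transform).

```r
# Same category column as in the example above
id <- c(rep("a", 5), rep("b", 3), rep("c", 2), "d")

# Frequency of each category
counts <- table(id)

# Look up the frequency for every row
count_enc <- as.integer(counts[id])
count_enc  # 5 5 5 5 5 3 3 3 2 2 1

# Log-scale variant (natural log assumed)
log_enc <- log(count_enc)
```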
- Label count encoding
example <- data.frame(
id = c(rep("a", 5), rep("b", 3), rep("c", 2), "d"),
resp = c(1, 0, 0, NA, 0, 1, 0, 0, 1, 0, 1))
label_count_encoding(example$id)
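Label count encoding is commonly described as ranking the categories by how often they occur and using the rank as the label. A minimal base R sketch follows. It is not the package's code: the names are hypothetical, and both the direction of the ranking (most frequent category gets 1) and the tie-breaking rule are assumptions.

```r
# Same category column as in the example above
id <- c(rep("a", 5), rep("b", 3), rep("c", 2), "d")

# Frequency of each category
counts <- table(id)

# Rank categories by descending frequency; ties broken by order of appearance in the table
ranks <- rank(-as.numeric(counts), ties.method = "first")
names(ranks) <- names(counts)

# Look up the rank for every row
label_count <- as.integer(ranks[id])
label_count  # 1 1 1 1 1 2 2 2 3 3 4
```

Compared with plain count encoding, ranks are less sensitive to outlying counts and never assign two categories the same value when ties are broken.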
- Leave one out encoding
example <- data.frame(
id = c(rep("a", 5), rep("b", 3), rep("c", 2), "d"),
resp = c(1, 0, 0, NA, 0, 1, 0, 0, 1, 0, 1))
LOO_encode(example$id, example$resp)
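To make the idea concrete: leave-one-out encoding replaces each row's category with the mean response of the other rows in the same category, so a row never sees its own target. The sketch below is a base R illustration, not `LOO_encode` itself. The function name `loo_encode_sketch` is hypothetical, and the handling of edge cases is an assumption: missing responses are ignored when averaging, and a category with no other observed rows yields `NA`.

```r
# Hypothetical helper, not part of DogEncode
loo_encode_sketch <- function(x, y) {
  observed <- !is.na(y)
  y0 <- ifelse(observed, y, 0)                      # missing responses contribute 0 to sums
  grp_sum <- ave(y0, x, FUN = sum)                  # per-category sum of observed responses
  grp_n   <- ave(as.numeric(observed), x, FUN = sum)  # per-category count of observed responses
  num <- grp_sum - y0                               # remove the row's own response
  den <- grp_n - observed                           # remove the row itself from the count
  ifelse(den > 0, num / den, NA_real_)              # NA when no other rows are available
}

example <- data.frame(
  id = c(rep("a", 5), rep("b", 3), rep("c", 2), "d"),
  resp = c(1, 0, 0, NA, 0, 1, 0, 0, 1, 0, 1))

enc <- loo_encode_sketch(example$id, example$resp)
enc  # 0, 1/3, 1/3, 0.25, 1/3, 0, 0.5, 0.5, 0, 1, NA
```

The row with the missing response still gets a value (the mean of the four observed "a" rows), while the single "d" row has nothing to average over.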
To do:
- Add more documentation explaining every function, and credit the respective people.
- Hash encoding
Owen Zhang, HJ van Veen, Dex Grooves and Simone Aiosa's dog logo
super helpful post to help debug ^.^