germayneng/DogEncode
DogEncode

DogEncode provides methods for encoding categorical features quickly and easily. From simple approaches such as dummy variables to Owen Zhang's Kaggle-famous leave-one-out encoding, this R package aims to cover them all.

Installation

# install.packages("devtools")
devtools::install_github("germayneng/DogEncode")

Usage

Currently, DogEncode covers the following encoding methods. I will be glad to add more as I learn more :)

  1. One-hot encoding (dummy)
# Pass a data frame of categorical variables; each column should be of class character.
# mode = "auto" automatically drops one dummy.
# mode = "manual" generates all dummies.
one_hot_encoding(df, mode = "manual") # you can bind the resulting data frame to your other features
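
For intuition about the difference between keeping all dummies and dropping one, here is a minimal base R sketch using `model.matrix`. It is not DogEncode's implementation; the data frame and the `colour` column are hypothetical, and the column names DogEncode produces may differ.

```r
# Hypothetical example data; `colour` is an assumed column name.
df <- data.frame(colour = c("red", "green", "blue", "green"),
                 stringsAsFactors = FALSE)

# Keep every level: removing the intercept yields one column per level
# (this trick gives a full set of dummies only for a single variable).
full <- model.matrix(~ colour - 1, data = df)

# Drop one level: the first level (alphabetically) becomes the reference
# and is absorbed by the intercept, which is then discarded.
reduced <- model.matrix(~ colour, data = df)[, -1, drop = FALSE]

ncol(full)     # 3 columns: blue, green, red
ncol(reduced)  # 2 columns: green, red
```

Dropping one dummy avoids perfectly collinear columns, which matters for linear models; tree-based models are usually fine with the full set.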
  2. Label encoding aka colhot encoding
# Pass a data frame of categorical variables; each column should be of class character.
# If you have both train and test sets, pass both so that the labels are consistent between them.
# Otherwise, set the argument test = NULL.

# example if there is test 
end <- label_encoding(train = temp, test = temp_test) 
Names <- end[[1]] # encoded for train 
Names_test <- end[[2]] # encoded for test 

# if there is no test 
end <- label_encoding(train = temp, test = NULL) 
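
To illustrate why passing train and test together keeps the labels consistent, here is a small base R sketch that builds one shared set of levels. It is an assumption-laden illustration, not the package's code: the vectors are made up, and whether DogEncode assigns labels in alphabetical order is not stated here.

```r
# Hypothetical single categorical column from each set.
train_cat <- c("a", "b", "c", "a")
test_cat  <- c("b", "d")  # "d" never appears in train

# Build the level set from both sets so each category maps to one integer.
shared_levels <- sort(unique(c(train_cat, test_cat)))

train_lab <- as.integer(factor(train_cat, levels = shared_levels))
test_lab  <- as.integer(factor(test_cat,  levels = shared_levels))

train_lab  # 1 2 3 1
test_lab   # 2 4
```

If the levels were built from each set separately, "b" would be encoded as 2 in train but 1 in test.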
  3. Count encoding aka counthot encoding
# Pass a single column, e.g. train$cat
# Example

example <- data.frame(
   id = c(rep("a", 5), rep("b", 3), rep("c", 2), "d"),
   resp = c(1, 0, 0, NA, 0, 1, 0, 0, 1, 0, 1))
   
 
count_encoding(example$id, mode = "normal")

# For counts on a log scale
count_encoding(example$id, mode = "log")
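
As a rough picture of what count encoding computes, the base R sketch below replaces each category with the number of times it occurs, using the same `id` values as above. It is not the package's implementation; in particular, using a plain `log()` for the log scale is an assumption.

```r
id <- c(rep("a", 5), rep("b", 3), rep("c", 2), "d")

# Look up each value's frequency in the frequency table.
counts <- as.integer(table(id)[id])
counts  # 5 5 5 5 5 3 3 3 2 2 1

# Log-scaled variant (assumed to be a natural log); a category seen once maps to 0.
log_counts <- log(counts)
```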
 
  4. Label count encoding
example <- data.frame(
   id = c(rep("a", 5), rep("b", 3), rep("c", 2), "d"),
   resp = c(1, 0, 0, NA, 0, 1, 0, 0, 1, 0, 1))
   
   
label_count_encoding(example$id)
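
Label count encoding is commonly described as ranking categories by how often they occur. The base R sketch below shows that idea on the same `id` values. The ranking direction (1 = least frequent) and the tie handling (`ties.method = "min"`) are assumptions; DogEncode may do either differently.

```r
id <- c(rep("a", 5), rep("b", 3), rep("c", 2), "d")

# Frequency of each category as a plain named integer vector.
freq <- table(id)
counts <- setNames(as.integer(freq), names(freq))

# Rank categories by frequency (assumed: 1 = rarest, ties share the lowest rank).
ranks <- rank(counts, ties.method = "min")

label_count <- as.integer(ranks[id])
label_count  # 4 4 4 4 4 3 3 3 2 2 1
```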
  5. Leave-one-out encoding
example <- data.frame(
   id = c(rep("a", 5), rep("b", 3), rep("c", 2), "d"),
   resp = c(1, 0, 0, NA, 0, 1, 0, 0, 1, 0, 1))
   
 LOO_encode(example$id, example$resp) 
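
To make the idea concrete, here is a minimal base R sketch of leave-one-out encoding: each row gets the mean response of its category computed without that row's own response. This is an illustration, not DogEncode's code. How the package treats `NA` responses and categories with a single row is not stated here, so the choices below (ignore `NA` responses, return `NaN` for singletons) are assumptions, and the random noise often used with this technique is left out.

```r
id   <- c(rep("a", 5), rep("b", 3), rep("c", 2), "d")
resp <- c(1, 0, 0, NA, 0, 1, 0, 0, 1, 0, 1)

observed <- !is.na(resp)
resp0 <- ifelse(observed, resp, 0)  # NA responses contribute nothing

# Per-category sum and count of observed responses, repeated for every row.
grp_sum <- ave(resp0, id, FUN = sum)
grp_n   <- ave(as.numeric(observed), id, FUN = sum)

# Remove the row's own response from both the sum and the count.
loo <- (grp_sum - resp0) / (grp_n - observed)

loo[1:5]  # 0, 1/3, 1/3, 0.25, 1/3
loo[11]   # NaN: "d" has no other rows to average over
```

Excluding the row's own response is what keeps the target from leaking directly into its own feature.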

To do

  • Add more documentation explaining every function and crediting the respective people.
  • Hash encoding

Credit

Owen Zhang, HJ van Veen, Dex Grooves, and Simone Aiosa for the dog logo.
Thanks also to a super helpful post that helped with debugging ^.^

License
