Ensembled 12 CatBoost, XGBoost, and LightGBM models via stacking, with an Elastic-Net logistic regression meta-model
I competed in a team of 3 in a Columbia Engineering competition open to Master's students, and placed 4th out of 200 participants.
This competition was adapted from Avazu's highly popular Kaggle competition in 2015, linked here.
Avazu is a programmatic advertising platform that uses machine learning to decide which mobile advertisements get pushed to which consumers. Its aim is to maximize advertising effectiveness by ensuring the correct target group receives the advertisements they are most interested in.
The goal of this competition is to predict consumer click-through rates on mobile advertisements, i.e., whether a consumer clicks on a given advertisement. In online advertising, click-through rate is a key metric for evaluating ad performance, used in both sponsored search and real-time bidding.
This competition uses 4 million rows with ~30 features containing information about the consumer's mobile device, the mobile advertisement, the website on which the advertisement was encountered, etc. Each row represents a specific user on a specific mobile device encountering a specific mobile advertisement, along with the target: whether the user clicked on it.
- Obtain training and test sets via a time-based split: Because the data is temporal, splitting chronologically prevents leakage from the future into the past
- Feature engineering to identify unique consumers from device data
- Feature engineering to extract time-based features (e.g., hour of day, day of week)
- Feature cleaning: Group rare categorical values into a single bucket
- Encode categorical features: Using Hash Encoding, Ordered Target Encoding, Ordinal Encoding
- Train & tune hyperparameters of CatBoost, XGBoost, and LightGBM models using Bayesian Optimization
- Ensemble models: Using Stacking, with Elastic Net Logistic Regression as meta-model
- Re-run on full data
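The time-based split in the steps above can be sketched as follows. Avazu's raw data encodes timestamps in an `hour` column formatted `YYMMDDHH`, so a chronological split is just a threshold on that column; the cutoff value and toy rows here are illustrative assumptions.

```python
import pandas as pd

# Toy rows in Avazu's YYMMDDHH format (values made up for illustration)
df = pd.DataFrame({
    "hour": [14102100, 14102200, 14102300, 14102400],
    "click": [0, 1, 0, 1],
})

cutoff = 14102300  # assumed cutoff: later days held out for validation
train = df[df["hour"] < cutoff]
test = df[df["hour"] >= cutoff]
```

Splitting by time rather than randomly mirrors how the model is used in production: it is always trained on the past and scored on the future.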
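One common way to approximate a unique consumer in this dataset is to concatenate device fields, and the time features come from the same `hour` column; the specific field combination and feature names below are illustrative assumptions, not necessarily the exact features we engineered.

```python
import pandas as pd

# Toy rows with Avazu-style columns (values are made up for illustration)
df = pd.DataFrame({
    "hour": [14102113, 14102209],          # YYMMDDHH
    "device_id": ["a99f214a", "c357dbff"],
    "device_ip": ["ddd2926e", "96809ac8"],
    "device_model": ["44956a24", "711ee120"],
})

# Assumed proxy for a unique consumer: concatenate device fields
df["user_id"] = df["device_id"] + "_" + df["device_ip"] + "_" + df["device_model"]

# Parse YYMMDDHH and extract time-based features
ts = pd.to_datetime(df["hour"].astype(str), format="%y%m%d%H")
df["hour_of_day"] = ts.dt.hour
df["day_of_week"] = ts.dt.dayofweek
```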
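Rare-value cleaning can be as simple as bucketing infrequent category levels under one label, which keeps the encoders from overfitting to values seen only a handful of times; the threshold and the `"RARE"` label below are assumptions for illustration.

```python
import pandas as pd

s = pd.Series(["a", "a", "a", "b", "c", "a"])

threshold = 2  # assumed minimum frequency to keep a level as-is
counts = s.value_counts()
rare = counts[counts < threshold].index

# Replace every infrequent level with a shared "RARE" bucket
cleaned = s.where(~s.isin(rare), "RARE")
```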
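Two of the encoders listed above can be sketched briefly: hash encoding maps high-cardinality categories into a fixed number of buckets, and ordered target encoding replaces a category with the expanding mean of the target over earlier rows only, so each row never sees its own label (this is the scheme CatBoost applies natively). The bucket count, column names, and choice of md5 are assumptions for the sketch.

```python
import hashlib
import pandas as pd

def hash_encode(value: str, n_buckets: int = 2**10) -> int:
    # Hashing trick: deterministic map from category string to bucket index
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % n_buckets

sites = pd.Series(["siteA", "siteB", "siteA"])
hashed = sites.map(hash_encode)  # identical categories share a bucket

# Ordered target encoding: expanding mean of the target over prior rows,
# computed in row (time) order so each row only uses the past
df = pd.DataFrame({"cat": ["x", "x", "y", "x"], "click": [1, 0, 1, 1]})
prior = df["click"].mean()  # global prior for a category's first occurrence
df["cat_te"] = (
    df.groupby("cat")["click"]
    .transform(lambda s: s.shift().expanding().mean())
    .fillna(prior)
)
```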
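Bayesian optimization itself is just a loop of "fit a surrogate, pick the next point": the toy below uses a Gaussian-process surrogate with a lower-confidence-bound acquisition over a stand-in objective, since running real CatBoost/XGBoost/LightGBM cross-validation here would be heavy. In practice a library such as Optuna or scikit-optimize wraps this loop for you; everything below is an illustrative sketch, not our exact tuner.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Stand-in for CV log-loss as a function of one hyperparameter
    # (e.g. learning rate); a real run would train and cross-validate here
    return (x - 0.3) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(4, 1))          # random initial evaluations
y = np.array([objective(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(10):
    gp.fit(X, y)
    cand = rng.uniform(0, 1, size=(256, 1))  # candidate pool
    mu, sigma = gp.predict(cand, return_std=True)
    # Lower confidence bound: evaluate where mean - k*std is lowest,
    # trading off exploitation (low mean) and exploration (high std)
    x_next = cand[np.argmin(mu - 1.0 * sigma)]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next[0]))

best = X[np.argmin(y)][0]  # best hyperparameter value found
```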
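The stacking step maps directly onto scikit-learn's `StackingClassifier`: base models produce out-of-fold probabilities and the meta-model learns to combine them. Here the 12 gradient-boosted base models are stood in for by two light sklearn estimators so the sketch runs anywhere, and the elastic-net meta-model is a `LogisticRegression` with the `saga` solver (the only sklearn solver supporting the `elasticnet` penalty). Dataset and hyperparameters are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Stand-in base learners (the actual project stacked 12 CatBoost,
# XGBoost, and LightGBM models)
base = [
    ("gbm", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
]

# Elastic-net logistic regression meta-model
meta = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, max_iter=5000)

# Base models are stacked on out-of-fold predicted probabilities
stack = StackingClassifier(estimators=base, final_estimator=meta,
                           cv=3, stack_method="predict_proba")
stack.fit(X, y)
proba = stack.predict_proba(X)[:, 1]
```

Feeding the meta-model probabilities (`stack_method="predict_proba"`) rather than hard labels preserves each base model's confidence, which the elastic-net regression can then weight and shrink.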
Check out this project on my website here :)