Photo by Dave Goudreau on Unsplash
This repository opens a series of tutorials introducing techniques that will help you make better data-driven marketing decisions. We start by introducing Market Basket Analysis (MBA), a powerful tool used for product promotion and recommendation. We will discuss several techniques and show their implementation so that you can apply them yourself.
Here we make use of Python tools. The main one is the mlxtend package, which is very useful for performing important Market Basket Analysis tasks such as:
- Pre-process data
- Generate item sets and rules
- Filter according to metrics
After completing this tutorial, you'll know:
- What Market Basket Analysis is
- How to prepare your data to apply MBA
- Some metrics used in MBA
- How to perform MBA using the Apriori algorithm
- How to apply some simple visualizations used in MBA
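As a rough illustration of the metrics mentioned above, the sketch below computes support, confidence, and lift by hand for a single rule. The transactions and the helper names (`support`, `confidence`, `lift`) are hypothetical and chosen only for this example; they are not taken from the tutorial's datasets or notebooks.

```python
# Hypothetical toy transactions, used only to illustrate the metrics.
transactions = [
    ["bread", "butter"],
    ["bread", "jam"],
    ["bread", "butter", "jam"],
    ["coffee"],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(itemset <= set(t) for t in transactions)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Metrics for the rule {butter} -> {bread}
rule_support = support(["butter", "bread"], transactions)
rule_confidence = confidence(["butter"], ["bread"], transactions)
rule_lift = lift(["butter"], ["bread"], transactions)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 suggests that the two items are bought together more often than would be expected if they were independent.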
The notebook Introduction to Market Basket Analysis provides an introduction to MBA through use cases.
In this tutorial we use two datasets.
- A small fictional bakery dataset consisting of 298 transactions containing 7 unique items.
- A transactional dataset containing all the transactions that occurred between 01/12/2010 and 09/12/2011 for a UK-based and registered online retailer. It is available at the UCI Machine Learning Repository. Here we use a subset of the dataset containing only transactions of customers in The Netherlands.
Both datasets can be found here.
If you are curious to know how the dataset containing only products purchased in The Netherlands was obtained, check this notebook.
We use the mlxtend Python package, which contains useful tools for performing important tasks inherent to MBA.
In particular, the following functions were used:
- Transaction encoder (from mlxtend.preprocessing import TransactionEncoder)
- Apriori algorithm (from mlxtend.frequent_patterns import apriori)
- Association rule (from mlxtend.frequent_patterns import association_rules)
- Install the requirements using: pip install -r time_series_requirements.txt
- Make sure you use Python 3.
- You may want to use a virtual environment for this.