This repository offers a hands-on introduction to football (soccer) data analysis for anyone who wants to start working with football data.
This project introduces the following concepts:
- How to access open event data from the StatsBomb API using statsbombpy {Notebook},
- How to draw and visualize a soccer pitch using mplsoccer {Notebook},
- How to visualize a pass network map for a particular team in a particular game {Notebook},
- How to use the NetworkX module to analyse the pass network (e.g. finding the degree distribution of passes, clustering coefficient, centrality, etc.) {Notebook},
- How to apply computational geometry concepts like Convex Hulls, Voronoi diagrams and Delaunay triangulations to understand and visualize football tracking data (using scipy.spatial and mplsoccer) {Notebook},
- How to analyse Expected Goals (xG) using open data from StatsBomb {Notebook},
- How to use Radar Charts to compare and evaluate players' per-90 stats using the soccerplots package {Notebook},
- How to use Linear Regression on football data, with the help of the scikit-learn module, to model the relationship between goals scored and shots on goal {Notebook},
- How to use Elastic Net to model the relationship between the number of shots taken and the number of goals scored {Notebook},
- How to use Logistic Regression to predict whether a pass is successful (given some features of the pass) {Notebook},
- How to use a Decision Tree Classifier to build a model for predicting a shot outcome for a particular team {Notebook},
- How to use Random Forest to predict whether a pass is successful {Notebook},
- How to use Naive Bayes Classifier to predict a pass outcome {Notebook},
- How to use K-means clustering to cluster shot outcomes for Barcelona in La Liga {Notebook}
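
As a taste of the pass network analysis listed above, here is a minimal sketch using NetworkX. It is not taken from the notebooks: the pass list and player labels are made up for illustration, and `build_pass_network` is a hypothetical helper name. In the notebooks the passes come from StatsBomb event data instead.

```python
import networkx as nx

# Hypothetical (passer, receiver) pairs for completed passes.
# Player labels are invented; real data would come from event data.
passes = [
    ("GK", "CB"), ("CB", "CM"), ("CB", "CM"), ("CM", "ST"),
    ("CM", "CB"), ("ST", "CM"), ("CB", "ST"), ("CM", "ST"),
]


def build_pass_network(pass_pairs):
    """Build a directed graph whose edge weight counts passes from passer to receiver."""
    graph = nx.DiGraph()
    for passer, receiver in pass_pairs:
        if graph.has_edge(passer, receiver):
            graph[passer][receiver]["weight"] += 1
        else:
            graph.add_edge(passer, receiver, weight=1)
    return graph


network = build_pass_network(passes)

# Weighted out-degree = passes made, weighted in-degree = passes received.
passes_made = dict(network.out_degree(weight="weight"))
passes_received = dict(network.in_degree(weight="weight"))

# Unweighted degree centrality: distinct passing links per player, normalised by n - 1.
centrality = nx.degree_centrality(network)

# Clustering coefficient on the undirected view: how often a player's
# passing partners also pass to each other.
clustering = nx.clustering(network.to_undirected())

for player in network.nodes:
    print(player, passes_made[player], passes_received[player],
          round(centrality[player], 2), round(clustering[player], 2))
```

The directed graph keeps pass direction, so passes made and passes received can be read separately; the clustering coefficient is computed on the undirected view, which treats a link in either direction as a connection.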
Resources that helped me start with football data analysis:
- Friends of Tracking YouTube channel, usually hosted and maintained by Dr. David Sumpter,
- Book Soccermatics by Dr. Sumpter,
- YouTube channel maintained by McKay Johns,
- FC Python blog,
- Graph Theory and Complex Networks: An Introduction by Dr. Maarten van Steen,
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron.
- La Liga 2020-21 shot stats - Sheet1.csv exported from FBREF.
- 2020-21 La Liga player stats (per 90s) - Sheet1.csv exported from FBREF.