BigDataBowl Data Support #4

Merged · 18 commits · Oct 18, 2024
28 changes: 24 additions & 4 deletions .github/workflows/pytest.yml
@@ -4,7 +4,6 @@ on:
push:
branches:
- main
- bug/crystal_conv_saving
pull_request:
branches:
- main
@@ -20,18 +19,39 @@ jobs:

steps:
- uses: actions/checkout@v2

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies

# Linux/macOS - Install dependencies
- name: Install dependencies (Linux/macOS)
if: runner.os != 'Windows' # Only run on Linux/macOS
env:
PYTHONIOENCODING: utf-8 # Ensure Python uses UTF-8 encoding
run: |
python -m pip install --upgrade pip
pip install -e .[test]
python -m pip install -e .[test]
shell: bash

# Windows - Install dependencies
- name: Install dependencies (Windows)
if: runner.os == 'Windows' # Only run on Windows
env:
PYTHONIOENCODING: utf-8 # Ensure Python uses UTF-8 encoding
PYTHONUTF8: 1 # Force Python to use UTF-8 mode
run: |
chcp 65001 # Change code page to UTF-8
python -m pip install --upgrade pip setuptools
python -m pip install -e .[test]
shell: pwsh

- name: Code formatting
run: |
pip install "black[jupyter]==24.4.2"
black --check .

- name: Test with pytest
run: |
pytest --color=yes
pytest --color=yes
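The Windows step above forces UTF-8 via `PYTHONUTF8` and `chcp 65001`; whether UTF-8 mode is actually active can be checked locally with a one-liner (a sketch, assuming Python 3.7+ where PEP 540's `sys.flags.utf8_mode` exists):

```shell
# With PYTHONUTF8=1 set, Python reports UTF-8 mode as enabled
PYTHONUTF8=1 python -c "import sys; print(sys.flags.utf8_mode)"
```

This should print `1` on any platform, which is the consistent-encoding behavior the workflow relies on across runners.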
4 changes: 3 additions & 1 deletion .gitignore
@@ -180,8 +180,10 @@ build.py
/pickle_files
**/pickles
**/pickle_files
tests/files/models/my-test-gnn/*
tests/files/models/*
tests/files/test.pickle.gz
tests/files/bdb/*
tests/files/kloppy/*

examples/models/*
/models
31 changes: 21 additions & 10 deletions README.md
@@ -17,25 +17,35 @@ The **unravelsports** package aims to aid researchers, analysts and enthusiasts
🌀 Features
-----

⚽ Convert **positional soccer data** into graphs to train **graph neural networks** by leveraging the powerful [**Kloppy**](https://github.com/PySport/kloppy/tree/master) data conversion standard and [**Spektral**](https://github.com/danielegrattarola/spektral) - a flexible framework for training graph neural networks.

⚽ **Randomize** and **split** data into train, test and validation sets along matches, sequences or possessions to avoid leakage and improve model quality.

⚽ **Train**, **validate** and **test** your (custom) Graph model(s) and easily **predict** on new data.

⚽ Leverage the power of **Kloppy** standardization and **unravelsports** to execute these features for _Metrica_, _Sportec_, _Tracab (CyronHego)_, _SecondSpectrum_, _SkillCorner_ and _StatsPerform_ tracking data.
### **Convert**

⚽ **Soccer positional tracking data** into [Graphs](examples/graphs_faq.md) to train **graph neural networks** by leveraging the powerful [**Kloppy**](https://github.com/PySport/kloppy) data conversion standard for
- _Metrica_
- _Sportec_
- _Tracab (CyronHego)_
- _SecondSpectrum_
- _SkillCorner_
- _StatsPerform_

🏈 **BigDataBowl American football positional tracking data** into [Graphs](examples/graphs_faq.md) to train **graph neural networks** by leveraging [**Polars**](https://github.com/pola-rs/polars).

### **Graph Neural Networks**
These [Graphs](examples/graphs_faq.md) can be used with [**Spektral**](https://github.com/danielegrattarola/spektral) - a flexible framework for training graph neural networks.
`unravelsports` allows you to **randomize** and **split** data into train, test and validation sets along matches, sequences or possessions to avoid leakage and improve model quality. And finally, **train**, **validate** and **test** your (custom) Graph model(s) and easily **predict** on new data.

⌛ ***More to come soon...!***

🌀 Quick Start
-----
📖 The [**Quick Start Jupyter Notebook**](examples/0_quick_start_guide.ipynb) explains how to convert any positional tracking data from **Kloppy** to **Spektral GNN** in a few easy steps while walking you through the most important features and documentation.
📖 ⚽ The [**Quick Start Jupyter Notebook**](examples/0_quick_start_guide.ipynb) explains how to convert any positional tracking data from **Kloppy** to **Spektral GNN** in a few easy steps while walking you through the most important features and documentation.

📖 ⚽ The [**Graph Converter Tutorial Jupyter Notebook**](examples/1_kloppy_gnn_train.ipynb) gives an in-depth walkthrough.

📖 The [**Graph Converter Tutorial Jupyter Notebook**](examples/1_kloppy_gnn_train.ipynb) gives an in-depth walkthrough.
📖 🏈 The [**BigDataBowl Converter Tutorial Jupyter Notebook**](examples/2_big_data_bowl_guide.ipynb) gives a guide on how to convert BigDataBowl data into Graphs.

🌀 Documentation
-----
For now, follow the [**Graph Converter Tutorial**](examples/1_kloppy_gnn_train.ipynb), more documentation will follow!
For now, follow the [**Graph Converter Tutorial**](examples/1_kloppy_gnn_train.ipynb) and check the [**Graph FAQ**](examples/graphs_faq.md), more documentation will follow!

Additional reading:

@@ -55,6 +65,7 @@ spektral==1.20.0
tensorflow==2.14.0
keras==2.14.0
kloppy==3.15.0
polars==1.2.1
```
These dependencies come pre-installed with the package. It is advised to create a [virtual environment](https://virtualenv.pypa.io/en/latest/).
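Following that advice, a typical setup might look like the sketch below (the PyPI name `unravelsports` is assumed from the project name; adjust if the distribution is published differently):

```shell
# Create and activate an isolated environment, then install the package
python -m venv .venv
source .venv/bin/activate
python -m pip install unravelsports
```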

8 changes: 5 additions & 3 deletions examples/0_quick_start_guide.ipynb
@@ -51,7 +51,7 @@
"\n",
"1. Load [Kloppy](https://github.com/PySport/kloppy) dataset. \n",
" See the [in-depth Tutorial](1_kloppy_gnn_train.ipynb) on how to process multiple match files and see an overview of all possible settings.\n",
"2. Convert to Graph format using `GraphConverter`\n",
"2. Convert to Graph format using `SoccerGraphConverter`\n",
"3. Create dataset for easy processing with [Spektral](https://graphneural.network/) using `CustomSpektralDataset`"
]
},
@@ -61,7 +61,7 @@
"metadata": {},
"outputs": [],
"source": [
"from unravel.soccer import GraphConverter\n",
"from unravel.soccer import SoccerGraphConverter\n",
"from unravel.utils import CustomSpektralDataset\n",
"\n",
"from kloppy import skillcorner\n",
@@ -77,7 +77,9 @@
"\n",
"# Initialize the Graph Converter, with dataset and labels\n",
"# Here we use the default settings\n",
"converter = GraphConverter(dataset=kloppy_dataset, labels=dummy_labels(kloppy_dataset))\n",
"converter = SoccerGraphConverter(\n",
" dataset=kloppy_dataset, labels=dummy_labels(kloppy_dataset)\n",
")\n",
"\n",
"# Compute the graphs and add them to the CustomSpektralDataset\n",
"dataset = CustomSpektralDataset(graphs=converter.to_spektral_graphs())"
32 changes: 16 additions & 16 deletions examples/1_kloppy_gnn_train.ipynb
@@ -27,7 +27,7 @@
"source": [
"In this in-depth walkthrough we'll discuss everything the `unravelsports` package has to offer for converting a [Kloppy](https://github.com/PySport/kloppy) dataset of soccer tracking data into graphs for training binary classification graph neural networks using the [Spektral](https://graphneural.network/) library.\n",
"\n",
"This walkthrough will touch on a lot of the concepts from [A Graph Neural Network Deep-dive into Successful Counterattacks {A. Sahasrabudhe & J. Bekkers}](https://github.com/USSoccerFederation/ussf_ssac_23_soccer_gnn). It is strongly advised to first read the [research paper (pdf)](https://ussf-ssac-23-soccer-gnn.s3.us-east-2.amazonaws.com/public/Sahasrabudhe_Bekkers_SSAC23.pdf). Some concepts are also explained in the [Graphs FAQ](graphs_faq.ipynb).\n",
"This walkthrough will touch on a lot of the concepts from [A Graph Neural Network Deep-dive into Successful Counterattacks {A. Sahasrabudhe & J. Bekkers}](https://github.com/USSoccerFederation/ussf_ssac_23_soccer_gnn). It is strongly advised to first read the [research paper (pdf)](https://ussf-ssac-23-soccer-gnn.s3.us-east-2.amazonaws.com/public/Sahasrabudhe_Bekkers_SSAC23.pdf). Some concepts are also explained in the [Graphs FAQ](graphs_faq.md).\n",
"\n",
"Step by step we'll show how this package can be used to load soccer positional (tracking) data with `kloppy`, how to convert this data into \"graphs\", train a Graph Neural Network with `spektral`, evaluate its performance, save and load the model and finally apply the model to unseen data to make predictions.\n",
"\n",
@@ -57,7 +57,7 @@
" - [7.4 Evaluate Model](#74-evaluate-model)\n",
" - [7.5 Predict on New Data](#75-predict-on-new-data)\n",
"\n",
"ℹ️ [**Graphs FAQ**](graphs_faq.ipynb)\n",
"ℹ️ [**Graphs FAQ**](graphs_faq.md)\n",
"\n",
"-----"
]
@@ -68,7 +68,7 @@
"source": [
"### 1. Imports\n",
"\n",
"We import `GraphConverter` to help us convert from Kloppy positional tracking frames to graphs.\n",
"We import `SoccerGraphConverter` to help us convert from Kloppy positional tracking frames to graphs.\n",
"\n",
"With the power of **Kloppy** we can also load data from many providers by importing `metrica`, `sportec`, `tracab`, `secondspectrum`, or `statsperform` from `kloppy`."
]
@@ -79,7 +79,7 @@
"metadata": {},
"outputs": [],
"source": [
"from unravel.soccer import GraphConverter\n",
"from unravel.soccer import SoccerGraphConverter\n",
"\n",
"from kloppy import skillcorner"
]
@@ -97,7 +97,7 @@
"source": [
"### 2. Public SkillCorner Data\n",
"\n",
"The `GraphConverter` class allows processing data from every tracking data provider supported by [PySports Kloppy](https://github.com/PySport/kloppy), namely:\n",
"The `SoccerGraphConverter` class allows processing data from every tracking data provider supported by [PySports Kloppy](https://github.com/PySport/kloppy), namely:\n",
"- Sportec\n",
"- Tracab\n",
"- SecondSpectrum\n",
@@ -132,12 +132,12 @@
"\n",
"ℹ️ For more information on:\n",
"- What a Graph is, check out [Graph FAQ Section A](graphs_faq.ipynb)\n",
"- What parameters we can pass to the `GraphConverter`, check out [Graph FAQ Section B](graphs_faq.ipynb)\n",
"- What parameters we can pass to the `SoccerGraphConverter`, check out [Graph FAQ Section B](graphs_faq.ipynb)\n",
"- What features each Graph has, check out [Graph FAQ Section C](graphs_faq.ipynb)\n",
"\n",
"---\n",
"\n",
"To get started with the `GraphConverter` we need to pass one _required_ parameter:\n",
"To get started with the `SoccerGraphConverter` we need to pass one _required_ parameter:\n",
"- `dataset` (of type `TrackingDataset` (Kloppy)) \n",
"\n",
"And one parameter that's required when we're converting for training purposes (more on this later):\n",
@@ -148,7 +148,7 @@
"⚠️ As mentioned before you will need to create your own labels! In this example we'll use `dummy_labels(dataset)` to generate a fake label for each frame.\n",
"\n",
"#### Graph Identifier(s):\n",
"When training a model on tracking data it's highly recommended to split data into test/train(/validation) sets by match or period such that all data end up in the same test, train or validation set. This should be done to avoid leaking information between test, train and validation sets. To make this simple, there are two _optional_ parameters we can pass to `GraphConverter`, namely:\n",
"When training a model on tracking data it's highly recommended to split data into test/train(/validation) sets by match or period such that all data end up in the same test, train or validation set. This should be done to avoid leaking information between test, train and validation sets. To make this simple, there are two _optional_ parameters we can pass to `SoccerGraphConverter`, namely:\n",
"- `graph_id`. This is a single identifier (str or int) for a whole match, for example the unique match id.\n",
"- `graph_ids`. This is a dictionary with the same keys as `labels`, but the values are now the unique identifiers. This option can be used if we want to split by sequence or possession_id. For example: {frame_id: 'matchId-sequenceId', frame_id: 'match_Id-sequenceId2'} etc. You will need to create your own ids. Note, if `labels` and `graph_ids` don't have the exact same keys it will throw an error.\n",
"\n",
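The `graph_ids` mapping described above is just a dictionary keyed exactly like `labels`; a minimal sketch with made-up frame, match and sequence identifiers (not taken from any real dataset):

```python
# Hypothetical frame records: (frame_id, match_id, sequence_id)
frames = [(101, "match1", 1), (102, "match1", 1), (103, "match1", 2), (201, "match2", 1)]

# labels and graph_ids must share exactly the same keys,
# otherwise the converter raises an error
labels = {frame_id: 0 for frame_id, _, _ in frames}
graph_ids = {frame_id: f"{match_id}-{seq_id}" for frame_id, match_id, seq_id in frames}

assert labels.keys() == graph_ids.keys()
```

Splitting on these ids later keeps all frames of the same match-sequence pair in the same set.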
@@ -176,7 +176,7 @@
"Important things to note:\n",
"- We import `dummy_labels` to randomly generate binary labels. Training with these random labels will not create a good model.\n",
"- We import `dummy_graph_ids` to generate fake graph labels.\n",
"- The `GraphConverter` handles all necessary steps (like setting the correct coordinate system, and left-right normalization).\n",
"- The `SoccerGraphConverter` handles all necessary steps (like setting the correct coordinate system, and left-right normalization).\n",
"- We will end up with fewer than 2,000 frames even though we set `limit=500`, because we set `include_empty_frames=False` and all frames without ball coordinates are automatically omitted.\n",
"- When using other providers always set `include_empty_frames=False` or `only_alive=True`.\n",
"- We store the data as individual compressed pickle files, one file per match. The data that gets stored in the pickle is a list of dictionaries, one dictionary per frame. Each dictionary has keys for the adjacency matrix, node features, edge features, label and graph id."
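The per-frame dictionary described in the last bullet can be pictured roughly as follows (key names and shapes are illustrative assumptions, not the package's exact schema):

```python
import numpy as np

# Illustrative sizes only: 22 players + ball as nodes, made-up feature counts
n_nodes, n_node_feats, n_edges, n_edge_feats = 23, 8, 50, 4

frame = {
    "adjacency_matrix": np.zeros((n_nodes, n_nodes)),   # who is connected to whom
    "node_features": np.zeros((n_nodes, n_node_feats)), # one feature row per player/ball
    "edge_features": np.zeros((n_edges, n_edge_feats)), # one feature row per connection
    "label": 0,                                         # binary label for this frame
    "graph_id": "match1-1",                             # identifier used for splitting
}
```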
@@ -223,7 +223,7 @@
" )\n",
"\n",
" # Initialize the Graph Converter, with dataset, labels and settings\n",
" converter = GraphConverter(\n",
" converter = SoccerGraphConverter(\n",
" dataset=dataset,\n",
" # create fake labels\n",
" labels=dummy_labels(dataset),\n",
@@ -254,7 +254,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"ℹ️ For a full table of parameters we can pass to the `GraphConverter` check out [Graph FAQ Section B](graphs_faq.ipynb)\n",
"ℹ️ For a full table of parameters we can pass to the `SoccerGraphConverter` check out [Graph FAQ Section B](graphs_faq.ipynb)\n",
"\n",
"-----"
]
@@ -303,7 +303,7 @@
"Our `dataset` object has two custom methods to help split the data into train, test and validation sets.\n",
"Either use `dataset.split_test_train()` if we don't need a validation set, or `dataset.split_test_train_validation()` if we do also require a validation set.\n",
"\n",
"We can split our data 'by_graph_id' if we have provided Graph Ids in our `GraphConverter` using the 'graph_id' or 'graph_ids' parameter.\n",
"We can split our data 'by_graph_id' if we have provided Graph Ids in our `SoccerGraphConverter` using the 'graph_id' or 'graph_ids' parameter.\n",
"\n",
"The 'split_train', 'split_test' and 'split_validation' parameters can either be ratios, percentages or relative size compared to total. \n",
"\n",
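The idea behind 'by_graph_id' splitting can be sketched in plain Python — whole graph-id groups, never individual frames, are assigned to a split (an illustration of the concept, not the library's implementation):

```python
import random

def split_by_graph_id(graph_ids, n_train, n_test, n_val, seed=42):
    """Assign whole graph-id groups to splits so no graph id spans two sets."""
    unique_ids = sorted(set(graph_ids.values()))
    random.Random(seed).shuffle(unique_ids)
    total = n_train + n_test + n_val
    cut1 = round(len(unique_ids) * n_train / total)
    cut2 = cut1 + round(len(unique_ids) * n_test / total)
    return set(unique_ids[:cut1]), set(unique_ids[cut1:cut2]), set(unique_ids[cut2:])

# 60 frames tagged with 6 hypothetical match ids, split 4:1:1
graph_ids = {i: f"match{i % 6}" for i in range(60)}
train, test, val = split_by_graph_id(graph_ids, 4, 1, 1)
```

Because assignment happens at the group level, the frame-count ratios are only approximate when there are few unique ids.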
@@ -338,7 +338,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"🗒️ We can see that, because we are splitting by only 4 different graph_ids here (the 4 match_ids) the ratio's aren't perfectly 4 to 1 to 1. If you change the `graph_id=match_id` parameter in the `GraphConverter` to `graph_ids=dummy_graph_ids(dataset)` you'll see that it's easier to get close to the correct ratios, simply because we have a lot more graph_ids to split a cross. "
"🗒️ We can see that, because we are splitting by only 4 different graph_ids here (the 4 match_ids), the ratios aren't perfectly 4 to 1 to 1. If you change the `graph_id=match_id` parameter in the `SoccerGraphConverter` to `graph_ids=dummy_graph_ids(dataset)` you'll see that it's easier to get close to the correct ratios, simply because we have a lot more graph_ids to split across. "
]
},
{
@@ -582,7 +582,7 @@
"\n",
"1. Load new, unseen data from the SkillCorner dataset.\n",
"2. Convert this data, making sure we use the exact same settings as in step 1.\n",
"3. If we set `prediction=True` we do not have to supply labels to the `GraphConverter`."
"3. If we set `prediction=True` we do not have to supply labels to the `SoccerGraphConverter`."
]
},
{
@@ -597,7 +597,7 @@
" limit=500,\n",
")\n",
"\n",
"preds_converter = GraphConverter(\n",
"preds_converter = SoccerGraphConverter(\n",
" dataset=kloppy_dataset,\n",
" prediction=True,\n",
" ball_carrier_treshold=25.0,\n",
@@ -772,7 +772,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"display_name": ".venv311",
"language": "python",
"name": "python3"
},