BigDataBowl Data Support #4

Merged · 18 commits · Oct 18, 2024
28 changes: 24 additions & 4 deletions .github/workflows/pytest.yml
@@ -4,7 +4,6 @@ on:
push:
branches:
- main
- bug/crystal_conv_saving
pull_request:
branches:
- main
@@ -20,18 +19,39 @@ jobs:

steps:
- uses: actions/checkout@v2

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies

# Linux/macOS - Install dependencies
- name: Install dependencies (Linux/macOS)
if: runner.os != 'Windows' # Only run on Linux/macOS
env:
PYTHONIOENCODING: utf-8 # Ensure Python uses UTF-8 encoding
run: |
python -m pip install --upgrade pip
pip install -e .[test]
python -m pip install -e .[test]
shell: bash

# Windows - Install dependencies
- name: Install dependencies (Windows)
if: runner.os == 'Windows' # Only run on Windows
env:
PYTHONIOENCODING: utf-8 # Ensure Python uses UTF-8 encoding
PYTHONUTF8: 1 # Force Python to use UTF-8 mode
run: |
chcp 65001 # Change code page to UTF-8
python -m pip install --upgrade pip setuptools
python -m pip install -e .[test]
shell: pwsh

- name: Code formatting
run: |
pip install "black[jupyter]==24.4.2"
black --check .

- name: Test with pytest
run: |
pytest --color=yes
pytest --color=yes
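The Windows step above forces UTF-8 via `PYTHONUTF8` and `chcp 65001`; whether UTF-8 mode is actually active can be checked locally with a one-liner (a sketch, assuming Python 3.7+ where PEP 540's `sys.flags.utf8_mode` exists):

```shell
# With PYTHONUTF8=1 set, Python reports UTF-8 mode as enabled
PYTHONUTF8=1 python -c "import sys; print(sys.flags.utf8_mode)"
```

This should print `1` on any platform, which is the consistent-encoding behavior the workflow relies on across runners.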
4 changes: 3 additions & 1 deletion .gitignore
@@ -180,8 +180,10 @@ build.py
/pickle_files
**/pickles
**/pickle_files
tests/files/models/my-test-gnn/*
tests/files/models/*
tests/files/test.pickle.gz
tests/files/bdb/*
tests/files/kloppy/*

examples/models/*
/models
31 changes: 21 additions & 10 deletions README.md
@@ -17,25 +17,35 @@ The **unravelsports** package aims to aid researchers, analysts and enthusiasts
🌀 Features
-----

⚽ Convert **positional soccer data** into graphs to train **graph neural networks** by leveraging the powerful [**Kloppy**](https://github.com/PySport/kloppy/tree/master) data conversion standard and [**Spektral**](https://github.com/danielegrattarola/spektral) - a flexible framework for training graph neural networks.

⚽ **Randomize** and **split** data into train, test and validation sets along matches, sequences or possessions to avoid leakage and improve model quality.

⚽ **Train**, **validate** and **test** your (custom) Graph model(s) and easily **predict** on new data.

⚽ Leverage the power of **Kloppy** standardization and **unravelsports** to execute these features for _Metrica_, _Sportec_, _Tracab (CyronHego)_, _SecondSpectrum_, _SkillCorner_ and _StatsPerform_ tracking data.
### **Convert**

⚽ **Soccer positional tracking data** into [Graphs](examples/graphs_faq.md) to train **graph neural networks** by leveraging the powerful [**Kloppy**](https://github.com/PySport/kloppy) data conversion standard for
- _Metrica_
- _Sportec_
- _Tracab (CyronHego)_
- _SecondSpectrum_
- _SkillCorner_
- _StatsPerform_

🏈 **BigDataBowl American football positional tracking data** into [Graphs](examples/graphs_faq.md) to train **graph neural networks** by leveraging [**Polars**](https://github.com/pola-rs/polars).

### **Graph Neural Networks**
These [Graphs](examples/graphs_faq.md) can be used with [**Spektral**](https://github.com/danielegrattarola/spektral) - a flexible framework for training graph neural networks.
`unravelsports` allows you to **randomize** and **split** data into train, test and validation sets along matches, sequences or possessions to avoid leakage and improve model quality. And finally, **train**, **validate** and **test** your (custom) Graph model(s) and easily **predict** on new data.

⌛ ***More to come soon...!***

🌀 Quick Start
-----
📖 The [**Quick Start Jupyter Notebook**](examples/0_quick_start_guide.ipynb) explains how to convert any positional tracking data from **Kloppy** to **Spektral GNN** in a few easy steps while walking you through the most important features and documentation.
📖 ⚽ The [**Quick Start Jupyter Notebook**](examples/0_quick_start_guide.ipynb) explains how to convert any positional tracking data from **Kloppy** to **Spektral GNN** in a few easy steps while walking you through the most important features and documentation.

📖 ⚽ The [**Graph Converter Tutorial Jupyter Notebook**](examples/1_kloppy_gnn_train.ipynb) gives an in-depth walkthrough.

📖 The [**Graph Converter Tutorial Jupyter Notebook**](examples/1_kloppy_gnn_train.ipynb) gives an in-depth walkthrough.
📖 🏈 The [**BigDataBowl Converter Tutorial Jupyter Notebook**](examples/2_big_data_bowl_guide.ipynb) gives a guide on how to convert BigDataBowl data into Graphs.

🌀 Documentation
-----
For now, follow the [**Graph Converter Tutorial**](examples/1_kloppy_gnn_train.ipynb), more documentation will follow!
For now, follow the [**Graph Converter Tutorial**](examples/1_kloppy_gnn_train.ipynb) and check the [**Graph FAQ**](examples/graphs_faq.md), more documentation will follow!

Additional reading:

@@ -55,6 +65,7 @@ spektral==1.20.0
tensorflow==2.14.0
keras==2.14.0
kloppy==3.15.0
polars==1.2.1
```
These dependencies come pre-installed with the package. It is advised to create a [virtual environment](https://virtualenv.pypa.io/en/latest/).
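Following that advice, a typical setup might look like the sketch below (the PyPI name `unravelsports` is assumed from the project name; adjust if the distribution is published differently):

```shell
# Create and activate an isolated environment, then install the package
python -m venv .venv
source .venv/bin/activate
python -m pip install unravelsports
```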

8 changes: 5 additions & 3 deletions examples/0_quick_start_guide.ipynb
@@ -51,7 +51,7 @@
"\n",
"1. Load [Kloppy](https://github.com/PySport/kloppy) dataset. \n",
" See the [in-depth Tutorial](1_kloppy_gnn_train.ipynb) on how to process multiple match files and see an overview of all possible settings.\n",
"2. Convert to Graph format using `GraphConverter`\n",
"2. Convert to Graph format using `SoccerGraphConverter`\n",
"3. Create dataset for easy processing with [Spektral](https://graphneural.network/) using `CustomSpektralDataset`"
]
},
@@ -61,7 +61,7 @@
"metadata": {},
"outputs": [],
"source": [
"from unravel.soccer import GraphConverter\n",
"from unravel.soccer import SoccerGraphConverter\n",
"from unravel.utils import CustomSpektralDataset\n",
"\n",
"from kloppy import skillcorner\n",
@@ -77,7 +77,9 @@
"\n",
"# Initialize the Graph Converter, with dataset and labels\n",
"# Here we use the default settings\n",
"converter = GraphConverter(dataset=kloppy_dataset, labels=dummy_labels(kloppy_dataset))\n",
"converter = SoccerGraphConverter(\n",
" dataset=kloppy_dataset, labels=dummy_labels(kloppy_dataset)\n",
")\n",
"\n",
"# Compute the graphs and add them to the CustomSpektralDataset\n",
"dataset = CustomSpektralDataset(graphs=converter.to_spektral_graphs())"
32 changes: 16 additions & 16 deletions examples/1_kloppy_gnn_train.ipynb
@@ -27,7 +27,7 @@
"source": [
"In this in-depth walkthrough we'll discuss everything the `unravelsports` package has to offer for converting a [Kloppy](https://github.com/PySport/kloppy) dataset of soccer tracking data into graphs for training binary classification graph neural networks using the [Spektral](https://graphneural.network/) library.\n",
"\n",
"This walkthrough will touch on a lot of the concepts from [A Graph Neural Network Deep-dive into Successful Counterattacks {A. Sahasrabudhe & J. Bekkers}](https://github.com/USSoccerFederation/ussf_ssac_23_soccer_gnn). It is strongly advised to first read the [research paper (pdf)](https://ussf-ssac-23-soccer-gnn.s3.us-east-2.amazonaws.com/public/Sahasrabudhe_Bekkers_SSAC23.pdf). Some concepts are also explained in the [Graphs FAQ](graphs_faq.ipynb).\n",
"This walkthrough will touch on a lot of the concepts from [A Graph Neural Network Deep-dive into Successful Counterattacks {A. Sahasrabudhe & J. Bekkers}](https://github.com/USSoccerFederation/ussf_ssac_23_soccer_gnn). It is strongly advised to first read the [research paper (pdf)](https://ussf-ssac-23-soccer-gnn.s3.us-east-2.amazonaws.com/public/Sahasrabudhe_Bekkers_SSAC23.pdf). Some concepts are also explained in the [Graphs FAQ](graphs_faq.md).\n",
"\n",
"Step by step we'll show how this package can be used to load soccer positional (tracking) data with `kloppy`, how to convert this data into \"graphs\", train a Graph Neural Network with `spektral`, evaluate its performance, save and load the model and finally apply the model to unseen data to make predictions.\n",
"\n",
@@ -57,7 +57,7 @@
" - [7.4 Evaluate Model](#74-evaluate-model)\n",
" - [7.5 Predict on New Data](#75-predict-on-new-data)\n",
"\n",
"ℹ️ [**Graphs FAQ**](graphs_faq.ipynb)\n",
"ℹ️ [**Graphs FAQ**](graphs_faq.md)\n",
"\n",
"-----"
]
@@ -68,7 +68,7 @@
"source": [
"### 1. Imports\n",
"\n",
"We import `GraphConverter` to help us convert from Kloppy positional tracking frames to graphs.\n",
"We import `SoccerGraphConverter` to help us convert from Kloppy positional tracking frames to graphs.\n",
"\n",
"With the power of **Kloppy** we can also load data from many providers by importing `metrica`, `sportec`, `tracab`, `secondspectrum`, or `statsperform` from `kloppy`."
]
@@ -79,7 +79,7 @@
"metadata": {},
"outputs": [],
"source": [
"from unravel.soccer import GraphConverter\n",
"from unravel.soccer import SoccerGraphConverter\n",
"\n",
"from kloppy import skillcorner"
]
@@ -97,7 +97,7 @@
"source": [
"### 2. Public SkillCorner Data\n",
"\n",
"The `GraphConverter` class allows processing data from every tracking data provider supported by [PySports Kloppy](https://github.com/PySport/kloppy), namely:\n",
"The `SoccerGraphConverter` class allows processing data from every tracking data provider supported by [PySports Kloppy](https://github.com/PySport/kloppy), namely:\n",
"- Sportec\n",
"- Tracab\n",
"- SecondSpectrum\n",
@@ -132,12 +132,12 @@
"\n",
"ℹ️ For more information on:\n",
"- What a Graph is, check out [Graph FAQ Section A](graphs_faq.ipynb)\n",
"- What parameters we can pass to the `GraphConverter`, check out [Graph FAQ Section B](graphs_faq.ipynb)\n",
"- What parameters we can pass to the `SoccerGraphConverter`, check out [Graph FAQ Section B](graphs_faq.ipynb)\n",
"- What features each Graph has, check out [Graph FAQ Section C](graphs_faq.ipynb)\n",
"\n",
"---\n",
"\n",
"To get started with the `GraphConverter` we need to pass one _required_ parameter:\n",
"To get started with the `SoccerGraphConverter` we need to pass one _required_ parameter:\n",
"- `dataset` (of type `TrackingDataset` (Kloppy)) \n",
"\n",
"And one parameter that's required when we're converting for training purposes (more on this later):\n",
@@ -148,7 +148,7 @@
"⚠️ As mentioned before you will need to create your own labels! In this example we'll use `dummy_labels(dataset)` to generate a fake label for each frame.\n",
"\n",
"#### Graph Identifier(s):\n",
"When training a model on tracking data it's highly recommended to split data into test/train(/validation) sets by match or period such that all data end up in the same test, train or validation set. This should be done to avoid leaking information between test, train and validation sets. To make this simple, there are two _optional_ parameters we can pass to `GraphConverter`, namely:\n",
"When training a model on tracking data it's highly recommended to split data into test/train(/validation) sets by match or period such that all data end up in the same test, train or validation set. This should be done to avoid leaking information between test, train and validation sets. To make this simple, there are two _optional_ parameters we can pass to `SoccerGraphConverter`, namely:\n",
"- `graph_id`. This is a single identifier (str or int) for a whole match, for example the unique match id.\n",
"- `graph_ids`. This is a dictionary with the same keys as `labels`, but the values are now the unique identifiers. This option can be used if we want to split by sequence or possession_id. For example: {frame_id: 'matchId-sequenceId', frame_id: 'match_Id-sequenceId2'} etc. You will need to create your own ids. Note, if `labels` and `graph_ids` don't have the exact same keys it will throw an error.\n",
"\n",
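The `graph_ids` mapping described above is just a dictionary keyed exactly like `labels`; a minimal sketch with made-up frame, match and sequence identifiers (not taken from any real dataset):

```python
# Hypothetical frame records: (frame_id, match_id, sequence_id)
frames = [(101, "match1", 1), (102, "match1", 1), (103, "match1", 2), (201, "match2", 1)]

# labels and graph_ids must share exactly the same keys,
# otherwise the converter raises an error
labels = {frame_id: 0 for frame_id, _, _ in frames}
graph_ids = {frame_id: f"{match_id}-{seq_id}" for frame_id, match_id, seq_id in frames}

assert labels.keys() == graph_ids.keys()
```

Splitting on these ids later keeps all frames of the same match-sequence pair in the same set.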
@@ -176,7 +176,7 @@
"Important things to note:\n",
"- We import `dummy_labels` to randomly generate binary labels. Training with these random labels will not create a good model.\n",
"- We import `dummy_graph_ids` to generate fake graph labels.\n",
"- The `GraphConverter` handles all necessary steps (like setting the correct coordinate system, and left-right normalization).\n",
"- The `SoccerGraphConverter` handles all necessary steps (like setting the correct coordinate system, and left-right normalization).\n",
"- We will end up with fewer than 2,000 frames even though we set `limit=500`, because we set `include_empty_frames=False` and all frames without ball coordinates are automatically omitted.\n",
"- When using other providers always set `include_empty_frames=False` or `only_alive=True`.\n",
"- We store the data as individual compressed pickle files, one file per match. The data that gets stored in the pickle is a list of dictionaries, one dictionary per frame. Each dictionary has keys for the adjacency matrix, node features, edge features, label and graph id."
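The per-frame dictionary described in the last bullet can be pictured roughly as follows (key names and shapes are illustrative assumptions, not the package's exact schema):

```python
import numpy as np

# Illustrative sizes only: 22 players + ball as nodes, made-up feature counts
n_nodes, n_node_feats, n_edges, n_edge_feats = 23, 8, 50, 4

frame = {
    "adjacency_matrix": np.zeros((n_nodes, n_nodes)),   # who is connected to whom
    "node_features": np.zeros((n_nodes, n_node_feats)), # one feature row per player/ball
    "edge_features": np.zeros((n_edges, n_edge_feats)), # one feature row per connection
    "label": 0,                                         # binary label for this frame
    "graph_id": "match1-1",                             # identifier used for splitting
}
```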
@@ -223,7 +223,7 @@
" )\n",
"\n",
" # Initialize the Graph Converter, with dataset, labels and settings\n",
" converter = GraphConverter(\n",
" converter = SoccerGraphConverter(\n",
" dataset=dataset,\n",
" # create fake labels\n",
" labels=dummy_labels(dataset),\n",
@@ -254,7 +254,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"ℹ️ For a full table of parameters we can pass to the `GraphConverter` check out [Graph FAQ Section B](graphs_faq.ipynb)\n",
"ℹ️ For a full table of parameters we can pass to the `SoccerGraphConverter` check out [Graph FAQ Section B](graphs_faq.ipynb)\n",
"\n",
"-----"
]
@@ -303,7 +303,7 @@
"Our `dataset` object has two custom methods to help split the data into train, test and validation sets.\n",
"Either use `dataset.split_test_train()` if we don't need a validation set, or `dataset.split_test_train_validation()` if we do also require a validation set.\n",
"\n",
"We can split our data 'by_graph_id' if we have provided Graph Ids in our `GraphConverter` using the 'graph_id' or 'graph_ids' parameter.\n",
"We can split our data 'by_graph_id' if we have provided Graph Ids in our `SoccerGraphConverter` using the 'graph_id' or 'graph_ids' parameter.\n",
"\n",
"The 'split_train', 'split_test' and 'split_validation' parameters can either be ratios, percentages or relative size compared to total. \n",
"\n",
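The idea behind 'by_graph_id' splitting can be sketched in plain Python — whole graph-id groups, never individual frames, are assigned to a split (an illustration of the concept, not the library's implementation):

```python
import random

def split_by_graph_id(graph_ids, n_train, n_test, n_val, seed=42):
    """Assign whole graph-id groups to splits so no graph id spans two sets."""
    unique_ids = sorted(set(graph_ids.values()))
    random.Random(seed).shuffle(unique_ids)
    total = n_train + n_test + n_val
    cut1 = round(len(unique_ids) * n_train / total)
    cut2 = cut1 + round(len(unique_ids) * n_test / total)
    return set(unique_ids[:cut1]), set(unique_ids[cut1:cut2]), set(unique_ids[cut2:])

# 60 frames tagged with 6 hypothetical match ids, split 4:1:1
graph_ids = {i: f"match{i % 6}" for i in range(60)}
train, test, val = split_by_graph_id(graph_ids, 4, 1, 1)
```

Because assignment happens at the group level, the frame-count ratios are only approximate when there are few unique ids.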
@@ -338,7 +338,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"🗒️ We can see that, because we are splitting by only 4 different graph_ids here (the 4 match_ids) the ratio's aren't perfectly 4 to 1 to 1. If you change the `graph_id=match_id` parameter in the `GraphConverter` to `graph_ids=dummy_graph_ids(dataset)` you'll see that it's easier to get close to the correct ratios, simply because we have a lot more graph_ids to split a cross. "
"🗒️ We can see that, because we are splitting by only 4 different graph_ids here (the 4 match_ids), the ratios aren't perfectly 4 to 1 to 1. If you change the `graph_id=match_id` parameter in the `SoccerGraphConverter` to `graph_ids=dummy_graph_ids(dataset)` you'll see that it's easier to get close to the correct ratios, simply because we have a lot more graph_ids to split across. "
]
},
{
@@ -582,7 +582,7 @@
"\n",
"1. Load new, unseen data from the SkillCorner dataset.\n",
"2. Convert this data, making sure we use the exact same settings as in step 1.\n",
"3. If we set `prediction=True` we do not have to supply labels to the `GraphConverter`."
"3. If we set `prediction=True` we do not have to supply labels to the `SoccerGraphConverter`."
]
},
{
@@ -597,7 +597,7 @@
" limit=500,\n",
")\n",
"\n",
"preds_converter = GraphConverter(\n",
"preds_converter = SoccerGraphConverter(\n",
" dataset=kloppy_dataset,\n",
" prediction=True,\n",
" ball_carrier_treshold=25.0,\n",
@@ -772,7 +772,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"display_name": ".venv311",
"language": "python",
"name": "python3"
},