Multi-Task Learning for Driver Identification and Transport Mode Classification

This project is part of my Master's thesis for the MSc in Data Science with Artificial Intelligence at the University of Exeter. It implements a multi-task deep learning model that performs driver identification and transport mode classification simultaneously, using smartphone sensor data from the Sussex-Huawei Locomotion (SHL) preview dataset. Multi-task learning (MTL) is used so that one model can serve both tasks, with the aim of improving performance on each and supporting real-world applications such as usage-based insurance and transport analytics.

Project Structure

Root Directory

The root directory contains the main notebooks used for training the models and testing different aspects of the pipeline:

- driver_identification_singletask.ipynb: Notebook for training the driver identification model using a ResNet50-GRU architecture.
- transport_classification_singletask.ipynb: Notebook for training the transport mode classification model using a BiLSTM architecture.
- multitask_MTL_model.ipynb: Notebook for training and evaluating the multi-task learning (MTL) model that combines both tasks.
- preprocess.ipynb: Notebook to preprocess the raw SHL dataset and generate the required feature maps and labels.
- metrics_result.ipynb: Notebook to evaluate and compare the performance of the single-task models and the MTL model.
- hyperparam_test/: Folder containing scripts and files used for hyperparameter tuning.

`src/` Folder

The src/ directory contains the project's core scripts, covering data handling, model definitions, training and evaluation, and utility functions:

- config.py: Configuration settings for training the models, including hyperparameters and file paths. Note: this file is out of date; the configuration actually used is defined in the training files.
- dataset.py: Functions for loading and processing the dataset.
- engine.py: Functions to train and evaluate the models.
- hyperparam.py: Functions to perform hyperparameter tuning.
- metrics.py: Functions to calculate metrics such as accuracy, precision, recall, F1-score, and ROC-AUC for the models.
- model_driverID.py: Driver identification model (ResNet50-GRU architecture).
- model_multitask.py: Multi-task learning model architecture that combines transport mode classification and driver identification.
- model_simpleCNN.py: Simple CNN model for testing.
- model_transportMode.py: Transport mode classification model (BiLSTM architecture).
- plot.py: Functions to generate learning curves, precision-recall curves, and other visualizations.
- preprocess.py: Script to preprocess the raw SHL dataset into feature maps and labels for training.
- utils.py: Utility functions for tasks such as data augmentation and saving and loading models.
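
To make the idea behind a multi-task model such as the one in model_multitask.py more concrete, the sketch below combines one cross-entropy loss per task into a single weighted training objective. It is only an illustration: the function names, the example scores, and the equal task weights are assumptions, and the loss actually used in this repository is defined in the training files and may differ.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[target])


def multitask_loss(driver_logits, driver_target, mode_logits, mode_target,
                   driver_weight=0.5, mode_weight=0.5):
    """Weighted sum of the two task losses (the weights are hypothetical)."""
    driver_loss = cross_entropy(driver_logits, driver_target)
    mode_loss = cross_entropy(mode_logits, mode_target)
    return driver_weight * driver_loss + mode_weight * mode_loss


# One sample: 3 candidate drivers, 4 candidate transport modes (made-up scores).
loss = multitask_loss(
    driver_logits=[2.0, 0.5, -1.0], driver_target=0,
    mode_logits=[0.1, 0.2, 3.0, -0.5], mode_target=2,
)
print(f"joint loss: {loss:.4f}")
```

In a PyTorch training loop the per-task terms would typically come from torch.nn.CrossEntropyLoss applied to each output head, with the weighted sum backpropagated through the shared layers.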

`prep_files/` Folder

This folder contains notebooks used for testing and debugging the functions on sample datasets before applying them to the full dataset.

`seed_traintest/` Folder

This folder contains notebooks that retrain the models with different random seeds, to check that the results are reproducible and robust to the choice of seed. These include:

- driver_identification_seeds.ipynb: Training the driver identification model with different seeds.
- transport_classification_seeds.ipynb: Training the transport mode classification model with different seeds.
- multitask_MTL_seeds.ipynb: Training the MTL model with different seeds.
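
The general pattern behind a seed experiment can be sketched as follows: run the same procedure once per seed, then report the mean and spread of the resulting scores. Everything here is a stand-in; `run_experiment` is a hypothetical placeholder that returns a synthetic number rather than a real validation score, and the seed list is arbitrary.

```python
import random
import statistics


def run_experiment(seed):
    """Stand-in for one training run; returns a synthetic score."""
    rng = random.Random(seed)  # private generator so runs do not interfere
    return 0.90 + rng.uniform(-0.02, 0.02)


def summarize(seeds):
    """Run once per seed and report the mean and standard deviation."""
    scores = [run_experiment(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


seeds = [0, 1, 2, 3, 4]  # hypothetical seed list
mean_score, std_score = summarize(seeds)
print(f"score over {len(seeds)} seeds: {mean_score:.4f} +/- {std_score:.4f}")
```

In a real PyTorch run, the same seed would usually also be passed to torch.manual_seed and numpy.random.seed before the model and data loaders are created.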

`model_checkpoint/` Folder

This folder stores the model checkpoints saved during training.

`data/` Folder

This folder is intended to hold the raw SHL dataset. The dataset is not included in this repository because of its size; it can be downloaded from the official SHL dataset website. The preprocess.ipynb notebook then generates the LSTM features, labels, and feature maps that the models require.

Getting Started

Requirements

- Python 3.7+
- PyTorch
- NumPy
- Matplotlib
- Scikit-learn
- Torchvision
- Pandas

Install the required packages using:

pip install -r requirements.txt

Data Preparation

- Download the SHL dataset and place it in the data/ folder.
- Run the preprocess.ipynb notebook to generate the preprocessed data, including LSTM features, feature maps, and labels.
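
A common first step when turning a continuous sensor stream into model inputs is to cut it into fixed-length windows, each with a single label. The sketch below shows that idea only; the window size, the stride, the majority-vote labelling, and the function name are all assumptions, and the preprocessing in preprocess.ipynb may work differently.

```python
from collections import Counter


def sliding_windows(samples, labels, window_size, stride):
    """Cut a sensor stream into fixed-length windows with a majority label.

    samples: list of per-timestep readings (e.g. accelerometer tuples)
    labels:  list of per-timestep class ids, same length as samples
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    windows = []
    for start in range(0, len(samples) - window_size + 1, stride):
        end = start + window_size
        # Label the window with the most frequent per-timestep label.
        majority = Counter(labels[start:end]).most_common(1)[0][0]
        windows.append((samples[start:end], majority))
    return windows


# Toy stream: 10 timesteps of (x, y, z) readings covering two transport modes.
stream = [(t * 0.1, 0.0, 9.8) for t in range(10)]
modes = [1, 1, 1, 1, 1, 1, 1, 2, 2, 2]
windows = sliding_windows(stream, modes, window_size=4, stride=2)
print(len(windows), [label for _, label in windows])
```

Overlapping windows (a stride smaller than the window size) yield more training examples from the same recording, at the cost of correlated samples.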

Training the Models

Single-task models:
- To train the driver identification model, run driver_identification_singletask.ipynb.
- To train the transport mode classification model, run transport_classification_singletask.ipynb.
Multi-task model:
- To train the multi-task learning model, run multitask_MTL_model.ipynb.
Seed experiments:
- To evaluate how robust the models are to the random seed, run the corresponding notebooks in the seed_traintest/ folder.

Evaluation and Metrics

The models are evaluated with accuracy, precision, recall, and F1-score, calculated across multiple classes by the functions in src/metrics.py. Performance is visualized with learning curves, precision-recall curves, and F1-score-recall curves, generated by the functions in src/plot.py.

Class-wise performance is evaluated to understand how well the models generalize across different transport modes and drivers.
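
As an illustration of what class-wise evaluation involves, the sketch below computes precision, recall, and F1-score for each class and then a macro-averaged F1. It is written in plain Python for clarity and is not the code in metrics.py; the function names and the toy labels are assumptions.

```python
def per_class_metrics(y_true, y_pred):
    """Return {class: (precision, recall, f1)} for every class seen."""
    classes = sorted(set(y_true) | set(y_pred))
    results = {}
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[cls] = (precision, recall, f1)
    return results


def macro_f1(results):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1 for _, _, f1 in results.values()) / len(results)


# Toy labels for three transport modes (illustrative, not real results).
y_true = ["car", "car", "bus", "bus", "train", "train"]
y_pred = ["car", "bus", "bus", "bus", "train", "car"]
results = per_class_metrics(y_true, y_pred)
for cls, (p, r, f1) in results.items():
    print(f"{cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro F1: {macro_f1(results):.2f}")
```

Scikit-learn, which is listed in the requirements, provides the same quantities through sklearn.metrics.precision_recall_fscore_support and sklearn.metrics.classification_report.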

Ensemble Comparisons

The performance of the multi-task learning model is compared against single-task models using an ensemble method. The ensemble model combines predictions from the transport mode classification and driver identification models to form composite labels, which are compared to the multi-task model's predictions.
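
The comparison described above can be sketched roughly as follows: the two single-task predictions for each sample are paired into a composite (transport mode, driver) label, and both that ensemble and the multi-task model are scored on how often the full composite label is correct. The label format, the function names, and the toy predictions are assumptions for illustration; the actual comparison is carried out in metrics_result.ipynb and may use different scoring.

```python
def composite_labels(mode_preds, driver_preds):
    """Pair each transport-mode prediction with the driver prediction."""
    if len(mode_preds) != len(driver_preds):
        raise ValueError("both models must predict the same samples")
    return list(zip(mode_preds, driver_preds))


def exact_match_accuracy(predicted, reference):
    """Fraction of samples where the full composite label is correct."""
    hits = sum(1 for p, r in zip(predicted, reference) if p == r)
    return hits / len(reference)


# Toy predictions (illustrative values only, not results from this project).
true_labels = [("car", "d1"), ("car", "d2"), ("bus", "none"), ("car", "d1")]
single_task = composite_labels(
    mode_preds=["car", "car", "bus", "bus"],
    driver_preds=["d1", "d1", "none", "d1"],
)
multitask = [("car", "d1"), ("car", "d2"), ("bus", "none"), ("car", "d2")]

print("ensemble :", exact_match_accuracy(single_task, true_labels))
print("multitask:", exact_match_accuracy(multitask, true_labels))
```

Scoring the composite label is stricter than scoring each task separately, because a sample counts as correct only when both the transport mode and the driver are right.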

Conclusion

This project demonstrates the use of multi-task learning for joint driver identification and transport mode classification. The multi-task model performs competitively with the single-task models while handling both tasks in a single model, which makes it a scalable option for usage-based insurance and other real-world applications.

Future Work

Further exploration could involve testing additional sensor data, improving the augmentation techniques for the driver identification task, and extending the model to more complex multi-task scenarios.

Repository: EsosaOrumwese/Msc_Research_Project_Multi-Task-Learning_CNN-GRU-biLSTM
