APS Failure Prediction for Scania Trucks

Project Overview

This project focuses on predicting component failures in the Air Pressure System (APS) of heavy Scania trucks. The APS generates pressurized air used in critical truck functions, including braking and gear shifting. The goal is to develop a model that can accurately predict failures related to the APS, minimizing the cost of maintenance while ensuring truck safety.

Dataset

The dataset is sourced from the UCI Machine Learning Repository and consists of anonymized operational data from heavy Scania trucks. It contains two classes:

Positive Class: Trucks with failures related to a specific component of the APS.
Negative Class: Trucks with failures unrelated to the APS.

The main challenges are the severe class imbalance and the asymmetric cost of misclassification. Incorrectly predicting a failure leads to an unnecessary check (low cost), while missing a real failure can cause a truck breakdown (high cost).

Dataset Breakdown:

- Training Set: 60,000 samples (1,000 positive, 59,000 negative).
- Test Set: 16,000 samples.
- Features: 171 attributes per record, including numerical counters.
- Missing Values: represented as na.

Objective:

The core objective is to develop a machine learning model that:

Accurately Predicts APS Failures: Correctly classify trucks with APS-related failures (positive class) while minimizing misclassifications.
Minimizes Operational Costs: The model must optimize for cost efficiency based on the following cost structure:
- Cost_1: The cost of an unnecessary inspection, set at 10 units.
- Cost_2: The cost of missing a faulty truck, set at 500 units.
- Total Cost: The overall goal is to minimize the total cost, where:

$$\text{Total Cost} = \text{Cost}_1 \times \text{False Positives} + \text{Cost}_2 \times \text{False Negatives}$$
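The cost formula can be expressed as a small scoring function. This is a minimal sketch, assuming labels are encoded as 1 for the positive (APS failure) class and 0 for the negative class; the raw dataset uses text labels, so they would need mapping first. The name `total_cost` is illustrative, not taken from the project code.

```python
# Cost weights from the cost structure above.
COST_FP = 10   # Cost_1: unnecessary inspection (false positive)
COST_FN = 500  # Cost_2: missed faulty truck (false negative)


def total_cost(y_true, y_pred, cost_fp=COST_FP, cost_fn=COST_FN):
    """Return Cost_1 * FP + Cost_2 * FN.

    Assumed encoding: 1 = APS failure (positive), 0 = other (negative).
    """
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return cost_fp * fp + cost_fn * fn


# One false positive (index 1) and one false negative (index 3).
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0]
print(total_cost(y_true, y_pred))  # 510
```

Because Cost_2 is 50 times Cost_1, a single missed failure costs as much as 50 unnecessary inspections, which is why the model should lean toward flagging trucks.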

Methodology

The project follows a structured approach to develop and evaluate a cost-sensitive model:

Data Preprocessing:
- Handling missing values (na).
- Feature scaling and normalization.
- Addressing class imbalance through resampling techniques (oversampling/undersampling).
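The three preprocessing steps could be chained roughly as follows. This is a sketch using NumPy, not the project's own code: median imputation, z-score standardization, and random oversampling of the minority class are assumed choices, and the function name `preprocess` is hypothetical. It also assumes labels are 0/1 with the positive class in the minority. In a real pipeline the medians and scaling statistics should be fitted on the training split only, and oversampling applied only to training data.

```python
import numpy as np


def preprocess(raw_rows, labels, seed=0):
    """Median-impute, standardize, and randomly oversample the minority class.

    raw_rows: list of rows of strings; missing values are the string "na".
    labels:   list of 0/1 ints (assumed encoding: 1 = APS failure).
    """
    # Parse to floats, mapping the dataset's "na" marker to NaN.
    X = np.array([[np.nan if v == "na" else float(v) for v in row]
                  for row in raw_rows])
    y = np.asarray(labels)

    # Fill each missing entry with the median of its column's observed values.
    medians = np.nanmedian(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = medians[nan_cols]

    # Standardize to zero mean and unit variance; leave constant columns unscaled.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - mean) / std

    # Random oversampling: draw extra positive rows with replacement until balanced.
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    extra = rng.choice(pos_idx, size=len(neg_idx) - len(pos_idx), replace=True)
    keep = np.concatenate([neg_idx, pos_idx, extra])
    return X[keep], y[keep]


rows = [["1", "na"], ["3", "4"], ["5", "6"], ["na", "8"]]
X_bal, y_bal = preprocess(rows, [1, 0, 0, 0])
print(X_bal.shape, y_bal.sum())  # (6, 2) 3
```

Undersampling the negative class is the cheaper alternative on 59,000 negatives, at the price of discarding data.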
Exploratory Data Analysis (EDA):
- Understanding feature distributions and correlations.
- Visualizing class imbalance and critical features.
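Two quick EDA checks, quantifying the imbalance and spotting redundant features, might look like the sketch below. It assumes 0/1 labels and an already-imputed feature matrix with no constant columns; the helper names and the 0.95 correlation cut-off are illustrative assumptions.

```python
import numpy as np


def class_balance(y):
    """Return (negatives, positives, negatives per positive) for 0/1 labels."""
    y = np.asarray(y)
    pos = int((y == 1).sum())
    neg = int((y == 0).sum())
    return neg, pos, neg / pos


def correlated_pairs(X, threshold=0.95):
    """List (i, j, r) feature pairs whose absolute Pearson r exceeds threshold.

    Assumes X has no missing values and no constant columns.
    """
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    n = corr.shape[0]
    return [(i, j, float(corr[i, j]))
            for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) > threshold]


# The training split described above: 59,000 negatives, 1,000 positives.
print(class_balance([0] * 59000 + [1] * 1000))  # (59000, 1000, 59.0)
```

Highly correlated pairs are candidates for dropping, or a hint that PCA will compress the feature space well.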
Modeling:
- Investigating multiple classification algorithms:
  - Random Forest
  - XGBoost
  - Support Vector Machines (SVM)
  - Gradient Boosting Machines
  - Logistic Regression
  - LightGBM
  - Decision Trees
- Reducing dimensionality using PCA.
- Fine-tuning hyperparameters using Hyperopt.
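The PCA step can be sketched with a singular value decomposition, keeping the fewest components that explain a chosen share of the variance. This is an illustrative NumPy version, assuming standardized input; the 95% threshold and the name `pca_fit_transform` are assumptions, and the Hyperopt tuning step is not shown. As with scaling, the projection should be fitted on training data and reused for the test set.

```python
import numpy as np


def pca_fit_transform(X, variance_to_keep=0.95):
    """Project X onto the fewest principal components reaching the variance share."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)              # variance ratio per component
    # First index where the cumulative ratio reaches the target, as a count.
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(S))
    return Xc @ Vt[:k].T, explained[:k]


# Two perfectly correlated columns collapse onto a single component.
Z, ratios = pca_fit_transform([[1, 2], [2, 4], [3, 6], [4, 8]])
print(Z.shape)  # (4, 1)
```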
Cost-Sensitive Learning:
- Adjusting decision thresholds to prioritize minimizing False Negatives, given their higher cost impact.
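Threshold adjustment can be driven directly by the cost metric: sweep candidate cut-offs on validation scores and keep the one with the lowest total cost. The sketch below assumes 0/1 labels and that a truck is flagged when its score is at or above the threshold; `best_threshold` is a hypothetical helper. Sorting plus `np.searchsorted` keeps the sweep fast on tens of thousands of samples.

```python
import numpy as np


def best_threshold(y_true, scores, cost_fp=10, cost_fn=500):
    """Return (threshold, cost) minimizing cost_fp * FP + cost_fn * FN.

    A sample is predicted positive when its score >= threshold.
    """
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    # Candidates: every distinct score, plus +inf (predict everything negative).
    thresholds = np.append(np.unique(s), np.inf)
    neg = np.sort(s[y == 0])
    pos = np.sort(s[y == 1])
    # searchsorted(..., side="left") counts elements strictly below each threshold.
    fp = len(neg) - np.searchsorted(neg, thresholds, side="left")  # negatives >= t
    fn = np.searchsorted(pos, thresholds, side="left")              # positives < t
    cost = cost_fp * fp + cost_fn * fn
    i = int(np.argmin(cost))
    return float(thresholds[i]), int(cost[i])


y_val = [0, 0, 1, 1, 0]
p_val = [0.1, 0.4, 0.35, 0.8, 0.2]
print(best_threshold(y_val, p_val))  # (0.35, 10)
```

If the scores were well-calibrated probabilities, flagging a truck pays off whenever p × 500 > (1 − p) × 10, i.e. for p above 10/510 ≈ 0.02, far below the default 0.5. The threshold should be chosen on validation data, not on the test set.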
Model Evaluation:
- Performance metrics such as:
  - Recall, F1-Score, ROC-AUC.
  - Custom cost metric based on Total Cost calculation.
- Analyzing model performance based on both Recall and cost-efficiency.
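For reference, the listed metrics can be computed from first principles as below. This is a sketch assuming 0/1 labels; in practice a library such as scikit-learn (`recall_score`, `f1_score`, `roc_auc_score`) would normally be used. The pairwise ROC-AUC here is quadratic in the number of samples, so it is only meant to show what the metric measures.

```python
def recall_f1(y_true, y_pred):
    """Recall and F1 for the positive class (assumed label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(recall_f1([1, 0, 0, 1, 0], [1, 1, 0, 0, 0]))  # (0.5, 0.5)
```

Recall tracks the expensive false negatives directly, while ROC-AUC is threshold-independent, so the two complement the total-cost metric.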

SiddhantH1512/AirPressureSensors

Folders and files

Latest commit

History

Repository files navigation

APS Failure Prediction for Scania Trucks

Project Overview

Dataset

Dataset Breakdown:

Objective:

Methodology

About

Resources

Stars

Watchers

Forks

Languages