machine learning techniques to predict company defaults by optimizing the trade-off between recall (minimizing false negatives) and precision (avoiding false positives). Logistic Regression and Random Forest models were trained, with emphasis on recall to ensure accurate identification of high-risk companies.

Radhikareddy-chintareddy/Financial_Risk_Analysis_Machine_Learning

Financial Risk Analysis Project

Overview

This project predicts financial risk for companies, specifically whether a company will default on its debt obligations. It uses financial data from 2015 as predictors and derives the default label from net worth in 2016. Key methods include data preprocessing, feature engineering, machine learning, and visualization.


Problem Statement

Corporate defaults can lead to lower credit ratings, higher borrowing costs, and difficulty raising capital. The objective is to predict the likelihood of default from historical financial data so that stakeholders can make informed decisions.


Datasets

Primary Dataset

  • Description: Contains 67 columns representing financial metrics like Net Worth, Total Debt, Revenue, and Profit.
  • Target Variable: default, a binary variable derived from Networth_Next_Year.

Stock Price Data

  • Description: Weekly stock prices for companies from 2014 to 2020.

Project Workflow

1. Data Cleaning and Preprocessing

  • Column Renaming: Standardized column names (e.g., replaced spaces and special characters with underscores).
  • Outlier Treatment: Capped outliers using the 5th and 95th percentiles.
  • Missing Value Imputation: Used median imputation for filling missing values.
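
The three cleaning steps above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the helper names `clean_column_name` and `preprocess` and the toy column names are hypothetical, and the order (capping before imputation) simply follows the order in which the steps are listed.

```python
import re

import numpy as np
import pandas as pd


def clean_column_name(name: str) -> str:
    """Replace each run of non-alphanumeric characters with one underscore."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, cap outliers at the 5th/95th percentiles, impute medians."""
    out = df.copy()
    out.columns = [clean_column_name(c) for c in out.columns]

    numeric_cols = out.select_dtypes(include="number").columns

    # Cap every numeric column at its own 5th and 95th percentiles.
    lower = out[numeric_cols].quantile(0.05)
    upper = out[numeric_cols].quantile(0.95)
    out[numeric_cols] = out[numeric_cols].clip(lower=lower, upper=upper, axis=1)

    # Fill the remaining gaps with each column's median.
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


# Hypothetical toy data; the real dataset has 67 columns.
raw = pd.DataFrame(
    {
        "Net Worth": [10.0, 12.0, np.nan, 11.0, 500.0],
        "Total Debt (Rs.)": [5.0, 6.0, 7.0, np.nan, 8.0],
    }
)
clean = preprocess(raw)
print(clean)
```

Capping (rather than dropping) outliers keeps every company in the sample, which matters when defaults are already rare.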

2. Feature Engineering

  • Created the binary target variable default from Networth_Next_Year:
    • 1: Networth_Next_Year < 0 (Defaulted).
    • 0: Networth_Next_Year ≥ 0 (Non-Defaulted).
  • Addressed multicollinearity using Variance Inflation Factor (VIF).
  • Selected features based on univariate and bivariate analysis.
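
Deriving the target is a one-line comparison on the net worth column. The sketch below uses a hypothetical five-row frame; treating a net worth of exactly zero as non-default is an assumption of this example.

```python
import pandas as pd

# Hypothetical values; only the column name comes from the dataset description.
df = pd.DataFrame({"Networth_Next_Year": [-12.5, 0.0, 48.3, -0.1, 7.0]})

# 1 = defaulted (negative net worth next year), 0 = non-defaulted.
# Assumption: a net worth of exactly zero counts as non-default.
df["default"] = (df["Networth_Next_Year"] < 0).astype(int)

# Share of defaulters, a quick check for class imbalance.
default_rate = df["default"].mean()
print(df)
print(f"default rate: {default_rate:.2f}")
```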

3. Exploratory Data Analysis

Key Insights

  • Boxplots and Heatmaps:
    • Variables like Networth and Capital_Employed showed significant separation between default and non-default groups.
  • Correlation Matrix:
    • Highlighted multicollinearity among independent variables like Gross_Block, PBIDT, and Total_Debt.

| Variable | Correlation with Target |
|---|---|
| Networth | 0.85 |
| Capital_Employed | 0.78 |
| PBIDT | 0.72 |

4. Model Building

4.1 Logistic Regression

  • Approach A: Removed multicollinear variables, dropping those with VIF > 5.
  • Approach B: Started with all variables and iteratively removed those with p-values > 0.05.
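
Approach A can be illustrated with a small NumPy sketch. It relies on the identity that the VIF of each feature, 1 / (1 - R²) from regressing it on the others with an intercept, equals the corresponding diagonal entry of the inverse correlation matrix. The function names, the drop-one-at-a-time loop, and the synthetic data are assumptions of this example; the project may have used a library routine instead.

```python
import numpy as np


def vif_scores(X: np.ndarray) -> np.ndarray:
    """VIF per column, read off the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


def drop_high_vif(X, names, threshold=5.0):
    """Repeatedly drop the column with the largest VIF until all are <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = vif_scores(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names


# Synthetic features: x3 is almost a copy of x1, so the pair is collinear.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

X_kept, kept = drop_high_vif(X, ["x1", "x2", "x3"], threshold=5.0)
print(kept)
```

Dropping one variable at a time and recomputing matters because removing a single member of a collinear group usually lowers the VIF of the others.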

4.2 Random Forest Classifier

  • Built a base model with default parameters.
  • Tuned hyperparameters using GridSearchCV.
  • Applied SMOTE to address class imbalance.
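
To show what SMOTE does, here is a simplified NumPy sketch of its core idea: each synthetic minority sample is a random point on the line segment between a minority sample and one of its nearest minority neighbours. The function `smote_sketch` and the synthetic data are hypothetical and stand in for a library implementation (such as the one in imbalanced-learn), which the project most likely used.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other minority rows

    # Pairwise Euclidean distances; a row is never its own neighbour.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base row, one of its k neighbours, and a random position between them.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Synthetic, imbalanced training data: 95 non-defaulters and 5 defaulters.
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(95, 2))
X_minor = rng.normal(3.0, 0.5, size=(5, 2))

synthetic = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_balanced = np.vstack([X_major, X_minor, synthetic])
y_balanced = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(synthetic)))
print(np.bincount(y_balanced))
```

Oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class balance.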

4.3 Linear Discriminant Analysis (LDA)

  • Explored LDA for classification but noted weaker performance compared to Random Forest.

5. Model Evaluation

  • Metrics Used:
    • Recall: Prioritized to minimize false negatives.
    • Precision: Evaluated to avoid false positives.
    • Accuracy: Provided overall performance.

Best Model: Random Forest with SMOTE

| Metric | Train Data | Test Data |
|---|---|---|
| Accuracy | 98% | 94% |
| Recall | 93% | 91% |
| Precision | 87% | 84% |
| F1-Score | 90% | 87% |
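
For reference, the four metrics follow directly from confusion-matrix counts. The sketch below defines them and then recomputes F1 from the reported precision and recall, which reproduces the 90% (train) and 87% (test) F1 scores. The confusion counts in the example are hypothetical and are not taken from the project.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Recall, precision, accuracy and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn)        # share of actual defaulters that were caught
    precision = tp / (tp + fp)     # share of predicted defaulters that were right
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1_from(precision, recall),
    }


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
toy = classification_metrics(tp=8, fp=2, fn=1, tn=89)
print(toy)

# Consistency check against the reported precision and recall.
f1_train = f1_from(precision=0.87, recall=0.93)
f1_test = f1_from(precision=0.84, recall=0.91)
print(round(f1_train, 2), round(f1_test, 2))
```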

6. Stock Price Analysis

  • Visualization:
    • Weekly stock price trends for companies like Infosys and SAIL.
    • Highlighted volatility using boxplots.

Returns and Risk Analysis

| Stock | Mean Return | Standard Deviation (Risk) |
|---|---|---|
| Shree Cement | 5.2% | 2.4% |
| Infosys | 4.8% | 1.8% |
| Idea Vodafone | -3.4% | 5.8% |
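
A mean return and a standard deviation of this kind can be computed from a weekly price series as sketched below. The README does not state how returns were defined, so the use of log returns, the sample standard deviation (ddof=1), and the two toy price series are all assumptions of this example.

```python
import numpy as np


def return_stats(prices):
    """Mean and sample standard deviation of weekly log returns."""
    prices = np.asarray(prices, dtype=float)
    log_returns = np.diff(np.log(prices))  # ln(P_t / P_{t-1})
    return log_returns.mean(), log_returns.std(ddof=1)


# Hypothetical weekly closing prices, not taken from the stock dataset.
steady = [100, 101, 102, 103, 104]
volatile = [100, 80, 110, 70, 95]

steady_mean, steady_risk = return_stats(steady)
volatile_mean, volatile_risk = return_stats(volatile)
print(f"steady:   mean={steady_mean:.4f}, risk={steady_risk:.4f}")
print(f"volatile: mean={volatile_mean:.4f}, risk={volatile_risk:.4f}")
```

Plotting mean against standard deviation for every stock gives the usual risk-return view, with the attractive stocks in the high-mean, low-deviation corner.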

7. Key Results

Logistic Regression

  • Approach B (all variables, with those having p-values > 0.05 iteratively removed) outperformed Approach A (VIF-based removal).

Random Forest

  • GridSearchCV-tuned model with SMOTE showed the highest performance.

Stock Analysis

  • High-risk stocks like Idea Vodafone and Jet Airways showed negative returns and high volatility.
  • Shree Cement and Infosys emerged as high-return, low-risk stocks.

Recommendations

  • Investment Strategy:
    • Focus on high-return, low-volatility stocks (e.g., Shree Cement, Infosys).
    • Avoid stocks that combine low or negative returns with high volatility (e.g., Idea Vodafone).
  • Model Deployment:
    • Use the Random Forest model with SMOTE for default prediction.
    • Regularly update the model with new data.
