This project aims to predict rainfall using historical and current weather data. The model integrates three datasets:
- Daily Rainfall Data from January 2024.
- District-Wise Rainfall Normals for various districts.
- Historical Rainfall Data from 1901 to 2015.
The goal is to forecast average rainfall with a Linear Regression model trained on features derived from these datasets.
This project is a proof of concept. Its predictions are unlikely to be accurate in practice because of:
- Limited and possibly inaccurate data.
- Simplistic approach using linear regression without more advanced techniques.
- Preliminary preprocessing and feature engineering.
- Minimal data cleaning.
The results obtained are:
- Mean Absolute Error (MAE): 0.0704
- Root Mean Squared Error (RMSE): 0.2222
- R^2 Score: 0.0147
An R^2 score of 0.0147 means the model explains only about 1.5% of the variance in the target, so it is unlikely to perform well in real-world scenarios.
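For reference, the three metrics above follow the standard definitions below, where y_i is an observed value, \hat{y}_i the corresponding prediction, \bar{y} the mean of the observed values, and n the number of test samples. This is the textbook formulation, not something taken from the project's notebook.

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```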
- Python: Programming language used for data processing and modeling.
- Pandas: Library for data manipulation and analysis.
- Scikit-Learn: Library for machine learning, used for creating and evaluating the Linear Regression model.
- Jupyter Notebook: Development environment used for running the code and visualizing results.
Data Loading and Inspection:
- Loaded datasets from CSV files.
- Inspected the datasets to understand their structure and contents.
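A minimal sketch of the loading and inspection step. The column names, the sample rows, and the helper `load_and_inspect` are assumptions made for illustration; an in-memory CSV stands in for the real files so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up sample standing in for one of the project's CSV files.
SAMPLE_CSV = io.StringIO(
    "State,District,Date,Avg_rainfall\n"
    "Kerala,Idukki,2024-01-01,0.0\n"
    "Kerala,Idukki,2024-01-02,1.2\n"
    "Kerala,Wayanad,2024-01-01,\n"
)


def load_and_inspect(source):
    """Read a CSV and return the frame plus a small summary of its structure."""
    df = pd.read_csv(source)
    summary = {
        "shape": df.shape,
        "columns": list(df.columns),
        "missing": df.isna().sum().to_dict(),  # missing values per column
    }
    return df, summary


daily, summary = load_and_inspect(SAMPLE_CSV)
print(daily.head())
print(summary)
```

With real data, `source` would be a file path passed to `pd.read_csv` instead of the in-memory buffer.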
Data Transformation:
- Converted datasets to long format using `pd.melt()` to facilitate merging.
- Processed the daily rainfall data to extract relevant features.
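The two transformations above might look roughly like this. The frames, the month column names (`JAN`, `FEB`), and the `Date` column are hypothetical; only `pd.melt()` and the `Day` and `Monthly_Rainfall` names come from this README.

```python
import pandas as pd

# Hypothetical wide-format normals: one row per district, one column per month.
normals_wide = pd.DataFrame({
    "District": ["Idukki", "Wayanad"],
    "JAN": [10.5, 8.0],
    "FEB": [15.2, 12.1],
})

# Wide -> long: one row per (District, Month) pair.
normals_long = pd.melt(
    normals_wide,
    id_vars=["District"],
    value_vars=["JAN", "FEB"],
    var_name="Month",
    value_name="Monthly_Rainfall",
)

# Hypothetical daily data: derive a Day feature from the date.
daily = pd.DataFrame({
    "District": ["Idukki", "Idukki"],
    "Date": ["2024-01-01", "2024-01-02"],
    "Avg_rainfall": [0.0, 1.2],
})
daily["Date"] = pd.to_datetime(daily["Date"])
daily["Day"] = daily["Date"].dt.day

print(normals_long)
print(daily)
```

Long format gives both tables a shared `Month` key, which is what makes the later merge a plain key join.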
Data Merging:
- Merged daily rainfall data with monthly normals and historical rainfall data based on geographical and temporal features.
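One way to sketch the merge, assuming `District` is the geographical key and `Month` the temporal key. Both key names and all values are assumptions; the actual historical dataset may be keyed differently (for example by subdivision and year).

```python
import pandas as pd

# Hypothetical long-format inputs sharing District and Month keys.
daily = pd.DataFrame({
    "District": ["Idukki", "Idukki", "Wayanad"],
    "Month": ["JAN", "JAN", "JAN"],
    "Day": [1, 2, 1],
    "Avg_rainfall": [0.0, 1.2, 0.4],
})
normals = pd.DataFrame({
    "District": ["Idukki", "Wayanad"],
    "Month": ["JAN", "JAN"],
    "Monthly_Rainfall": [10.5, 8.0],
})
historical = pd.DataFrame({
    "District": ["Idukki"],
    "Month": ["JAN"],
    "Historical_Rainfall": [12.3],
})

# Left joins keep every daily row; unmatched keys yield NaN.
merged = (
    daily
    .merge(normals, on=["District", "Month"], how="left")
    .merge(historical, on=["District", "Month"], how="left")
)
print(merged)
```

A left join is used here so that no daily observation is dropped; rows without a match end up with missing values that the preprocessing step has to handle.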
Data Preprocessing:
- Handled missing values by filling them with zeros.
- Defined features (`Day`, `Monthly_Rainfall`, `Historical_Rainfall`) and target variable (`Avg_rainfall`).
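The preprocessing step can be sketched as follows. The feature and target names are the ones listed above; the sample values are invented.

```python
import numpy as np
import pandas as pd

# Invented merged data with a couple of gaps.
merged = pd.DataFrame({
    "Day": [1, 2, 1],
    "Monthly_Rainfall": [10.5, 10.5, 8.0],
    "Historical_Rainfall": [12.3, 12.3, np.nan],
    "Avg_rainfall": [0.0, 1.2, np.nan],
})

# Replace every missing value with zero.
filled = merged.fillna(0)

features = ["Day", "Monthly_Rainfall", "Historical_Rainfall"]
X = filled[features]          # feature matrix
y = filled["Avg_rainfall"]    # target variable

print(X)
print(y)
```

Zero-filling makes a missing measurement indistinguishable from a dry day, which is one plausible contributor to the weak scores reported above.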
Model Training and Evaluation:
- Split the data into training and testing sets.
- Trained a Linear Regression model on the training data.
- Evaluated the model's performance using MAE, RMSE, and R^2 Score.
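A minimal sketch of the training and evaluation steps using scikit-learn. The data is synthetic, and the 80/20 split and `random_state` are assumptions, so the printed scores will not match the results reported earlier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the merged dataset.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Day": rng.integers(1, 32, size=n),
    "Monthly_Rainfall": rng.uniform(0, 50, size=n),
    "Historical_Rainfall": rng.uniform(0, 50, size=n),
})
data["Avg_rainfall"] = 0.02 * data["Monthly_Rainfall"] + rng.normal(0, 0.5, size=n)

X = data[["Day", "Monthly_Rainfall", "Historical_Rainfall"]]
y = data["Avg_rainfall"]

# Hold out 20% of the rows for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # RMSE as the square root of MSE
r2 = r2_score(y_test, y_pred)

print(f"MAE:  {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
print(f"R^2:  {r2:.4f}")
```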
- Advanced Modeling: Explore other machine learning models such as Decision Trees, Random Forests, or Gradient Boosting.
- Feature Engineering: Improve feature selection and engineering to enhance model performance.
- Data Quality: Use more accurate and comprehensive datasets for better predictions.
- Hyperparameter Tuning: Optimize model parameters for better results.
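As a starting point for the modeling and tuning items above, here is a hedged sketch that swaps in a Random Forest and tunes it with a small grid search. Nothing here comes from the project itself: the data is synthetic and the parameter grid is an arbitrary example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic features and target, for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 50, size=(120, 3))
y = 0.02 * X[:, 1] + rng.normal(0, 0.5, size=120)

# Small, arbitrary grid; a real search would cover more values.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes, so MAE is negated
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Cross-validated MAE:", -search.best_score_)
```

Comparing the cross-validated MAE against the Linear Regression baseline on the same folds would show whether the extra model complexity is worth it.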
If you'd like to contribute to this project:
- Fork the repository.
- Clone your fork to your local machine.
- Create a new branch for your changes.
- Commit your changes and push them to your fork.
- Create a pull request describing your changes and improvements.
To use this project, follow these steps:
Clone the Repository:
git clone https://github.com/yourusername/rain-prediction-model.git
Navigate to the Project Directory:
cd rain-prediction-model
Set Up a Virtual Environment:
python -m venv env
source env/bin/activate  # On Windows, use `env\Scripts\activate`
Install Dependencies:
pip install -r requirements.txt
Run the Jupyter Notebook:
jupyter notebook
For the datasets used in this project, visit Data.gov.in.