This project fine-tunes the RoBERTa model for detecting unreliable news articles. The application classifies news articles as reliable or unreliable by leveraging a pre-trained transformer model and a labeled dataset. The backend is implemented using Flask and deployed on Render.com, with the fine-tuned model stored on Hugging Face Space.
- Model: Fine-tuned roberta-base for binary classification.
- Dataset: Labeled dataset with news article attributes: title, text, and reliability label.
- Deployment: Flask-based API hosted on Render; model stored on Hugging Face for easy access.
- Evaluation: Achieves over 99% accuracy and F1 score on the validation dataset.
- Clone the repository:
  `git clone https://github.com/username/unreliable-news-detector.git`
  `cd unreliable-news-detector`
- Install dependencies:
  `pip install -r requirements.txt`
- Set up Hugging Face authentication:
  `export HF_TOKEN=your_huggingface_token`
- Configure Flask environment variables:
  `export FLASK_APP=app.py`
  `export FLASK_ENV=development`
- Run the Flask server:
  `flask run`
- Access the API at `http://localhost:5000` or the deployed version on Render.
- `POST /predict` (see the example request below)
  - Input: JSON with `title` and `text`.
  - Output: Predicted label (`0` = Reliable, `1` = Unreliable) and confidence score.
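A minimal example request using Python's `requests` library; the response field names (`label`, `confidence`) are assumptions based on the description above, not confirmed by the repository:

```python
import requests

# Local endpoint; swap in the Render URL for the deployed API.
API_URL = "http://localhost:5000/predict"

article = {
    "title": "Example headline",
    "text": "Full body of the news article goes here.",
}

response = requests.post(API_URL, json=article, timeout=30)
response.raise_for_status()

# Expected shape (illustrative): {"label": 0, "confidence": 0.98}
print(response.json())
```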
- Data Preprocessing (a minimal sketch follows this list):
  - Cleaned text by removing special characters and URLs.
  - Combined `title` and `text` fields for holistic context.
  - Removed duplicates and balanced classes.
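A minimal preprocessing sketch, assuming the data lives in a pandas DataFrame with `title`, `text`, and `label` columns (the file and column names are assumptions; class balancing is omitted here):

```python
import re

import pandas as pd

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
SPECIAL_CHARS = re.compile(r"[^a-zA-Z0-9\s.,!?'\"]")

def clean_text(text: str) -> str:
    """Remove URLs and special characters, then collapse whitespace."""
    text = URL_PATTERN.sub(" ", str(text))
    text = SPECIAL_CHARS.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()

df = pd.read_csv("news.csv")  # hypothetical file name
# Combine title and text for holistic context, then clean and deduplicate.
df["content"] = (df["title"].fillna("") + " " + df["text"].fillna("")).map(clean_text)
df = df.drop_duplicates(subset="content").reset_index(drop=True)
```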
- Data Splitting (see the sketch below):
  - 80/20 split for training and validation.
  - Ensured no overlap between sets.
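A simple stratified 80/20 split, assuming the `df` frame from the preprocessing sketch above; scikit-learn is an assumption, and the repository may split differently:

```python
from sklearn.model_selection import train_test_split

train_df, val_df = train_test_split(
    df,
    test_size=0.2,          # 80/20 split
    stratify=df["label"],   # keep the class ratio in both sets
    random_state=42,
)
assert not set(train_df.index) & set(val_df.index), "train/validation sets overlap"
```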
- Model Training:
  - Used Hugging Face's `Trainer` API.
  - Fine-tuned RoBERTa for 2 epochs with a learning rate of `2e-5` (see the training sketch after the hyperparameter list below).
- Evaluation:
  - Metrics: Accuracy and F1 Score.
  - Plotted confusion matrix and performance graphs.
- Hyperparameters:
  - Learning Rate: `2e-5`
  - Batch Size: `16`
  - Epochs: `2`
  - Weight Decay: `0.01`
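A condensed training sketch with the `Trainer` API and the hyperparameters above, assuming the `train_df`/`val_df` frames from the splitting sketch; the output directory, maximum sequence length, and column names are illustrative rather than the repository's actual script:

```python
import numpy as np
from datasets import Dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

def tokenize(batch):
    # "content" is the combined title + text column from the preprocessing sketch.
    return tokenizer(batch["content"], truncation=True, padding="max_length", max_length=512)

train_ds = Dataset.from_pandas(train_df[["content", "label"]].reset_index(drop=True)).map(tokenize, batched=True)
val_ds = Dataset.from_pandas(val_df[["content", "label"]].reset_index(drop=True)).map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, preds), "f1": f1_score(labels, preds)}

training_args = TrainingArguments(
    output_dir="./results",   # illustrative output directory
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",    # named "evaluation_strategy" in older transformers releases
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=val_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```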
- Results:
  - Training Loss: 0.0556
  - Validation Accuracy: 99.35%
  - Validation F1 Score: 99.34%
Confusion matrix (validation set):

| | Predicted Reliable | Predicted Unreliable |
| --- | --- | --- |
| Actual Reliable | 2055 | 11 |
| Actual Unreliable | 15 | 1944 |
- Plotted training loss and evaluation accuracy over epochs (see the evaluation sketch below).
- Highlighted the model's strong generalization capabilities.
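A sketch for reproducing the confusion matrix and the training-loss curve, assuming the `trainer` and `val_ds` objects from the training sketch above:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

# Predict on the validation set with the fine-tuned model.
predictions = trainer.predict(val_ds)
preds = np.argmax(predictions.predictions, axis=-1)
labels = predictions.label_ids

# Confusion matrix (0 = Reliable, 1 = Unreliable).
cm = confusion_matrix(labels, preds)
ConfusionMatrixDisplay(cm, display_labels=["Reliable", "Unreliable"]).plot(cmap="Blues")
plt.title("Validation Confusion Matrix")
plt.savefig("confusion_matrix.png", bbox_inches="tight")

# Training-loss curve from the Trainer's log history.
losses = [log["loss"] for log in trainer.state.log_history if "loss" in log]
plt.figure()
plt.plot(losses)
plt.xlabel("Logging step")
plt.ylabel("Training loss")
plt.savefig("training_loss.png", bbox_inches="tight")
```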
- Flask REST API serving predictions via POST requests (a minimal serving sketch follows this list).
- Backend deployed on Render.com.
- Model weights stored on Hugging Face Space for easy access.
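A minimal sketch of the serving code, assuming the fine-tuned weights load with `AutoModelForSequenceClassification.from_pretrained`; the model repository ID and response fields are illustrative, not the repository's actual `app.py`:

```python
import torch
from flask import Flask, jsonify, request
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical model repo ID; replace with the actual Hugging Face location.
MODEL_ID = "username/unreliable-news-detector"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json(force=True)
    # Combine title and text, mirroring the preprocessing used at training time.
    combined = f"{data.get('title', '')} {data.get('text', '')}"
    inputs = tokenizer(combined, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    label = int(torch.argmax(probs))  # 0 = Reliable, 1 = Unreliable
    return jsonify({"label": label, "confidence": float(probs[label])})

if __name__ == "__main__":
    app.run()
```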
- Hugging Face for providing pre-trained RoBERTa and the Trainer API.
- Render.com for hosting the Flask backend.