SevilayMuni/Flask-App-Roberta-Detect-News

Detecting Unreliable News with Fine-Tuned RoBERTa

This project fine-tunes RoBERTa to detect unreliable news articles. The application classifies each article as reliable or unreliable using a pre-trained transformer that has been fine-tuned on a labeled dataset. The backend is a Flask API deployed on Render.com, and the fine-tuned model is stored in a Hugging Face Space.

Features

  • Model: Fine-tuned roberta-base for binary classification.
  • Dataset: Labeled news articles, each with a title, text, and reliability label.
  • Deployment: Flask-based API hosted on Render; model stored on Hugging Face for easy access.
  • Evaluation: Achieves 99.35% accuracy and a 99.34% F1 score on the validation set.

Table of Contents

  1. Installation
  2. Usage
  3. Project Workflow
  4. Model Training
  5. Results
  6. Deployment
  7. Acknowledgments

Installation

  1. Clone the repository:

    git clone https://github.com/SevilayMuni/Flask-App-Roberta-Detect-News.git
    cd Flask-App-Roberta-Detect-News
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up Hugging Face authentication:

    export HF_TOKEN=your_huggingface_token
  4. Configure Flask environment variables:

    export FLASK_APP=app.py
    export FLASK_DEBUG=1  # replaces FLASK_ENV=development, which was removed in Flask 2.3

Usage

Run the Flask server:

    flask run

Access the API at http://localhost:5000 or the deployed version on Render.

API Endpoints

  • POST /predict
    Input: JSON with title and text.
    Output: Predicted label (0 = Reliable, 1 = Unreliable) and confidence score.
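
As a rough illustration of calling the endpoint, the sketch below builds and sends the POST request using only the Python standard library. The helper names `build_predict_request` and `predict` are hypothetical, and the default base URL assumes the local server from the Usage section. The exact field names in the response are not specified in this README, so the response is returned as a plain dict without further parsing.

```python
import json
import urllib.request


def build_predict_request(title, text, base_url="http://localhost:5000"):
    """Build a POST request for /predict carrying title and text as JSON."""
    body = json.dumps({"title": title, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(title, text, base_url="http://localhost:5000"):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_predict_request(title, text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Flask server running, `predict("Some headline", "Full article text")` should return the predicted label and confidence score described above.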

Project Workflow

  1. Data Preprocessing:

    • Cleaned text by removing special characters and URLs.
    • Combined title and text fields for holistic context.
    • Removed duplicates and balanced classes.
  2. Data Splitting:

    • 80/20 split for training and validation.
    • Ensured no overlap between sets.
  3. Model Training:

    • Used Hugging Face's Trainer API.
    • Fine-tuned RoBERTa for 2 epochs with a learning rate of 2e-5.
  4. Evaluation:

    • Metrics: Accuracy and F1 Score.
    • Plotted confusion matrix and performance graphs.
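
The preprocessing and splitting steps can be sketched roughly as follows. This is a minimal standard-library approximation, not the project's actual code: the regular expressions, the function names, the record layout (dicts with `title`, `text`, and `label` keys), and the random seed are all assumptions, and class balancing is left out for brevity.

```python
import random
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Assumed definition of "special characters": anything except letters,
# digits, whitespace, and basic punctuation.
SPECIAL_CHARS = re.compile(r"[^A-Za-z0-9\s.,!?'-]")
WHITESPACE = re.compile(r"\s+")


def clean_text(raw):
    """Strip URLs first, then special characters, then collapse whitespace."""
    no_urls = URL_PATTERN.sub(" ", raw)
    no_special = SPECIAL_CHARS.sub(" ", no_urls)
    return WHITESPACE.sub(" ", no_special).strip()


def prepare(records):
    """Combine title and text, clean the result, and drop exact duplicates."""
    seen, prepared = set(), []
    for record in records:
        combined = clean_text(f"{record['title']} {record['text']}")
        if combined and combined not in seen:
            seen.add(combined)
            prepared.append({"text": combined, "label": record["label"]})
    return prepared


def split(examples, validation_fraction=0.2, seed=42):
    """Shuffle once and slice, so no example lands in both sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * validation_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Deduplicating before the split is what keeps identical articles from appearing in both the training and validation sets.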

Model Training

Hyperparameters:

  • Learning Rate: 2e-5
  • Batch Size: 16
  • Epochs: 2
  • Weight Decay: 0.01
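
For reference, these values might map onto Hugging Face `TrainingArguments` as in the sketch below. It assumes the listed batch size is per device and that evaluation uses the same size; the `output_dir` path and the helper name `build_training_arguments` are hypothetical. Any settings not listed here are left at the library defaults, which may differ from what the project used.

```python
# Hyperparameters from the list above, keyed by TrainingArguments parameter names.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes "Batch Size" is per device
    "per_device_eval_batch_size": 16,   # assumption: same size for evaluation
    "num_train_epochs": 2,
    "weight_decay": 0.01,
}


def build_training_arguments(output_dir="./results"):
    """Create TrainingArguments from the values above.

    transformers is imported lazily so the settings can be inspected
    without the library installed. output_dir is a hypothetical path.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```

The returned object can then be passed to `Trainer` through its `args` parameter.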

Performance:

  • Training Loss: 0.0556
  • Validation Accuracy: 99.35%
  • Validation F1 Score: 99.34%

Results

Confusion Matrix:

                         Predicted Reliable    Predicted Unreliable
    Actual Reliable      2055                  11
    Actual Unreliable    15                    1944
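
As a sanity check, the reported validation metrics can be recomputed from these four counts. The sketch below assumes "Unreliable" (label 1) is the positive class and that F1 is the standard binary F1; under those assumptions the results round to the 99.35% accuracy and 99.34% F1 reported above.

```python
# Counts from the confusion matrix, with "Unreliable" (label 1) as the positive class.
true_negative, false_positive = 2055, 11   # actual Reliable
false_negative, true_positive = 15, 1944   # actual Unreliable

total = true_negative + false_positive + false_negative + true_positive
accuracy = (true_positive + true_negative) / total
precision = true_positive / (true_positive + false_positive)
recall = true_positive / (true_positive + false_negative)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
# prints: accuracy=0.9935 precision=0.9944 recall=0.9923 f1=0.9934
```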

Visualization:

  • Plotted training loss and evaluation accuracy over epochs.
  • The curves, together with the validation metrics, suggest the model generalizes well to the held-out validation set.

Deployment

Backend:

  • Flask REST API serving predictions via POST requests.
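
A backend of this shape might look roughly like the sketch below. It is not the project's actual `app.py`: the `classify` callable (taking a string and returning a label and a confidence), the response keys `label` and `confidence`, the error format, and the helper names are all assumptions made for illustration.

```python
def parse_payload(payload):
    """Return (title, text) from the request JSON, or raise ValueError."""
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    title, text = payload.get("title"), payload.get("text")
    if not isinstance(title, str) or not isinstance(text, str):
        raise ValueError("'title' and 'text' must both be strings")
    return title, text


def create_app(classify):
    """Build the Flask app; classify(str) -> (label, confidence) is hypothetical."""
    # Imported here so parse_payload can be used and tested without Flask.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.post("/predict")
    def predict():
        try:
            title, text = parse_payload(request.get_json(silent=True))
        except ValueError as error:
            return jsonify(error=str(error)), 400
        # Title and text are combined, mirroring the preprocessing step.
        label, confidence = classify(f"{title} {text}")
        return jsonify(label=label, confidence=confidence)

    return app
```

Passing the classifier in as an argument keeps the route logic separate from model loading, so the endpoint can be exercised with a stub before the fine-tuned model is downloaded.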

Hosting:

  • Backend deployed on Render.com.
  • Model weights stored in a Hugging Face Space for easy access.

Acknowledgments

  • Hugging Face for providing pre-trained RoBERTa and the Trainer API.
  • Render.com for hosting the Flask backend.