Urdu to English Machine Translation (Fine-tuning transformer)

This project fine-tunes a pre-trained transformer model to translate Urdu text into English, using the Hugging Face transformers library.

Setup

To get started, install the necessary dependencies:

pip install datasets
pip install transformers
pip install sacrebleu
pip install evaluate
pip install accelerate -U

Dataset

The dataset consists of Urdu-English sentence pairs stored in a plain-text file (Dataset.txt in this repository). Each pair must occupy its own line, with the two sentences separated by a single tab character.
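
As a rough illustration of that file format, the sketch below parses tab-separated lines into pairs. The helper names `parse_pairs` and `read_pairs` are hypothetical, UTF-8 encoding is assumed, and the code deliberately does not assume which column holds the Urdu sentence, since the format description does not say.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse tab-separated sentence pairs, skipping blank or malformed lines."""
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip lines that do not contain exactly one tab
        first, second = parts[0].strip(), parts[1].strip()
        if first and second:
            pairs.append((first, second))
    return pairs


def read_pairs(path: str) -> List[Tuple[str, str]]:
    """Read a pair file from disk (hypothetical helper)."""
    # Assumption: the file is UTF-8 encoded, which Urdu text requires in practice.
    with open(path, encoding="utf-8") as handle:
        return parse_pairs(handle)
```

Skipping malformed lines silently is one possible design choice; counting or logging them may be preferable when checking dataset quality.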

Preprocessing

Load the dataset:
- Read the dataset from a file.
- Split the dataset into training, validation, and testing sets.
Convert to Hugging Face Dataset format:
- Convert the data to dictionaries.
- Create DatasetDict with training, validation, and testing sets.
Tokenization:
- Tokenize the sentences using the Helsinki-NLP/opus-mt-ur-en tokenizer.
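
The preprocessing steps above can be sketched as follows. This is a minimal sketch under assumptions, not the notebook's actual code: the split fractions, the seed, the column names `ur` and `en`, the (Urdu, English) tuple order, and `max_length=128` are all illustrative choices, and the helper names are hypothetical. The Hugging Face imports sit inside the function so the pure-Python helpers can be used without those libraries installed.

```python
import random
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]  # assumed order: (urdu, english)


def split_pairs(pairs: Sequence[Pair], val_frac: float = 0.1,
                test_frac: float = 0.1, seed: int = 42) -> Dict[str, List[Pair]]:
    """Shuffle the pairs and split them into train/validation/test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    n_train = len(shuffled) - n_val - n_test
    return {
        "train": shuffled[:n_train],
        "validation": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


def to_columns(pairs: Sequence[Pair]) -> Dict[str, List[str]]:
    """Convert a list of pairs into a dictionary of columns."""
    return {"ur": [p[0] for p in pairs], "en": [p[1] for p in pairs]}


def build_tokenized_datasets(splits: Dict[str, List[Pair]],
                             checkpoint: str = "Helsinki-NLP/opus-mt-ur-en",
                             max_length: int = 128):
    """Build a DatasetDict from the splits and tokenize both languages."""
    from datasets import Dataset, DatasetDict
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    raw = DatasetDict({name: Dataset.from_dict(to_columns(rows))
                       for name, rows in splits.items()})

    def tokenize(batch):
        # text_target tokenizes the English side and stores it under "labels".
        return tokenizer(batch["ur"], text_target=batch["en"],
                         max_length=max_length, truncation=True)

    tokenized = raw.map(tokenize, batched=True, remove_columns=["ur", "en"])
    return tokenized, tokenizer
```

Note that the Marian tokenizer behind this checkpoint generally also requires the sentencepiece package to be installed.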

Model Training

Define Model:
- Load the pre-trained model and tokenizer from Hugging Face.
Freeze Specific Layers:
- Freeze the initial layers of the encoder and decoder to focus training on the remaining layers.
Training Arguments:
- Set training arguments such as learning rate, batch size, evaluation steps, etc.
Train:
- Use the Seq2SeqTrainer to train the model.
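
One way the training steps above might look is sketched below. Several details are assumptions rather than facts from this project: that the model checkpoint is the same `Helsinki-NLP/opus-mt-ur-en` used for the tokenizer, that three encoder and three decoder layers are frozen, and every hyperparameter value shown. The helper names are hypothetical. The layer access path `model.model.encoder.layers` matches Marian-style models in transformers; other architectures may differ.

```python
def freeze_initial_layers(model, n_encoder: int, n_decoder: int) -> int:
    """Disable gradients for the first encoder and decoder layers.

    Returns the number of parameter tensors that were frozen.
    """
    frozen = 0
    encoder_layers = list(model.model.encoder.layers)[:n_encoder]
    decoder_layers = list(model.model.decoder.layers)[:n_decoder]
    for layer in encoder_layers + decoder_layers:
        for param in layer.parameters():
            param.requires_grad = False  # excluded from optimizer updates
            frozen += 1
    return frozen


def training_kwargs(output_dir: str = "opus-mt-ur-en-finetuned") -> dict:
    """Illustrative hyperparameters; all values are assumptions."""
    return {
        "output_dir": output_dir,
        "learning_rate": 2e-5,
        "per_device_train_batch_size": 16,
        "per_device_eval_batch_size": 16,
        "num_train_epochs": 3,
        "weight_decay": 0.01,
        "predict_with_generate": True,  # generate text during evaluation
    }


def build_trainer(tokenized, tokenizer,
                  checkpoint: str = "Helsinki-NLP/opus-mt-ur-en",
                  compute_metrics=None):
    """Load the model, freeze early layers, and assemble a Seq2SeqTrainer."""
    from transformers import (AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                              Seq2SeqTrainer, Seq2SeqTrainingArguments)

    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
    freeze_initial_layers(model, n_encoder=3, n_decoder=3)  # assumed counts
    args = Seq2SeqTrainingArguments(**training_kwargs())
    return Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized["train"],
        eval_dataset=tokenized["validation"],
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
        compute_metrics=compute_metrics,
    )
```

Calling `build_trainer(tokenized, tokenizer).train()` would then start fine-tuning. The evaluation schedule is left at its default here because the name of that argument has changed between transformers releases; check the documentation for the installed version before setting it.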

Evaluation

Metric:
- Use BLEU score for evaluation with the evaluate library.
Evaluate:
- Evaluate the model on the test dataset.
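
A BLEU metric function of the kind described above might be wired up roughly as follows. This is a sketch that assumes the sacreBLEU implementation from the evaluate library (which the installed sacrebleu package suggests, though this README does not state it); the helper names `postprocess_text` and `make_compute_metrics` are hypothetical.

```python
def postprocess_text(predictions, references):
    """Strip whitespace and wrap each reference in a list.

    sacreBLEU accepts several references per prediction, so each
    single reference becomes a one-element list.
    """
    preds = [p.strip() for p in predictions]
    refs = [[r.strip()] for r in references]
    return preds, refs


def make_compute_metrics(tokenizer):
    """Return a compute_metrics callable suitable for Seq2SeqTrainer."""
    import evaluate
    import numpy as np

    metric = evaluate.load("sacrebleu")

    def compute_metrics(eval_preds):
        pred_ids, label_ids = eval_preds
        if isinstance(pred_ids, tuple):
            pred_ids = pred_ids[0]
        # -100 marks ignored positions; replace it with the pad id before decoding.
        pred_ids = np.where(pred_ids != -100, pred_ids, tokenizer.pad_token_id)
        label_ids = np.where(label_ids != -100, label_ids, tokenizer.pad_token_id)
        decoded_preds = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
        decoded_labels = tokenizer.batch_decode(label_ids, skip_special_tokens=True)
        preds, refs = postprocess_text(decoded_preds, decoded_labels)
        result = metric.compute(predictions=preds, references=refs)
        return {"bleu": result["score"]}

    return compute_metrics
```

With a trainer built using this function and `predict_with_generate=True`, calling `trainer.evaluate(tokenized["test"])` should return a dictionary of metrics for the test split, where `trainer` and `tokenized` stand for whatever objects the training step produced.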

Usage

Load the Model:
- Load the fine-tuned model and tokenizer.
Translate Text:
- Encode the Urdu text and generate the English translation.
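
The two usage steps above can be sketched as shown here. The sketch assumes that both the fine-tuned model and its tokenizer were saved to the same directory with `save_pretrained`; the directory name and the helper names `load_finetuned` and `translate` are hypothetical, and `max_new_tokens=128` is an arbitrary limit.

```python
def load_finetuned(model_dir: str):
    """Load a fine-tuned model and tokenizer from a local directory."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    model.eval()  # disable dropout for inference
    return model, tokenizer


def translate(texts, model, tokenizer, max_new_tokens: int = 128):
    """Encode a list of Urdu sentences and decode the generated English."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

For example, `model, tokenizer = load_finetuned("opus-mt-ur-en-finetuned")` followed by `translate(["آپ کیسے ہیں؟"], model, tokenizer)` would return a list containing one English string. If the model has been moved to a GPU, the encoded batch needs to be moved to the same device before calling `generate`.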

Results

After training, evaluate the model on the test dataset to measure translation quality. The evaluation output reports the BLEU score alongside the other metrics the trainer returns, such as the evaluation loss.

Repository: umar8637/Urdu-to-Eng-translation-fine-tuning-transformers
