The scope of this project is to classify question pairs as duplicate or non-duplicate, using NLP techniques for text preprocessing and feature extraction and then applying classification algorithms to the extracted features.
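As a rough illustration of the preprocessing and feature-extraction step, the minimal sketch below lowercases each question, tokenizes it and computes a few token-overlap features for one pair. The stop-word list, the contraction handling and the feature names (`word_share`, `content_overlap_min` and so on) are illustrative assumptions, not the project's actual feature set.

```python
import re

# Hypothetical stop-word list, deliberately tiny for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "how", "what", "do", "i", "can"}


def preprocess(text):
    """Lowercase, expand two common contractions and keep alphanumeric tokens."""
    text = text.lower().replace("what's", "what is").replace("can't", "can not")
    return re.findall(r"[a-z0-9]+", text)


def pair_features(q1, q2):
    """Return a few simple token-overlap features for one question pair."""
    t1, t2 = preprocess(q1), preprocess(q2)
    w1, w2 = set(t1), set(t2)
    common = w1 & w2
    total = len(w1) + len(w2)
    # Overlap of content words (stop words removed), normalised by the smaller set.
    c1, c2 = w1 - STOP_WORDS, w2 - STOP_WORDS
    smaller = min(len(c1), len(c2))
    return {
        "len_q1": len(t1),
        "len_q2": len(t2),
        "word_common": len(common),
        "word_share": len(common) / total if total else 0.0,
        "first_word_eq": int(bool(t1) and bool(t2) and t1[0] == t2[0]),
        "last_word_eq": int(bool(t1) and bool(t2) and t1[-1] == t2[-1]),
        "content_overlap_min": len(c1 & c2) / smaller if smaller else 0.0,
    }
```

For example, `pair_features("How do I learn Python?", "What is the best way to learn Python?")` finds the shared words "learn" and "python", so `word_common` is 2 and `last_word_eq` is 1. Features like these, computed for every pair, would form the numeric input to the classifiers.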
The algorithm that gives the best performance metrics is then chosen.
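The performance metric used for the final choice is not specified here. Assuming binary log loss on held-out data (lower is better), the selection step could look like this minimal sketch; `select_best`, the model names and the probabilities are hypothetical.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def select_best(predictions, y_true):
    """Score each model's predicted probabilities and return the lowest-loss model."""
    scores = {name: log_loss(y_true, probs) for name, probs in predictions.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical held-out labels (1 = duplicate) and predicted P(duplicate) per model.
y_val = [1, 0, 1, 0]
preds = {"model_a": [0.9, 0.1, 0.8, 0.2], "model_b": [0.6, 0.4, 0.5, 0.5]}
best, scores = select_best(preds, y_val)
print(best)  # model_a
```

Log loss rewards well-calibrated probabilities rather than just correct hard labels, which is why `model_a`, being confidently right, beats the hesitant `model_b`. Any other metric (accuracy, F1) could be swapped into `select_best` the same way.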
Models used:
- Logistic Regression
- Linear SVM
- Decision Tree
- Random Forest
- XGBoost
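One possible way to train and compare the listed models is sketched below with scikit-learn, assuming the pair features have already been assembled into a numeric matrix `X` with binary labels `y`. Synthetic data from `make_classification` stands in for the real features, the hyperparameters are arbitrary placeholders, and XGBoost is only included if the `xgboost` package is installed. `LinearSVC` is wrapped in `CalibratedClassifierCV` because it has no `predict_proba`, which log loss needs.

```python
# Assumes scikit-learn is installed; xgboost is optional.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """Candidate classifiers; hyperparameters are arbitrary placeholders."""
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        # LinearSVC has no predict_proba, so calibrate it to obtain probabilities.
        "Linear SVM": CalibratedClassifierCV(LinearSVC(), cv=3),
        "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    try:
        from xgboost import XGBClassifier
        models["XGBoost"] = XGBClassifier(n_estimators=100, max_depth=4)
    except ImportError:
        pass  # skip XGBoost when the package is not available
    return models


def compare_models(X, y, seed=0):
    """Fit every model on a training split and return held-out log loss per model."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    scores = {}
    for name, model in build_models().items():
        model.fit(X_tr, y_tr)
        scores[name] = log_loss(y_te, model.predict_proba(X_te))
    return scores


if __name__ == "__main__":
    # Synthetic stand-in for the real question-pair feature matrix.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    for name, loss in sorted(compare_models(X, y).items(), key=lambda kv: kv[1]):
        print(f"{name:20s} log loss = {loss:.4f}")
```

Using one stratified split keeps the sketch short; cross-validation and hyperparameter tuning per model would give a more reliable comparison.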
Inspired by the Applied AI course.
Dataset used: Kaggle dataset