Math125A_LanguageIdentification
This project addresses language identification, a task in Natural Language Processing (NLP). It implements two machine learning models, the Perceptron and Naive Bayes, to identify the language of a given text sample. Language identification is useful in NLP applications such as content categorization, and as a preprocessing step for more complex tasks like translation and sentiment analysis.
The dataset used in this project is sourced from Kaggle: Language Identification Dataset. It comprises text samples in various languages, which are divided into training, validation, and testing sets.
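As a rough illustration of a three-way split, here is a minimal sketch. It assumes the samples are available as a list of `(text, language)` pairs. The function name `split_dataset`, the 80/10/10 ratios, and the fixed seed are hypothetical choices for this example and are not taken from the notebook. If the Kaggle download already ships with separate split files, no such step is needed.

```python
import random


def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle samples and cut them into train/validation/test lists.

    The fractions and seed are illustrative assumptions, not the
    project's actual settings.
    """
    shuffled = list(samples)               # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # whatever remains goes to test
    return train, val, test
```

Keeping the validation set separate from the test set lets hyperparameters (for example, the number of Perceptron epochs) be tuned without touching the data used for the final evaluation.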
Naive Bayes: A probabilistic classifier known for its efficiency in text classification tasks.
Perceptron: A simple yet effective linear classifier.
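To make the two models concrete, here is a minimal from-scratch sketch of both, using only the Python standard library. It is not the notebook's implementation. The character-bigram features, the add-one smoothing, the epoch count, and all names (`char_ngrams`, `NaiveBayes`, `Perceptron`) are assumptions chosen for illustration, and the four toy sentences are made up and are not part of the Kaggle dataset.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Count character n-grams (an assumed feature choice for this sketch)."""
    text = f" {text.lower()} "  # pad so word boundaries become features
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = char_ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        feats = char_ngrams(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat, k in feats.items():
                if feat in self.vocab:  # ignore n-grams never seen in training
                    score += k * math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class Perceptron:
    """Multiclass perceptron with one sparse weight vector per language."""

    def fit(self, texts, labels, epochs=5):
        self.weights = {label: defaultdict(float) for label in set(labels)}
        for _ in range(epochs):
            for text, gold in zip(texts, labels):
                feats = char_ngrams(text)
                guess = self._argmax(feats)
                if guess != gold:  # update only on mistakes
                    for feat, k in feats.items():
                        self.weights[gold][feat] += k
                        self.weights[guess][feat] -= k
        return self

    def _argmax(self, feats):
        def score(label):
            w = self.weights[label]
            return sum(w.get(f, 0.0) * k for f, k in feats.items())
        # sorted() makes tie-breaking deterministic
        return max(sorted(self.weights), key=score)

    def predict(self, text):
        return self._argmax(char_ngrams(text))


# Toy data, invented for this example only.
texts = [
    "the cat sat on the mat",
    "where is the library",
    "el gato se sentó en la alfombra",
    "dónde está la biblioteca",
]
labels = ["en", "en", "es", "es"]

nb = NaiveBayes().fit(texts, labels)
perceptron = Perceptron().fit(texts, labels)
```

Both classifiers expose the same `fit`/`predict` interface, so they can be compared on the same validation samples. Naive Bayes needs a single counting pass over the training data, while the Perceptron makes several passes and adjusts its weights only when it misclassifies a sample.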
Language_Identification.ipynb: The Jupyter notebook containing the entire project's code and documentation.
dataset/: Directory containing the dataset used in the project.
requirements.txt: A text file listing the dependencies required to run the notebook.