math125a

Math125A_LanguageIdentification

Language Identification with Perceptron and Naive Bayes Models

Project Overview

This project addresses Language Identification, a Natural Language Processing (NLP) task: determining which language a given text sample is written in. It implements two machine learning models, a Perceptron and a Naive Bayes classifier, to perform this classification. Language Identification is useful in many NLP applications, such as content categorization, and serves as a preprocessing step for more complex tasks like machine translation and sentiment analysis.

Data Source

The dataset used in this project is the Language Identification Dataset from Kaggle. It contains text samples in several languages, divided into training, validation, and test sets.
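If the samples ever need to be split locally (for example, after merging or filtering the Kaggle files), a reproducible split can be done with a seeded shuffle. This is a minimal sketch, not the notebook's code: the function name split_dataset, the (text, label) tuple format, and the 80/10/10 proportions are all assumptions for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical sample format: (text, language label)
Sample = Tuple[str, str]


def split_dataset(samples: List[Sample], train_frac: float = 0.8,
                  val_frac: float = 0.1, seed: int = 0):
    """Shuffle with a fixed seed, then cut into train/validation/test lists.

    The 80/10/10 default proportions are an assumption, not taken from the project.
    """
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must be between 0 and 1")
    shuffled = samples[:]  # copy so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to test
    return train, val, test


# Toy data standing in for the real dataset
data = [(f"sample {i}", "en" if i % 2 else "fr") for i in range(20)]
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))  # 16 2 2
```

Using a fixed seed means every run produces the same split, so validation and test scores stay comparable across experiments.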

Models Used

Naive Bayes: A probabilistic classifier known for its efficiency in text classification tasks.
Perceptron: A simple yet effective linear classifier.
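The README does not specify which features or model variants the notebook uses. The sketch below assumes character bigram counts as features, a multinomial Naive Bayes with add-one (Laplace) smoothing, and a plain (non-averaged) multiclass perceptron. The names char_ngrams, NaiveBayes, and Perceptron are hypothetical and are not taken from the notebook.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count character n-grams (assumed feature set), padding with spaces to capture word edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha smoothing over n-gram counts."""

    def fit(self, texts: List[str], labels: List[str], alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = char_ngrams(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feature_counts.items()}
        return self

    def predict(self, text: str) -> str:
        feats = char_ngrams(text)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.class_counts:
            # log P(c) + sum over n-grams of count * log P(n-gram | c)
            score = math.log(self.class_counts[c] / n_docs)
            denom = self.totals[c] + self.alpha * v
            for g, k in feats.items():
                score += k * math.log((self.feature_counts[c][g] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


class Perceptron:
    """Multiclass perceptron: one sparse weight vector per language."""

    def fit(self, texts: List[str], labels: List[str], epochs: int = 5):
        self.classes = sorted(set(labels))
        self.weights = {c: Counter() for c in self.classes}
        data = [(char_ngrams(t), y) for t, y in zip(texts, labels)]
        for _ in range(epochs):
            for feats, y in data:
                pred = self._argmax(feats)
                if pred != y:
                    # Move the true class toward the example, the wrong guess away from it.
                    for g, k in feats.items():
                        self.weights[y][g] += k
                        self.weights[pred][g] -= k
        return self

    def _argmax(self, feats: Counter) -> str:
        # Ties go to the first class in sorted order.
        return max(self.classes,
                   key=lambda c: sum(self.weights[c][g] * k for g, k in feats.items()))

    def predict(self, text: str) -> str:
        return self._argmax(char_ngrams(text))


# Tiny illustrative training set (not from the Kaggle dataset)
train_texts = ["the cat sat on the mat", "where is the house", "this is a good day",
               "le chat est sur le tapis", "où est la maison", "c'est une belle journée"]
train_labels = ["en", "en", "en", "fr", "fr", "fr"]

nb = NaiveBayes().fit(train_texts, train_labels)
pc = Perceptron().fit(train_texts, train_labels)
print(nb.predict("the house is good"), pc.predict("la maison est belle"))
```

Naive Bayes estimates per-language n-gram frequencies in a single counting pass, while the perceptron learns weights by correcting its mistakes over several epochs; comparing the two on the validation set is a natural way to choose between them.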

Repository Contents

Language_Identification.ipynb: The Jupyter notebook containing the project's complete code and documentation.
dataset/: Directory containing the dataset used in the project.
requirements.txt: A text file listing the dependencies required to run the notebook.

dataset/
Math125_Project_AndyHe.ipynb
README.md

AndyHe021112/math125a
