YouTube-Scrapper-And-Category-Classifier

Scraping Data-
The scraping is done with the YouTube Data API v3. The API provides a search list function that takes a search query as a parameter, along with other parameters such as region and type. The API returns its results in JSON format.
I wrote a function that calls this API and returns a dictionary with column names as keys and content data as values. This let me collect as many accurate and relevant results as possible.
The scraping script generates a CSV file from the results.
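
The flow described above (query the search list endpoint, reshape the JSON into a dictionary of columns, write a CSV) can be sketched as follows. This is a minimal illustration using only the standard library, not the repository's actual script: the function names, the column names (`video_id`, `title`, `description`, `category`) and the default region are assumptions made for the example. The query parameters (`part`, `q`, `type`, `regionCode`, `maxResults`, `key`) are the documented ones for `search.list`.

```python
import csv
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def search_videos(api_key, query, region="US", max_results=50):
    """Call search.list once and return the decoded JSON response."""
    params = urllib.parse.urlencode({
        "part": "snippet",
        "q": query,
        "type": "video",
        "regionCode": region,
        "maxResults": max_results,  # the API allows at most 50 per page
        "key": api_key,
    })
    with urllib.request.urlopen(f"{SEARCH_URL}?{params}") as resp:
        return json.load(resp)


def to_columns(response, category):
    """Turn a search.list response into {column name: list of values}."""
    # Column names here are hypothetical, chosen for the example.
    columns = {"video_id": [], "title": [], "description": [], "category": []}
    for item in response.get("items", []):
        snippet = item.get("snippet", {})
        columns["video_id"].append(item.get("id", {}).get("videoId", ""))
        columns["title"].append(snippet.get("title", ""))
        columns["description"].append(snippet.get("description", ""))
        columns["category"].append(category)
    return columns


def write_csv(columns, path):
    """Write the column dictionary to a CSV file, one row per video."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns.keys())
        writer.writerows(zip(*columns.values()))
```

With a valid API key, `write_csv(to_columns(search_videos(key, "street food"), "Food"), "Food.csv")` would produce one CSV for one category. Collecting more than 50 results per query would require following the `nextPageToken` field of each response.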
Text Classification-
For text classification, I used one model from each category mentioned in the assignment.

From the first category, I used an SVM model because it was more accurate and scalable. SVMs can efficiently perform non-linear classification using what is called the kernel trick, which implicitly maps their inputs into high-dimensional feature spaces. Here the SVM is used as a supervised classifier: it is trained on labelled data and then assigns a category to unseen text.
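
To make the kernel trick concrete, here is a small worked example. It is a generic illustration, not code from this project: it assumes 2-D inputs and a degree-2 polynomial kernel, and shows that the kernel value equals a dot product in a 3-D feature space that is never built explicitly during training.

```python
import math


def poly_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) into 3-D space."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


x, y = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(x, y)                    # computed in the 2-D input space
explicit = dot(feature_map(x), feature_map(y))  # computed in the 3-D feature space
print(implicit, explicit)  # the two values agree up to floating-point rounding
```

The SVM only ever needs the kernel values, so it gets the benefit of the higher-dimensional space at the cost of a 2-D dot product.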
SVM Accuracy Score: 32.91015625
Precision: 0.329102
Recall: 0.329102
F1: 0.329102
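
The SVM accuracy, precision, recall and F1 above are all the same number (32.91% and 0.329102). One way this can happen, assuming the scores were micro-averaged over single-label predictions (an assumption; the averaging mode is not stated here), is that every wrong prediction counts once as a false positive for the predicted class and once as a false negative for the true class. The sketch below uses made-up labels purely to show the arithmetic.

```python
def micro_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) with micro-averaging."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1          # correct prediction of this class
            elif p == label:
                fp += 1          # predicted this class, truth was another
            elif t == label:
                fn += 1          # truth was this class, predicted another
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, precision, recall, f1


# Hypothetical labels, not taken from the project's data.
y_true = ["Food", "History", "Travel", "Food", "History", "Travel"]
y_pred = ["Food", "Food", "Travel", "History", "Food", "Food"]
print(micro_scores(y_true, y_pred))  # all four values are (about) 1/3
```

Macro-averaged or per-class scores would generally differ from the accuracy and would show which categories the model confuses.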
From the second category, I used a shallow NN model because it is based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised. The NN finds the mathematical transformation that turns the input into the output, whether the relationship is linear or non-linear. The network moves through its layers calculating the probability of each output. An NN gives better results on datasets that are not easily separable and are too complicated for naïve algorithms to classify.
Loss: 0.166
Accuracy: 0.941

F1 Score: 0.789
Precision: 0.950
Recall: 0.680
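
The idea of a network moving through its layers and calculating the probability of each output can be sketched with a single hidden layer in plain Python. This is a generic illustration, not the model trained for the results above: the layer sizes, the weights, the ReLU activation and the softmax output are all assumptions chosen for the example.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + b
        for j, b in enumerate(biases)
    ]


def relu(values):
    """Non-linear activation: negative values become zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer followed by a softmax output layer."""
    hidden = relu(dense(features, w_hidden, b_hidden))
    return softmax(dense(hidden, w_out, b_out))


# Made-up weights: 3 input features -> 2 hidden units -> 3 classes.
features = [1.0, 0.0, 2.0]
w_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b_hidden = [0.0, 0.1]
w_out = [[1.0, -1.0, 0.5], [0.2, 0.7, -0.4]]
b_out = [0.0, 0.0, 0.0]

probs = forward(features, w_hidden, b_hidden, w_out, b_out)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

Training adjusts the weights so that the highest probability falls on the correct category; the sketch only shows the forward pass.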
From the third category, I used a shallow RNN model. RNNs have recurrent connections, so information from earlier inputs feeds back into later steps, and they are used for applications such as language modelling; long short-term memory (LSTM) is particularly effective for this use. RNNs can use their internal state (memory) to process sequences of inputs, which makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. RNNs are well suited to understanding sequences of text because they do not lose the order of the words.
Loss: 0.464
Accuracy: 0.833
F1 Score: 0.000
Precision: 0.000
Recall: 0.000
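
The point that an RNN keeps an internal state and does not lose the order of its inputs can be shown with a one-unit recurrence. This is a minimal sketch with made-up weights, not the shallow RNN used for the results above: each step combines the current input with the previous hidden state, so the same values in a different order lead to a different final state.

```python
import math


def rnn_final_state(sequence, w_in=0.5, w_rec=0.9, bias=0.0):
    """Run a one-unit Elman-style RNN and return its final hidden state."""
    h = 0.0  # internal state (memory), starts empty
    for x in sequence:
        # The new state depends on the current input AND the previous state.
        h = math.tanh(w_in * x + w_rec * h + bias)
    return h


forward_order = rnn_final_state([1.0, 2.0, 3.0])
reverse_order = rnn_final_state([3.0, 2.0, 1.0])

# An order-blind summary (a plain sum, as in a bag of words) cannot tell them apart.
bag_forward = sum([1.0, 2.0, 3.0])
bag_reverse = sum([3.0, 2.0, 1.0])

print(forward_order, reverse_order)  # two different final states
print(bag_forward == bag_reverse)    # True
```

In a text classifier the inputs would be word embeddings rather than single numbers, and the final state would feed an output layer that scores each category.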



rahulguptagzb09/YouTube-Scrapper-And-Category-Classifier
