rahulguptagzb09/YouTube-Scrapper-And-Category-Classifier

YouTube-Scrapper-And-Category-Classifier

Scraping Data-
Scraping is done with the YouTube Data API v3. The API provides a search list function that takes a search query as a parameter, along with other parameters such as region and type, and returns its results in JSON format.
I wrote a function that calls this API and returns a dictionary with column names as keys and the corresponding content as values. This let me collect as many accurate and relevant results as possible.
The scraping script generates a CSV file from the results.
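The flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function names (build_search_url, fetch_page, items_to_columns, columns_to_csv), the column names, and the default region are all hypothetical. The endpoint and the query parameters (part, q, type, regionCode, maxResults, key, pageToken) are those of the search list method of the YouTube Data API v3.

```python
import csv
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key, region_code="IN", max_results=50, page_token=None):
    """Build a search.list request URL (region default is an assumption)."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "regionCode": region_code,
        "maxResults": max_results,  # the API accepts values from 0 to 50
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token  # continue from a previous page
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_page(url):
    """Perform the HTTP GET and decode the JSON body (needs a valid API key)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def items_to_columns(response, category):
    """Flatten one JSON response into {column name: list of values}."""
    columns = {"video_id": [], "title": [], "description": [], "category": []}
    for item in response.get("items", []):
        snippet = item.get("snippet", {})
        columns["video_id"].append(item.get("id", {}).get("videoId", ""))
        columns["title"].append(snippet.get("title", ""))
        columns["description"].append(snippet.get("description", ""))
        columns["category"].append(category)
    return columns


def columns_to_csv(columns, out):
    """Write the column dictionary to an open text file as CSV rows."""
    names = list(columns)
    writer = csv.writer(out)
    writer.writerow(names)
    writer.writerows(zip(*(columns[name] for name in names)))
```

A run for one query would call fetch_page(build_search_url(...)), pass the result to items_to_columns, and write it out with columns_to_csv using a file opened with newline="". Following the nextPageToken field of each response is how more than 50 results per query would be collected.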
Text Classification-
For text classification, I used one model from each category mentioned in the assignment.

  1. From the first category, I used an SVM model because it was more accurate and scalable. SVMs can efficiently perform non-linear classification using the kernel trick, which implicitly maps their inputs into high-dimensional feature spaces. An SVM is a supervised classifier: it learns a decision boundary from labelled examples. (Its unsupervised extension, support vector clustering, is a separate method for unlabelled data.) SVMs are widely used in industrial applications.
    SVM Accuracy Score: 32.91015625%
    Precision: 0.329102
    Recall: 0.329102
    F1: 0.329102
  2. From the second category, I used a shallow NN model because it learns data representations rather than relying on task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised. The NN finds the mathematical transformation that turns the input into the output, whether the relationship is linear or non-linear. The input passes through the layers of the network, which calculates the probability of each output. NNs give better results on datasets that are not easily separable and are too complicated for naïve algorithms to classify.
    Loss: 0.166
    Accuracy: 0.941
    F1 Score: 0.789
    Precision: 0.950
    Recall: 0.680
  3. From the third category, I used a shallow RNN model because RNNs, whose recurrent connections feed information from earlier steps back into later ones, are used for applications such as language modelling. Long short-term memory (LSTM) is particularly effective for this use. RNNs can use their internal state (memory) to process sequences of inputs, which makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. RNNs are better suited to understanding text sequences than order-agnostic models because they do not lose the order of the text.
    Loss: 0.464
    Accuracy: 0.833
    F1 Score: 0.000
    Precision: 0.000
    Recall: 0.000
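
The source does not state how the metrics above were computed, so the following is only a sketch of two common situations that produce similar patterns, not a description of the repository's evaluation code. First, with one label per sample, micro-averaged precision, recall and F1 always equal plain accuracy, which matches the shape of the SVM figures. Second, when accuracy is computed cell by cell over one-hot labels and a model predicts no positives at all, accuracy stays high while precision, recall and F1 drop to zero. The six class names and the toy labels are hypothetical, chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(tp, fp, fn):
    """Standard definitions, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def micro_scores(y_true, y_pred):
    """Micro-averaging: pool TP/FP/FN over all classes, then score once."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            tp += t == label and p == label
            fp += t != label and p == label
            fn += t == label and p != label
    return precision_recall_f1(tp, fp, fn)


def elementwise_scores(rows_true, rows_pred):
    """Treat every cell of a one-hot label matrix as its own binary decision."""
    cells = [(t, p) for rt, rp in zip(rows_true, rows_pred) for t, p in zip(rt, rp)]
    tp = sum(t == 1 and p == 1 for t, p in cells)
    fp = sum(t == 0 and p == 1 for t, p in cells)
    fn = sum(t == 1 and p == 0 for t, p in cells)
    acc = sum(t == p for t, p in cells) / len(cells)
    return (acc,) + precision_recall_f1(tp, fp, fn)


# Hypothetical class names and toy labels, for illustration only.
LABELS = ["travel", "science", "food", "manufacturing", "history", "music"]
y_true = ["music", "travel", "food", "music", "history", "food"]
y_pred = ["music", "food", "food", "travel", "history", "music"]

# One-hot rows for the true labels; a degenerate model that outputs all zeros.
rows_true = [[1 if name == t else 0 for name in LABELS] for t in y_true]
rows_pred = [[0] * len(LABELS) for _ in y_true]
```

On this toy data, accuracy(y_true, y_pred) and all three values from micro_scores(y_true, y_pred) come out as 0.5, while elementwise_scores(rows_true, rows_pred) gives an accuracy of 5/6 with precision, recall and F1 all at zero. Reporting macro-averaged scores or a per-class breakdown alongside accuracy makes this kind of degenerate model easier to spot.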