Explored the Greek Parliament Proceedings and tried to classify each speech by its corresponding parliamentary political party.
This is one of my favourite machine learning projects.
The analysis begins with classic data exploration and cleansing. After a careful examination of the data, using matplotlib charts to visualize specific aspects and patterns, I begin the preprocessing stage.
Preprocessing played an important role in the classification of the data and was done with spaCy. Stopwords and punctuation were removed from the speeches. Lemmatization was then applied to the cleaned speeches to simplify the classification process and give more precise results.
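The preprocessing steps described above could look roughly like the sketch below. It is not the project's actual code: the function names `lemmatize_speech` and `preprocess_speeches` are made up for illustration, and the use of spaCy's `el_core_news_sm` Greek model is an assumption (it has to be downloaded separately). Dropping whitespace tokens is also an extra step not mentioned in the text.

```python
from typing import Iterable, List


def lemmatize_speech(doc: Iterable) -> List[str]:
    """Return the lemmas of a spaCy Doc, skipping stopwords and punctuation.

    Whitespace-only tokens are skipped as well (an assumption of this sketch).
    """
    return [
        tok.lemma_
        for tok in doc
        if not (tok.is_stop or tok.is_punct or tok.is_space)
    ]


def preprocess_speeches(
    speeches: Iterable[str], model: str = "el_core_news_sm"
) -> List[str]:
    """Run each raw speech through spaCy and return one cleaned string per speech."""
    # Imported here so that lemmatize_speech stays usable without spaCy installed.
    import spacy

    # Assumption: the model was installed beforehand with
    #   python -m spacy download el_core_news_sm
    nlp = spacy.load(model)
    # nlp.pipe processes the speeches as a stream, which is faster than calling nlp() one by one.
    return [" ".join(lemmatize_speech(doc)) for doc in nlp.pipe(speeches)]
```

Calling `preprocess_speeches(list_of_speeches)` would return one space-joined string of lemmas per speech, ready to be passed to a vectorizer.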
To gauge the efficacy of the algorithm, also report the results of a baseline classifier, for instance scikit-learn's DummyClassifier.
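As a rough illustration of such a baseline, the sketch below scores a `DummyClassifier` that always predicts the most frequent party in the training split. The helper name `baseline_accuracy`, the 75/25 stratified split, and the `most_frequent` strategy are assumptions made for this example, not details taken from the project.

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def baseline_accuracy(features, parties, test_size=0.25, seed=0):
    """Accuracy of always predicting the most frequent party in the training split."""
    X_train, X_test, y_train, y_test = train_test_split(
        features,
        parties,
        test_size=test_size,
        random_state=seed,
        stratify=parties,  # keep party proportions equal in both splits
    )
    baseline = DummyClassifier(strategy="most_frequent")
    baseline.fit(X_train, y_train)  # the features are ignored by this strategy
    return accuracy_score(y_test, baseline.predict(X_test))
```

Any real classifier trained on the speeches should beat this number by a clear margin. If it does not, it has learned little beyond the class distribution.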