AppClassifier is a multi-label NLP project. It uses more than 360 labels, and it is deployed on HuggingFace and rendered as a website.
HuggingFace API | Rendered website |
---|---|
I collected the data from https://sourceforge.net/.
In total, 33634 records were collected.
Because this is a multi-label project, each entity can have more than one label. I kept only the most common labels, because the least common ones can distract the model from detecting the accurate labels.
More than 360 labels were kept, and the string categories were then converted into numerical form.
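The label selection and encoding described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the sample records, the `min_count` threshold, and the helper names `select_common_labels` and `encode` are all hypothetical.

```python
from collections import Counter

# Hypothetical sample: each record is (description, list of category strings).
records = [
    ("Open source video player", ["Video", "Players"]),
    ("Command line video converter", ["Video", "Conversion"]),
    ("Audio and video player", ["Audio", "Video", "Players"]),
    ("Rare niche tool", ["Obscure"]),
]


def select_common_labels(records, min_count):
    """Keep the labels that appear in at least `min_count` records."""
    counts = Counter(label for _, labels in records for label in labels)
    return sorted(label for label, n in counts.items() if n >= min_count)


def encode(labels, vocab):
    """Turn a list of label strings into a multi-hot 0/1 vector over `vocab`."""
    present = set(labels)
    return [1 if label in present else 0 for label in vocab]


# Assumed threshold for this toy data; the real cutoff is not stated.
vocab = select_common_labels(records, min_count=2)
encoded = [encode(labels, vocab) for _, labels in records]
```

A record whose labels are all filtered out ends up with an all-zero vector; whether to keep or drop such records is a separate decision.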
For the rest of the work I used the PyTorch framework, together with the Blurr API for training the models. I chose two models from the HuggingFace model library: `distilroberta-base` and `bertabaporu-large`.
I used two models for comparison. Both performed the same: each reached 99% accuracy.
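With several hundred labels, an accuracy figure depends on how it is computed. The sketch below assumes accuracy is measured per label after thresholding the predicted probabilities, which is how fastai's `accuracy_multi` works. The source does not say which metric was used, so the function name `per_label_accuracy`, the 0.5 threshold, and the toy numbers are assumptions.

```python
def per_label_accuracy(probs, targets, threshold=0.5):
    """Fraction of individual (sample, label) decisions that are correct."""
    correct = 0
    total = 0
    for prob_row, target_row in zip(probs, targets):
        for prob, target in zip(prob_row, target_row):
            predicted = prob > threshold
            correct += int(predicted == bool(target))
            total += 1
    return correct / total


# Toy example: 2 samples, 4 labels, one true label per sample.
targets = [[1, 0, 0, 0], [0, 1, 0, 0]]
probs = [[0.9, 0.1, 0.2, 0.1], [0.2, 0.3, 0.1, 0.1]]
score = per_label_accuracy(probs, targets)
```

In this toy example the second sample's only true label is missed, yet the score is still 0.875, because every correctly predicted absent label counts as a hit.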
- `distilroberta-base`: Because the whole pipeline runs in PyTorch, I had to use dataloaders to transform the dataset for the model. I chose a batch size of 32. This model was much faster than the other one, so I used it for this project.
- `bertabaporu-large`: For this model I had to use a batch size of 2. Otherwise CUDA terminated the training process for exceeding the CUDA memory limit. This was the slowest model.
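
To see how the batch size affects training time, the number of batches per epoch can be estimated. The calculation below assumes, purely for illustration, that all 33634 records are used for training; the actual train/validation split is not stated, and `steps_per_epoch` is a hypothetical helper.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


NUM_SAMPLES = 33634  # total records collected; assumed to all be training data

steps_batch_32 = steps_per_epoch(NUM_SAMPLES, 32)
steps_batch_2 = steps_per_epoch(NUM_SAMPLES, 2)
```

Under that assumption, a batch size of 32 gives 1052 batches per epoch and a batch size of 2 gives 16817, in addition to the larger model being slower on each batch.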
All of the models can be found here.
I used HuggingFace to deploy this project. It is free and very easy to use.
You can try the deployed project here: https://huggingface.co/spaces/Rimi98/AppClassifier
I used Flask to render this project as a public website, and I built a very basic GUI for it.
I also used Render to host the website.
Click this link
More than 50000 data points are available on the source website, of which I have collected 33634. My future work is to collect more data and keep developing this project. Anyone can join me. Feel free to open a pull request.