AklimaRimi/AppClassifier

AppClassifier

Motive

AppClassifier is a multi-label NLP project. It uses more than 360 labels and has been deployed and rendered as a website.

HuggingFace API Rendered website

Data Collection

I collected the data from https://sourceforge.net/.
In total, 33,634 data points were collected.

Data Preprocessing

Because this is a multi-label project, each entity can have more than one label. I kept only the most common labels, because the least common labels can distract a model from predicting accurate labels.

More than 360 labels were chosen, and the string categories were then converted into numerical form.
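The two preprocessing steps above (keeping the most common labels, then converting string categories to numbers) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `build_vocab` and `multi_hot`, the sample categories, and the `top_k` value are all hypothetical, and the encoding shown is a plain multi-hot vector, which is one common way to represent multi-label targets.

```python
from collections import Counter


def build_vocab(samples, top_k):
    """Return the top_k most frequent labels (hypothetical helper).

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(label for labels in samples for label in labels)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:top_k]]


def multi_hot(labels, vocab):
    """Encode a list of string labels as a 0/1 vector over vocab.

    Labels that were not kept in vocab are silently dropped.
    """
    index = {label: i for i, label in enumerate(vocab)}
    vec = [0] * len(vocab)
    for label in labels:
        if label in index:
            vec[index[label]] = 1
    return vec


# Made-up sample data; the real dataset has 33,634 rows and 360+ labels.
samples = [
    ["Games", "Simulation"],
    ["Games", "Education"],
    ["Education", "Simulation", "Chat"],
]

vocab = build_vocab(samples, top_k=3)  # "Chat" is the rarest label and is dropped
encoded = [multi_hot(labels, vocab) for labels in samples]

for labels, vec in zip(samples, encoded):
    print(labels, "->", vec)
```

In the real project `top_k` would be on the order of 360, and each encoded vector would be used as the training target for one row.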

Training

For the rest of the work I used the PyTorch framework, together with the Blurr API for training the models. I chose two models, both taken from the HuggingFace model library:

  1. distilroberta-base and
  2. bertabaporu-large

Training Results

I trained the two models for comparison. Both performed equally well, each reaching 99% accuracy.

  1. distilroberta-base: Because all of the processing is done in PyTorch, I had to use dataloaders to transform the dataset for the model. I chose a batch size of 32. This model is much faster than the other one, so I used it for this project.
  2. bertabaporu-large: For this model I had to choose a batch size of 2. Otherwise CUDA terminated the training process for exceeding the CUDA memory limit. This is the slower of the two models.
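To give a rough sense of what the two batch sizes mean in practice, the arithmetic below counts optimizer steps per epoch for each setting. It is only a sketch under a stated assumption: that all 33,634 collected rows are used for training, which the README does not confirm (a validation split would lower both numbers). The function name `steps_per_epoch` is hypothetical. Batch size is also only one factor in the speed difference, since the larger model does more work per sample.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


# Assumption: the full collected dataset is used for training.
NUM_SAMPLES = 33634

for model_name, batch_size in [("distilroberta-base", 32), ("bertabaporu-large", 2)]:
    steps = steps_per_epoch(NUM_SAMPLES, batch_size)
    print(f"{model_name}: batch size {batch_size} -> {steps} steps per epoch")
```

Under that assumption, a batch size of 2 needs roughly 16 times as many steps per epoch as a batch size of 32.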

Models

All of the models can be found here.

Deployment

I deployed this project on HuggingFace Spaces, which is free and very easy to use. You can try the deployed project at https://huggingface.co/spaces/Rimi98/AppClassifier

Integration/ Render

I used Flask to serve this project as a public website, with a very basic GUI. I also used Render for the integration.

Click this link

Future Work

More than 50,000 data points are available on SourceForge, of which I have collected 33,634. My future work will be to collect more data and develop this project further. Anyone is welcome to join me; feel free to open a pull request.