SkimLit

An NLP model that classifies each sentence of an abstract by the role it plays (e.g. objective, methods, results), so that researchers can skim through the literature and dive deeper when necessary.

Try the demo: WEB APP

Dataset Used

PubMed 200k RCT dataset

Some miscellaneous information:

  • PubMed 20k is a subset of PubMed 200k. I.e., any abstract present in PubMed 20k is also present in PubMed 200k.

  • PubMed_200k_RCT is the same as PubMed_200k_RCT_numbers_replaced_with_at_sign, except that in the latter all numbers have been replaced by @. The same holds for PubMed_20k_RCT vs. PubMed_20k_RCT_numbers_replaced_with_at_sign.

  • Count Plot
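To make the data layout concrete, here is a minimal parsing sketch. It assumes the plain-text layout of the PubMed RCT split files (e.g. train.txt): a `###<PMID>` line opens each abstract, each sentence follows on its own `LABEL<tab>sentence` line, and a blank line separates abstracts. The function name `parse_pubmed_rct` and the record fields are hypothetical, not taken from this project's notebooks. The `line_number` and `total_lines` fields are the kind of per-sentence position information a position-aware model can consume. The same parser works for the `numbers_replaced_with_at_sign` variants, since only the sentence text differs.

```python
from typing import Dict, List, Union

Record = Dict[str, Union[str, int]]


def parse_pubmed_rct(text: str) -> List[Record]:
    """Hypothetical parser: one record per sentence, keeping its position
    inside the abstract. Assumes '###<PMID>' headers, 'LABEL\\tsentence'
    lines and blank lines between abstracts."""
    samples: List[Record] = []
    current: List[str] = []  # sentence lines of the abstract being read

    def flush() -> None:
        # Emit every buffered sentence of the current abstract, then reset.
        for i, entry in enumerate(current):
            label, sentence = entry.split("\t", 1)
            samples.append({
                "target": label,
                "text": sentence,
                "line_number": i,               # 0-based position in abstract
                "total_lines": len(current),    # number of sentences in abstract
            })
        current.clear()

    for line in text.splitlines():
        if line.startswith("###") or not line.strip():
            flush()  # a new abstract header or a blank line ends the abstract
        else:
            current.append(line)
    flush()  # handle a file that does not end with a blank line
    return samples


if __name__ == "__main__":
    raw = "###1\nOBJECTIVE\tTo test X.\nMETHODS\tWe did Y.\n\n###2\nRESULTS\tZ improved.\n"
    for record in parse_pubmed_rct(raw):
        print(record)
```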

Models Tried

All the notebooks are available here.

  • Naive Bayes model -> 72% Accuracy
  • Conv1D model -> 78% Accuracy
  • Model using pretrained token embeddings (Universal Sentence Encoder) -> 75% Accuracy
  • Conv1D model using character-level embeddings -> 73% Accuracy
  • Model with both token- and character-level embeddings -> 76% Accuracy
  • Model with token-, character- and position-level embeddings (https://arxiv.org/pdf/1612.05251.pdf) -> 81% Accuracy
  • Model described in the paper above, with BERT embeddings -> 88% Accuracy
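The README does not say which features the Naive Bayes baseline used. As an illustration of the technique only, here is a dependency-free sketch of a multinomial Naive Bayes sentence classifier over raw word counts with Laplace (add-alpha) smoothing. This is an assumption, not the project's code: scikit-learn, which the project lists, provides `MultinomialNB`, often paired with `TfidfVectorizer`. The class name, the `tokenize` helper and the toy sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs; '@' is kept so the
    # numbers-replaced-with-@ variant of the data still yields tokens.
    return re.findall(r"[a-z0-9@]+", text.lower())


class MultinomialNaiveBayes:
    """Sketch of multinomial Naive Bayes with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, texts: List[str], labels: List[str]) -> "MultinomialNaiveBayes":
        self.class_counts: Counter = Counter(labels)       # sentences per class
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))  # word counts per class
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def predict(self, text: str) -> Optional[str]:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior P(label)
            denom = self.total_words[label] + self.alpha * vocab_size
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = MultinomialNaiveBayes().fit(
        ["patients were randomly assigned", "pain scores were significantly lower"],
        ["METHODS", "RESULTS"],
    )
    print(model.predict("pain was significantly reduced"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied over long sentences.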

Final Results

Results of all Models

Best Performing Model

Final Outputs

Packages Used

  • TensorFlow
  • tensorflow_text
  • tensorflow_hub
  • scikit-learn (sklearn)
  • Matplotlib
  • NumPy
  • pandas
  • spaCy

Contact Me

GitHub  LinkedIn  Instagram