SkimLit

An NLP model that classifies each sentence of an abstract by the role it plays (e.g. objective, methods, results), so that researchers can skim the literature and dive deeper when necessary.

Try the demo: Web App

More specifically, I am going to replicate the deep learning model behind the 2017 paper PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts.

Dataset Used

PubMed 200k RCT dataset

The PubMed 200k RCT dataset is described in Franck Dernoncourt, Ji Young Lee. PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts. International Joint Conference on Natural Language Processing (IJCNLP). 2017.

Some miscellaneous information:

PubMed 20k is a subset of PubMed 200k. I.e., any abstract present in PubMed 20k is also present in PubMed 200k.
PubMed_200k_RCT is the same as PubMed_200k_RCT_numbers_replaced_with_at_sign, except that in the latter all numbers have been replaced by @ (the same holds for PubMed_20k_RCT vs. PubMed_20k_RCT_numbers_replaced_with_at_sign).
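To illustrate what the numbers_replaced_with_at_sign variants look like, here is a minimal sketch of that kind of substitution. The exact rule used by the dataset authors is not stated here, so the regular expression (one @ per run of digits) and the function name `replace_numbers_with_at` are assumptions made for illustration only.

```python
import re


def replace_numbers_with_at(text: str) -> str:
    """Replace every run of digits with a single '@'.

    Assumption: one '@' per digit run; the dataset's own rule may differ
    (for example in how it treats decimals).
    """
    return re.sub(r"\d+", "@", text)


sentence = "A total of 125 patients were randomized 1:1 for 6 weeks ."
print(replace_numbers_with_at(sentence))
# prints: A total of @ patients were randomized @:@ for @ weeks .
```

Replacing numbers this way keeps the vocabulary small, since the specific values rarely matter for deciding which role a sentence plays.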
Count Plot

Models Tried

All the notebooks are available here

Naive Bayes Model -> 72% Accuracy
Conv1D Model -> 78% Accuracy
Model using pretrained token embeddings (Universal Sentence Encoder) -> 75% Accuracy
Conv1D Model using character level embedding -> 73% Accuracy
Model with both token and character level embedding -> 76% Accuracy
Model with token, character and position level embedding ( https://arxiv.org/pdf/1612.05251.pdf ) -> 81% Accuracy

Model described in this paper with BERT embedding -> 88% Accuracy
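The character and position level models above need inputs beyond the raw sentence. The following is a rough sketch of how such inputs could be prepared, not the code from the notebooks: the helper names `split_chars` and `add_position_features` and the field names `line_number` and `total_lines` are hypothetical, and the real models would go on to vectorize and embed these values with TensorFlow.

```python
from typing import Dict, List


def split_chars(sentence: str) -> str:
    """Put a space between every character so that a character-level
    vectorizer can treat each character as its own token."""
    return " ".join(list(sentence))


def add_position_features(sentences: List[str]) -> List[Dict[str, object]]:
    """Attach positional information to each sentence of one abstract.

    Assumption: position is described by the sentence's zero-based index
    and the number of sentences in the abstract.
    """
    total = len(sentences)
    return [
        {
            "text": sentence,
            "chars": split_chars(sentence),
            "line_number": index,
            "total_lines": total,
        }
        for index, sentence in enumerate(sentences)
    ]


abstract = ["We aimed to test X .", "Patients were randomized .", "X reduced pain ."]
for row in add_position_features(abstract):
    print(row["line_number"], row["total_lines"], row["text"])
```

Position is informative because abstracts tend to follow a fixed order (objective first, conclusions last), which may explain why the model that adds position features scores higher than the token and character model.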

Final Results

Results of all Models

Best Performing Model

Final Outputs

Packages Used

TensorFlow
tensorflow_text
tensorflow_hub
sklearn
Matplotlib
numpy
pandas
spaCy
