IL-NER

This annotated corpus was developed under the Bhashini project, funded by the Ministry of Electronics and Information Technology (MeitY), Government of India. We thank MeitY for funding this work.

This dataset is licensed under the Creative Commons Attribution 4.0 (CC-BY-4.0) license. It was developed by three partner institutes: IIIT Hyderabad, CDAC Noida, and IIIT Bhubaneshwar. The per-language split sizes are given below.

| Language | Train | Test | Dev  |
|----------|-------|------|------|
| Hindi    | 11076 | 1389 | 1389 |
| Urdu     | 8720  | 1096 | 1094 |
| Odia     | 12109 | 1519 | 1517 |
| Telugu   | 2993  | 384  | 384  |
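As a quick arithmetic check on the table above, the following sketch computes each split's share of the per-language total. The counts are copied from the table; the names `SPLITS`, `split_shares`, and `SHARES` are illustrative and are not part of this repository. Under these numbers, every language comes out at roughly an 80/10/10 train/test/dev split.

```python
# Split sizes copied from the table above, as (train, test, dev).
SPLITS = {
    "Hindi": (11076, 1389, 1389),
    "Urdu": (8720, 1096, 1094),
    "Odia": (12109, 1519, 1517),
    "Telugu": (2993, 384, 384),
}


def split_shares(train, test, dev):
    """Return each split's share of the language total, in percent."""
    total = train + test + dev
    return tuple(round(100 * n / total, 1) for n in (train, test, dev))


# Hypothetical helper mapping: language -> (train %, test %, dev %).
SHARES = {lang: split_shares(*sizes) for lang, sizes in SPLITS.items()}

for lang, (train, test, dev) in SHARES.items():
    print(f"{lang}: train {train}%, test {test}%, dev {dev}%")
```

Telugu is by far the smallest portion of the corpus, so evaluation scores on its test split rest on far fewer examples than those for the other three languages.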

If you use this dataset, please cite the paper:

  @misc{bahad2024finetuning,
        title={Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages}, 
        author={Sankalp Bahad and Pruthwik Mishra and Karunesh Arora and Rakesh Chandra Balabantaray and Dipti Misra Sharma and Parameswari Krishnamurthy},
        year={2024},
        eprint={2405.04829},
        archivePrefix={arXiv},
        primaryClass={cs.CL}
  }

