Accuracy improvement for indian location names extraction using NER #9832
-
Hi, Thanks
Replies: 2 comments
-
Sorry you're having trouble with this. You're correct that there probably aren't enough Indian location names in our training data.
I would recommend making a list of Indian location names and using an EntityRuler to label your data, then checking how much coverage that gets. If the coverage is reasonable, you can use the labelled data as training data for a new NER component. You can put that component in the pipeline alongside the existing NER component and see how that works. I suspect that putting it after the default NER with overwrite enabled is the best option, but you should try the different combinations: before or after the default NER, with and without overwrite. See the double NER example project for notes on how that works.
You can also try combining annotations from the default NER and your EntityRuler to train a single NER component that replaces the default one. That's simpler in some ways and has lower computational requirements, but it is more likely to run into accuracy issues, so I would definitely try the approach above first.
There are two assumptions I'm making here:
If either of those is not true, a different approach would be necessary, and we'd need more info about your data.
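The first steps above (label with an EntityRuler, check coverage, save the result as training data) could be sketched roughly as follows. This is a minimal illustration, not code from the thread: the `LOCATIONS` and `TEXTS` lists, the helper names, and the choice of the `GPE` label are all assumptions, and the coverage figure here is only a crude proxy (share of texts with at least one match). A real estimate would compare against a hand-labelled sample.

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical seed list; replace with your own list of Indian location names.
LOCATIONS = ["Mumbai", "Pune", "Navi Mumbai", "Tiruchirappalli"]

# Hypothetical sample texts standing in for your real data.
TEXTS = [
    "She moved from Pune to Navi Mumbai last year.",
    "The train to Tiruchirappalli was delayed.",
    "No place is mentioned in this sentence.",
]


def build_ruler_pipeline(locations):
    """Return a blank English pipeline whose only component is an EntityRuler."""
    nlp = spacy.blank("en")
    ruler = nlp.add_pipe("entity_ruler")
    # Phrase patterns: each name is matched as an exact token sequence.
    ruler.add_patterns([{"label": "GPE", "pattern": name} for name in locations])
    return nlp


def coverage(docs):
    """Fraction of docs in which the ruler found at least one entity."""
    if not docs:
        return 0.0
    return sum(1 for doc in docs if doc.ents) / len(docs)


def to_docbin(docs):
    """Collect annotated docs in a DocBin, the format `spacy train` reads."""
    doc_bin = DocBin()
    for doc in docs:
        doc_bin.add(doc)
    return doc_bin


nlp = build_ruler_pipeline(LOCATIONS)
docs = list(nlp.pipe(TEXTS))
print(f"coverage: {coverage(docs):.2f}")

doc_bin = to_docbin(docs)
# Uncomment to write the training file (the path is an assumption):
# doc_bin.to_disk("./train.spacy")
```

If coverage looks reasonable, the saved file can serve as the training corpus for the separate NER component described above.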
Beta Was this translation helpful? Give feedback.
-
from spacy.pipeline import EntityRuler
You can try this code. You can also add as many locations as you want.
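The commenter's full snippet is not preserved here, so the following is only a rough sketch of how an EntityRuler can be extended with more locations over time. The `location_pattern` helper, the example names, and the `GPE` label are assumptions for illustration. It uses token patterns on `LOWER` so that matching is case-insensitive, and it assumes each name splits cleanly on whitespace (names with hyphens or other punctuation may tokenize differently).

```python
import spacy

nlp = spacy.blank("en")
# overwrite_ents=True lets the ruler replace overlapping entities set by
# earlier components; in this blank pipeline there are none to overwrite.
ruler = nlp.add_pipe("entity_ruler", config={"overwrite_ents": True})


def location_pattern(name, label="GPE"):
    """Build a case-insensitive token pattern for one location name."""
    tokens = [{"LOWER": part.lower()} for part in name.split()]
    return {"label": label, "pattern": tokens}


# Initial batch of (hypothetical) location names.
ruler.add_patterns([location_pattern(n) for n in ["Bengaluru", "New Delhi"]])

# More locations can be appended at any time.
ruler.add_patterns([location_pattern("Kochi")])

doc = nlp("Flights from new delhi to KOCHI and Bengaluru are on time.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```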