Add batch analyze API support for recognizer #1506
Comments
Thanks for the suggestion! One of the reasons we chose to use spacy-huggingface-pipelines is its batch support for transformer-based models. Have you experimented with this option? In general, I agree that exposing an |
@omri374 thanks for the quick reply. The problem is that we are currently using two transformer models (and not for the purpose of multi-language support), so that approach wouldn’t work. Another way forward for us would be to fine-tune our own models and consolidate to one. |
Alternatively, we could support multiple NLP engines and extend the NLP artifacts. If we decide to provide better support for batching, my team is happy to help with the PR. |
Will the option of having a |
@omri374 The issue is that I’ll try to cut a PR to add the batch analyzer API to the EntityRecognizers. and In our codebase, we’ll create:
|
In the
Would be replaced with a call to I must say I haven't thought about this deeply enough to know if it's viable. |
Is your feature request related to a problem? Please describe.
Currently, the BatchAnalyzerEngine works by iterating through either a list or a dictionary, analyzing and anonymizing the values one by one. While this is not an issue for the predefined recognizers, and there are improvements in the built-in NLP engine to support batch inference, it does pose an efficiency problem for the transformer recognizers, causing idle resources and low inference throughput.
Describe the solution you'd like
We want to build batch inference API support for recognizers. Early testing shows that even with a small batch size of 4, a BERT-like transformer speeds up inference by 3x without any additional resource or memory usage.
The exact implementation is still up for discussion. One potential solution would be adding a batch recognizer mix-in: we batch analyze with the batch recognizers first, then pass the results to the regular analyze for extension, similar to nlp_engine.process_batch.
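To make the mix-in idea concrete, here is a minimal, library-free sketch. Every name in it (`Result`, `BatchRecognizerMixin`, `predict_batch`, `analyze_batch`, `analyze_texts`, `ToyDigitRecognizer`) is hypothetical and is not part of Presidio's API; the toy recognizer stands in for a transformer model and simply counts how many "model calls" it receives.

```python
import re
from dataclasses import dataclass
from typing import Iterator, List, Sequence


@dataclass
class Result:
    """Hypothetical stand-in for a recognizer result (not Presidio's class)."""
    entity_type: str
    start: int
    end: int
    score: float


def chunked(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


class BatchRecognizerMixin:
    """Hypothetical mix-in for recognizers that can score many texts per call."""
    batch_size: int = 4

    def predict_batch(self, texts: Sequence[str]) -> List[List[Result]]:
        """One model call; returns one result list per input text."""
        raise NotImplementedError

    def analyze_batch(self, texts: Sequence[str]) -> List[List[Result]]:
        """Split texts into batches and concatenate the per-text results."""
        results: List[List[Result]] = []
        for batch in chunked(texts, self.batch_size):
            results.extend(self.predict_batch(batch))
        return results


class ToyDigitRecognizer(BatchRecognizerMixin):
    """Toy 'model' that flags digit runs and counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def predict_batch(self, texts: Sequence[str]) -> List[List[Result]]:
        self.calls += 1
        return [
            [Result("NUMBER", m.start(), m.end(), 0.9)
             for m in re.finditer(r"\d+", text)]
            for text in texts
        ]


def analyze_texts(texts: Sequence[str], recognizers) -> List[List[Result]]:
    """Run batch-capable recognizers per batch and all others per text."""
    per_text: List[List[Result]] = [[] for _ in texts]
    for rec in recognizers:
        if isinstance(rec, BatchRecognizerMixin):
            for slot, found in zip(per_text, rec.analyze_batch(texts)):
                slot.extend(found)
        else:
            # Assumes non-batch recognizers expose a per-text analyze(text).
            for slot, text in zip(per_text, texts):
                slot.extend(rec.analyze(text))
    return per_text


if __name__ == "__main__":
    recognizer = ToyDigitRecognizer()
    texts = ["call 555", "no digits", "a1b22", "x", "9"]
    findings = analyze_texts(texts, [recognizer])
    print(recognizer.calls, "model calls for", len(texts), "texts")
```

With five texts and a batch size of 4, the toy recognizer is invoked twice rather than five times. Whether a real transformer sees a comparable gain depends on the model and hardware.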
Describe alternatives you've considered
N/A
Additional context
N/A