
Flow for entity refinement #4

Open
8 of 10 tasks
secretsauceai opened this issue Jul 2, 2022 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments


secretsauceai commented Jul 2, 2022

Description

We want users, in this case most likely myself and other developers, to benchmark an NLU data set for entity extraction and to be able to refine those entities to improve the data set.

Making sure the "human in the loop" flow works depends on refining the entities; however, there will be minor improvements that need to be made along the way, since they block the refinement process. This is a dummy ticket to which such minor code fixes will be appended.

User stories

As a user, I want to

  • see the analytics of the entities visualized so that I know what needs to be improved
  • review all the incorrect entities so that I can fix them

Sounds easy, right? Especially since we have already built the intent refinement workflow. But it is a bit more complex than that.

With intents, we could visualize all domains, see where the intents are doing the worst, pick those domains, and then review all the incorrectly classified intents in that domain for refinement. With entities, it is a bit trickier.

We'll do the same thing, but with entities. However, we need to group the entities together by domain, and there will also be overlap: some utterances have more than one entity type, so we have to keep track of that. Furthermore, do we tell the user to refine all entities in an utterance, or do we tell them to ignore the rest? It would be super annoying to have to go back over the same utterances two or more times! This is why we should have users working on multiple entities at the same time. This is harder for a user, as they must know whether each entity is correct and, if not, what it should be.
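To avoid revisiting the same utterance once per entity, incorrect predictions can be grouped by utterance so a reviewer sees every entity in an utterance at once. A minimal sketch, assuming a hypothetical row layout of (utterance, entity type, predicted value, gold value):

```python
from collections import defaultdict

# Hypothetical prediction rows: (utterance, entity_type, predicted, gold)
predictions = [
    ("set an alarm for 7 am", "time", "7 am", "7 am"),
    ("play jazz in the kitchen", "genre", "jazz", "jazz"),
    ("play jazz in the kitchen", "room", "bedroom", "kitchen"),  # wrong
    ("turn off the bedroom light", "room", "bedroom", "bedroom"),
]

def group_incorrect_by_utterance(rows):
    """Collect utterances with at least one incorrect entity, returning ALL
    entities in those utterances so they are reviewed together, once."""
    by_utt = defaultdict(list)
    for utt, etype, pred, gold in rows:
        by_utt[utt].append({"entity": etype, "predicted": pred, "gold": gold})
    return {
        utt: ents
        for utt, ents in by_utt.items()
        if any(e["predicted"] != e["gold"] for e in ents)
    }

review = group_incorrect_by_utterance(predictions)
```

Here only "play jazz in the kitchen" lands in the review set, but both of its entities (the correct `genre` and the wrong `room`) are shown, so the reviewer handles the utterance exactly once.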

Ergo, it is better for a user to review incorrect entries in batches. They should first get an overview of example entries for that domain where the entities are correct, then go through and correct no more than 100 entries at a time.
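The batching itself is simple; a sketch of splitting a domain's incorrect entries into review batches of at most 100 (the batch size above is a suggestion, not a hard requirement):

```python
def make_batches(entries, batch_size=100):
    """Split a list of incorrect entries into fixed-size review batches;
    the final batch may be smaller."""
    return [entries[i:i + batch_size] for i in range(0, len(entries), batch_size)]

# e.g. 250 incorrect entries -> three batches of 100, 100, and 50
batches = make_batches(list(range(250)))
```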

This means, however, that we will have to adapt our flow from intent refinement. With intent refinement, we recorded results into CSVs by domain and intent. Here we will record by domain in batches, then merge those batches into one CSV for the whole domain. If the user is lucky, they will only have to do one batch per domain.
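The batch-then-merge step can be sketched with the standard library alone (the file layout here is hypothetical; the real flow would read batch CSVs from disk rather than strings):

```python
import csv
import io

def merge_csvs(parts):
    """Merge CSV texts that share the same header, keeping the header once.
    Used first to merge batches into a domain CSV, then domain CSVs into
    one CSV for the whole data set."""
    merged, header_written = [], False
    for text in parts:
        rows = list(csv.reader(io.StringIO(text)))
        if not header_written:
            merged.append(rows[0])
            header_written = True
        merged.extend(rows[1:])
    buf = io.StringIO()
    csv.writer(buf).writerows(merged)
    return buf.getvalue()

# Two hypothetical reviewed batches for one domain
batch_1 = "utterance,entity,gold\nplay jazz,genre,jazz\n"
batch_2 = "utterance,entity,gold\nset alarm,time,7 am\n"
domain_csv = merge_csvs([batch_1, batch_2])
```

The same `merge_csvs` call then merges all per-domain CSVs into the data-set-wide CSV before re-benchmarking.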

DoD

  • benchmark entities over whole data set
  • graph analysis of entities for the whole data set
  • benchmark entities per domain
  • graphs of entities per domain
  • add incorrect_entities_report to macro_entities_refinement.py
  • ipysheet refinement of a batch in the domain
  • save to a CSV of batches
  • merge with CSV for the whole domain
  • merge with the CSV for the whole data set
  • benchmark again
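The per-domain benchmarking items in the checklist boil down to an accuracy score per domain, which also feeds the per-domain graphs. A minimal sketch, assuming hypothetical rows of (domain, entity type, predicted, gold):

```python
from collections import defaultdict

# Hypothetical benchmark rows: (domain, entity_type, predicted, gold)
rows = [
    ("alarm", "time", "7 am", "7 am"),
    ("music", "genre", "jazz", "jazz"),
    ("music", "room", "bedroom", "kitchen"),  # wrong
]

def benchmark_by_domain(rows):
    """Fraction of correctly extracted entities per domain."""
    correct, total = defaultdict(int), defaultdict(int)
    for domain, _etype, pred, gold in rows:
        total[domain] += 1
        correct[domain] += (pred == gold)
    return {d: correct[d] / total[d] for d in total}

scores = benchmark_by_domain(rows)
```

The resulting scores dict can be passed straight to a bar chart (e.g. matplotlib) for the "graphs of entities per domain" item, and the worst-scoring domains picked for refinement first.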
@AmateurAcademic AmateurAcademic self-assigned this Aug 6, 2022
@AmateurAcademic AmateurAcademic added the enhancement New feature or request label Aug 6, 2022
@AmateurAcademic
Collaborator

This code needs to be refactored and tested extensively.
