Curation does not properly merge two identical annotations when they contain stacked spans #5226

Open
j-klesen opened this issue Jan 13, 2025 · 4 comments

@j-klesen

Describe the bug
When using two annotators who annotate the exact same stacked spans and relations for a given sentence, the majority vote curation merging does not result in the correct output.

To Reproduce
Steps to reproduce the behavior:

  1. Download the attached project.
  2. Import it into INCEpTION.
  3. Go into the curation view.
  4. Re-merge using the parameters: user threshold: 1; top-voted: 100; confidence: 0.0
  5. Check the result and especially the annotations for sentence 9

Expected behavior
In the merged view, Sentence 9 should have the correctly merged annotations.

Screenshots
Annotator 1:
image

Annotator 2 (same):
image

Result of curation merge:
image

Please complete the following information:

  • Version and build ID: INCEpTION -- 35.0-SNAPSHOT (2025-01-10 17:49:47, build bd1305e)
  • Note: we have set this flag
  • OS: Linux 5.15.0-1066-aws amd64
  • Browser: Firefox

Additional context
merging_bug.zip

@reckart
Member

reckart commented Jan 13, 2025

The merge algorithm is conservative when it comes to handling transitivity - i.e. it does not consider transitivity.

That means, for the purpose of merging the link [Treatment A]-(TLink:Overlap)->[Treatment B], there are two possible targets, namely the two (Event:Treatment) annotations.
Because transitivity is not considered, the two (Event:Treatment) annotations are considered to be equal despite one having a (transitive) link and the other not having one.
It cannot decide which one to use as the target for the link, so it does not merge the link.
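
The situation described above can be pictured with a minimal sketch. This is not the CasMerge code: the `Span` class and the `resolve_link_target` function are hypothetical names made up for illustration, and the model assumes that link targets are looked up by position and label only. The annotation IDs and offsets are the ones from the log of this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Simplified stand-in for a merged span annotation (hypothetical model)."""
    ann_id: int
    begin: int
    end: int
    label: str


def resolve_link_target(merged_spans, begin, end, label):
    """Return the single merged span matching position and label.

    Returns None when there is no match or more than one match,
    mirroring the conservative "do not merge if ambiguous" behavior.
    """
    candidates = [
        s for s in merged_spans
        if (s.begin, s.end, s.label) == (begin, end, label)
    ]
    return candidates[0] if len(candidates) == 1 else None


merged = [
    Span(3619, 642, 653, "Treatment"),  # Treatment A
    Span(3625, 658, 669, "Treatment"),  # Treatment B (first stacked span)
    Span(3631, 658, 669, "Treatment"),  # Treatment B (second stacked span)
    Span(3637, 715, 726, "Treatment"),  # Treatment C
]

# Link from Treatment A to position 658-669: two equal candidates, no decision.
link_a_to_b = resolve_link_target(merged, 658, 669, "Treatment")
# Link from Treatment B to position 715-726: exactly one candidate.
link_b_to_c = resolve_link_target(merged, 715, 726, "Treatment")
```

In this simplified model, the two stacked Treatment B spans cannot be told apart because their outgoing links are not part of the comparison, so the link pointing at them stays unresolved.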

When setting the log-level of de.tudarmstadt.ukp.inception.curation.merge.CasMerge to TRACE, you get this explanation:

Processing [3] span positions on layer [custom.Span]
 |   processing Span [, coll=file:/export/repository/project/0/document/19/source/, doc=j.klesen.3, type=Span, span=(642-653)[Treatment A]]
 `-> merged span annotation [3746] (created) -> [3619]
 |   processing Span [, coll=file:/export/repository/project/0/document/19/source/, doc=j.klesen.3, type=Span, span=(658-669)[Treatment B]]
 `-> merged span annotation [3734] (created) -> [3625]
 `-> merged span annotation [3752] (created) -> [3631]
 |   processing Span [, coll=file:/export/repository/project/0/document/19/source/, doc=j.klesen.3, type=Span, span=(715-726)[Treatment C]]
 `-> merged span annotation [3728] (created) -> [3637]
Processing 2 link positions on layer [custom.Span]
 |   processing Span [, coll=file:/export/repository/project/0/document/19/source/, doc=j.klesen.3, type=Span, linkFeature=RelationSemantics, linkTarget=(658-669)
 `-> not merged link annotation [3746]: There are multiple possible targets. Cannot merge this link.
 |   processing Span [, coll=file:/export/repository/project/0/document/19/source/, doc=j.klesen.3, type=Span, linkFeature=RelationSemantics, linkTarget=(715-726)
 `-> merged link annotation [3752] -> [3625]
Merge complete. Created:  4 Updated: 0
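
For longer documents, a TRACE log like this can get large. The following is a rough sketch for pulling out only the entries that were not merged. It assumes the line format shown in the log above, which may differ between INCEpTION versions; the function name `find_unmerged` is made up for this example.

```python
import re

# Matches lines such as:
#  `-> not merged link annotation [3746]: There are multiple possible targets. ...
# The pattern is derived from the log excerpt above and is an assumption,
# not a documented log format.
NOT_MERGED = re.compile(
    r"`-> not merged (?P<kind>\w+) annotation \[(?P<id>\d+)\]: (?P<reason>.*)"
)


def find_unmerged(log_text):
    """Return (kind, annotation id, reason) for every 'not merged' log line."""
    results = []
    for line in log_text.splitlines():
        match = NOT_MERGED.search(line)
        if match:
            results.append(
                (match.group("kind"), int(match.group("id")), match.group("reason").strip())
            )
    return results


sample = """\
 `-> merged span annotation [3746] (created) -> [3619]
 `-> not merged link annotation [3746]: There are multiple possible targets. Cannot merge this link.
 `-> merged link annotation [3752] -> [3625]
"""

unmerged = find_unmerged(sample)
```

Applied to the sample, this keeps only the link on annotation [3746] together with the reason the merge was skipped.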

@reckart
Member

reckart commented Jan 14, 2025

I believe the stacked [Treatment A]-(TLink:Overlap)->[Treatment B] is redundant.

If I read this correctly, you are saying that there are two options:

  1. only Treatment A and Treatment B are in one equivalence class
  2. Treatment A, Treatment B, and Treatment C are all in one equivalence class

If that is the case, then I would want to have agreement on the [Treatment A]-(TLink:Overlap)->[Treatment B] part and disagreement on Treatment C and the link to it.

But I think that would require a change to the annotation guidelines asking the annotators to only annotate the most probable tuple instead of stacking alternatives.

@reckart reckart added the Support request User has a problem and needs help label Jan 14, 2025
@reckart reckart self-assigned this Jan 14, 2025
@reckart reckart added this to Support Jan 14, 2025
@github-project-automation github-project-automation bot moved this to 🤷 To do in Support Jan 14, 2025
@j-klesen
Author

Hi @reckart,

Thanks for getting back to us!

"[...] asking the annotators to only annotate the most probable tuple instead of stacking alternatives."

It's not that the annotators here annotated two options, where only one of them is true. Instead, both of them are true: the annotation says "treatments A&B were given in combination, and also, treatments A&B&C were given in combination".

The merged curation should ideally reflect that and keep all annotations. Especially since in this case, both annotators agreed completely.

I hope that clears things up!

@reckart
Member

reckart commented Jan 14, 2025

I think it would require significant changes to the merge algorithm to handle such a case and turn it into a subgraph-matching problem.

An option you have with the current implementation would be introducing a disambiguation feature into the Event annotation, e.g. "group" where the annotators could enter "1" or "2".
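
Why such a feature would help can be shown with a small sketch. It assumes that feature values take part in deciding whether two stacked annotations are equal, which is what the suggestion implies; the `Event` class, the `find_targets` function and the `group` feature are hypothetical and not part of INCEpTION.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Simplified Event annotation with a hypothetical 'group' feature."""
    begin: int
    end: int
    label: str
    group: str  # disambiguation value entered by the annotator, e.g. "1" or "2"


def find_targets(events, begin, end, label, group=None):
    """Return all events at the position with the label, optionally filtered by group."""
    return [
        e for e in events
        if (e.begin, e.end, e.label) == (begin, end, label)
        and (group is None or e.group == group)
    ]


# The two stacked Treatment B annotations, now carrying different group values.
stacked = [
    Event(658, 669, "Treatment", "1"),
    Event(658, 669, "Treatment", "2"),
]

# Without the feature, both stacked annotations are candidates.
without_group = find_targets(stacked, 658, 669, "Treatment")
# With the feature, only one candidate remains and the link target is unique.
with_group = find_targets(stacked, 658, 669, "Treatment", group="1")
```

The trade-off is that both annotators have to assign the group values consistently, otherwise the stacked annotations show up as disagreements.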
