Hi,
I am trying to build a sense2vec model with new data, and I have made a few changes to 01_parse.py.
First, I removed the default ner pipe that comes with "en_core_web_lg".
Then I added a new Language.component that identifies the Spans for new entities (with new labels) in a doc.
Sometimes I would like to assign the same Span[x, y] to more than one entity, but I cannot.
My question:
I have read about the changes in spaCy v3.1. Is there a way to use "doc.spans" (or something similar) in 01_parse.py so that spaCy's internal algorithms take overlapping Spans into account?
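To make the question concrete, here is a minimal sketch of the result I would like to end up with. It assumes spaCy v3.1 or later and a blank English pipeline; the labels FRUIT and COMPANY, the patterns, the component name and the "candidates" key are all made up for illustration. Whether 01_parse.py or sense2vec would read anything from doc.spans is exactly the part I am unsure about.

```python
import spacy
from spacy.language import Language
from spacy.matcher import Matcher
from spacy.tokens import Span

nlp = spacy.blank("en")  # blank pipeline, so no trained model is needed

# Hypothetical labels and patterns: both match the same token.
matcher = Matcher(nlp.vocab)
matcher.add("FRUIT", [[{"LOWER": "apple"}]])
matcher.add("COMPANY", [[{"LOWER": "apple"}]])


@Language.component("overlapping_spans")
def overlapping_spans(doc):
    # Keep every match, even when two labels cover the same tokens.
    spans = [
        Span(doc, start, end, label=match_id)
        for match_id, start, end in matcher(doc)
    ]
    # Unlike doc.ents, a span group may hold overlapping spans.
    doc.spans["candidates"] = spans  # "candidates" is an arbitrary key
    return doc


nlp.add_pipe("overlapping_spans")
doc = nlp("I bought an apple")
labels = sorted((span.text, span.label_) for span in doc.spans["candidates"])
print(labels)
```

Here doc.ents stays empty, because the sketch only writes to the span group.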
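This is my current component:
from spacy.language import Language
from spacy.tokens import Span

# `matcher` is a spacy.matcher.Matcher that is built elsewhere in the script.

@Language.component("name_comp")
def my_component(doc):
    matches = matcher(doc)
    seen_tokens = set()
    new_entities = []
    entities = doc.ents
    for match_id, start, end in matches:
        # `end` is exclusive, so end - 1 is the last token of the match
        if start not in seen_tokens and end - 1 not in seen_tokens:
            new_entities.append(Span(doc, start, end, label=match_id))
            # drop existing entities that overlap the new span
            entities = [
                e for e in entities if not (e.start < end and e.end > start)
            ]
            seen_tokens.update(range(start, end))
    # doc.ents only accepts non-overlapping spans
    doc.ents = tuple(entities) + tuple(new_entities)
    return doc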
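To show where the second label gets lost, here is a spaCy-free trace of the same filtering logic. The (label, start, end) tuples are made-up stand-ins for matcher output, with `end` exclusive as in spaCy, and `filter_matches` is a hypothetical helper that only mirrors the `seen_tokens` check from the component above.

```python
def filter_matches(matches):
    """Keep a match only if its first and last tokens are still unclaimed."""
    seen_tokens = set()
    kept = []
    for label, start, end in matches:
        # Same test as in the component: `end` is exclusive, so the last
        # token of the match is end - 1.
        if start not in seen_tokens and end - 1 not in seen_tokens:
            kept.append((label, start, end))
            seen_tokens.update(range(start, end))
    return kept


# Two labels on the same tokens (3..4), plus one unrelated match.
matches = [("FRUIT", 3, 4), ("COMPANY", 3, 4), ("PRODUCT", 5, 7)]
kept = filter_matches(matches)
print(kept)  # [('FRUIT', 3, 4), ('PRODUCT', 5, 7)]
```

The COMPANY match is discarded because token 3 was already claimed by FRUIT, which is the behaviour I would like to avoid.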
Thanks in advance,
Paula