I added a custom pipeline component after the tagger in a spaCy model, but it does not receive the pos_ and tag_ information.
Here is the code:
import spacy
from spacy.language import Language

nlp = spacy.load("en_core_web_trf")

@Language.component("segments")
def set_segments(doc):
    # Look at every token except the last and adjust whether the following token starts a sentence.
    for token in doc[:-1]:
        print(token.i, token.text, token.pos_)
        next_token = doc[token.i + 1]
        if token.text == ":":
            next_token.is_sent_start = False
        elif token.text == ".":
            next_token.is_sent_start = True
        elif token.pos_ == "PROPN" and next_token.text == "[":
            next_token.is_sent_start = True
    return doc

nlp.add_pipe("segments", after="tagger")
print(nlp.pipe_names)
This prints the pipe_names: ['transformer', 'tagger', 'segments', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
So this component does come after the tagger. But in the line print(token.i, token.text, token.pos_), token.i and token.text are correct, while token.pos is 0 and token.pos_ is an empty string.
After this setup, I just run doc = nlp("some input").
I also tried en_core_web_sm, with the same result.