In Spacy, when we request the nouns, the grammatical articles (ex.: “the”, “one”, “a”) are also presented
<code>import spacy
nlp_en = spacy.load('en_core_web_sm') # v3.7.1
doc = nlp_en('The man has cars, houses and one dog')
nouns = [chunk.text for chunk in doc.noun_chunks]
print(nouns) # ['The man', 'cars', 'houses', 'one dog']
</code>
<code>import spacy
nlp_en = spacy.load('en_core_web_sm') # v3.7.1
doc = nlp_en('The man has cars, houses and one dog')
nouns = [chunk.text for chunk in doc.noun_chunks]
print(nouns) # ['The man', 'cars', 'houses', 'one dog']
</code>
import spacy
nlp_en = spacy.load('en_core_web_sm') # v3.7.1
doc = nlp_en('The man has cars, houses and one dog')
nouns = [chunk.text for chunk in doc.noun_chunks]
print(nouns) # ['The man', 'cars', 'houses', 'one dog']
Is there a way to get ['man', 'cars', 'houses', 'dog']
?
It should work for every language, thus just stripping words “a la carte” is not a solution.