I’m trying to match locations to institution names derived from published papers. spaCy seems to identify a country from the name ‘Amsterdam’ inconsistently: sometimes it finds The Netherlands, other times it does not.
If you run the code below, you will see that it works for the second text, but for the first it prints nothing.
import spacy

nlp = spacy.load("en_core_web_sm")

text = "Department of Rheumatology, Amsterdam Rheumatology & Immunology Center."
doc = nlp(text)
print("\nLocation associations found in the text:")
for ent in doc.ents:
    if ent.label_ == "GPE":  # GPE: Geopolitical Entity
        print(f"- {ent.text}")

text = "Department of Pulmonary Medicine and Amsterdam Cardiovascular Sciences, Amsterdam UMC, Vrije Universiteit, Amsterdam, The Netherlands."
doc = nlp(text)
print("\nLocation associations found in the text:")
for ent in doc.ents:
    if ent.label_ == "GPE":  # GPE: Geopolitical Entity
        print(f"- {ent.text}")
I would like to know why it behaves this way. Clearly I do not understand what is going on behind the curtain.
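One way I could look behind the curtain is to print every entity with its label instead of filtering on GPE. The sketch below does that; `entity_labels` and `show_entities` are hypothetical helper names of my own, not part of spaCy, and it assumes the `en_core_web_sm` model is installed. If ‘Amsterdam’ turns up inside a longer span with a different label (for example ORG), that might explain why the GPE filter prints nothing for that text.

```python
def entity_labels(doc):
    """Return (text, label) pairs for every entity found in a spaCy Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def show_entities(texts, model="en_core_web_sm"):
    """Print all entities and their labels for each text (hypothetical helper)."""
    import spacy  # imported here so entity_labels has no hard dependency on spaCy

    nlp = spacy.load(model)  # assumes the model has been downloaded
    for text in texts:
        print(f"\n{text}")
        for ent_text, label in entity_labels(nlp(text)):
            # Show the label for every span, not only GPE
            print(f"- {ent_text!r}: {label}")
```

Calling `show_entities([...])` with the two affiliation strings from the code above lists every span the model tagged, so the two results can be compared side by side.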