I have a set of strings where I need to detect the country each one belongs to, based on the detected GPE entities.
sentences = [
    "I watched TV in germany",
    "Mediaset ITA canale 5",
    "Je viens d'Italie",
    "Ich komme aus Deutschland",
    "He is from the UK",
    "Soy de Inglaterra",
    "Sono del Regno Unito"
]
So my expected output is:
“I watched TV in germany” => DE
“Soy de Inglaterra” => UK
etc..
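To make the mapping step I have in mind concrete, here is a rough sketch of what should happen after an entity is detected. The `COUNTRY_CODES` dictionary and the `to_code` helper are just placeholders I made up for illustration; in the real pipeline the input strings would come from spaCy's detected entities:

```python
# Hypothetical lookup from entity text (lowercased) to a country code.
# The names and codes here are illustrative, not from any library.
COUNTRY_CODES = {
    "germany": "DE",
    "deutschland": "DE",
    "italie": "IT",
    "uk": "UK",
    "regno unito": "UK",
    "inglaterra": "UK",  # per my expectation, England maps to UK
}

def to_code(entity_text):
    """Map a detected entity string to a country code, or None if unknown."""
    return COUNTRY_CODES.get(entity_text.strip().lower())

print(to_code("germany"))     # DE
print(to_code("Inglaterra"))  # UK
```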
I've tried this code:
import spacy

nlp = spacy.load('xx_ent_wiki_sm')  # multilingual model

def detect_country(text):
    doc = nlp(text)
    countries = []
    for ent in doc.ents:
        if ent.label_ == 'GPE':
            countries.append(ent.text)
    return countries

for sentence in sentences:
    countries = detect_country(sentence)
    print(f"Countries detected in '{sentence}': {countries}")
But the results are completely empty:
Countries detected in 'I watched TV in germany': []
Countries detected in 'Mediaset ITA canale 5': []
Countries detected in 'Je viens d'Italie': []
Countries detected in 'Ich komme aus Deutschland': []
Countries detected in 'He is from the UK': []
Countries detected in 'Soy de Inglaterra': []
Countries detected in 'Sono del Regno Unito': []
I don't understand if I'm missing something in the spaCy pipeline, or whether this multilingual model is actually a good choice for this task.
Thanks in advance for any advice.