I’m currently working on a project using spaCy with the German trained pipeline de_dep_news_trf.
Unfortunately, I’m having issues with named entity recognition (NER).
When I run a simple sentence like “Berlin ist die Hauptstadt von Deutschland. Angela Merkel war die Bundeskanzlerin.”, no entities are detected.
I followed these steps to set up my Python 3.12 environment on Windows in a PyCharm Community project:
python.exe -m pip install --upgrade pip
pip install -U pip setuptools wheel
pip install -U spacy
python -m spacy download de_dep_news_trf --timeout 600
pip install spacy[transformers]
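To confirm that the steps above actually installed into the interpreter PyCharm is using, the installed versions can be printed from that same interpreter. This is only a small sketch: `installed_version` is a hypothetical helper name, and the distribution names `spacy` and `spacy-transformers` are assumed to be what the commands above install.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def report_versions(distributions=("spacy", "spacy-transformers")):
    """Print one line per distribution; the names are assumptions, adjust as needed."""
    for name in distributions:
        found = installed_version(name)
        print(f"{name}: {found if found is not None else 'not installed'}")
```

Calling `report_versions()` inside the project interpreter shows whether both packages are visible there.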
Here is a snippet of my code:
import spacy


def process_text_with_spacy(text_to_process):
    doc = nlp(text_to_process)
    data = {
        "text": text_to_process,
        "sentences": []
    }
    for sent in doc.sents:
        process_sentence_data = {
            "sentence": sent.text,
            "entities": []
        }
        for ent in sent.ents:
            process_sentence_data["entities"].append({
                "text": ent.text,
                "start": ent.start_char,
                "end": ent.end_char,
                "label": ent.label_
            })
        data["sentences"].append(process_sentence_data)
    return data


nlp = spacy.load('de_dep_news_trf')
sample_text = "Berlin ist die Hauptstadt von Deutschland. Angela Merkel war die Bundeskanzlerin."
processed_data = process_text_with_spacy(sample_text)
print("Text:", sample_text)
for sentence_data in processed_data["sentences"]:
    print("Sentence:", sentence_data["sentence"])
    print("Entities:", sentence_data["entities"])
Output:
Text: Berlin ist die Hauptstadt von Deutschland. Angela Merkel war die Bundeskanzlerin.
Sentence: Berlin ist die Hauptstadt von Deutschland.
Entities: []
Sentence: Angela Merkel war die Bundeskanzlerin.
Entities: []
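One diagnostic that may narrow this down: `doc.ents` is only filled if the loaded pipeline contains a component that sets entities, typically one named `ner`. The sketch below lists the components via `nlp.pipe_names` and reports which expected ones are missing. The helper names `missing_components` and `check_pipeline` are hypothetical, and the assumption that the entity recognizer is registered under the name `ner` follows spaCy's usual naming for trained pipelines.

```python
def missing_components(pipe_names, required=("ner",)):
    """Return the required component names that are absent from pipe_names."""
    return [name for name in required if name not in pipe_names]


def check_pipeline(model_name, required=("ner",)):
    """Load a spaCy pipeline, print its components, and return the missing ones."""
    import spacy  # imported here so missing_components works without spaCy

    nlp = spacy.load(model_name)
    print("Components:", nlp.pipe_names)
    return missing_components(nlp.pipe_names, required)
```

Running `check_pipeline("de_dep_news_trf")` returns `["ner"]` if the pipeline has no entity recognizer, which would explain the empty entity lists above.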