I am working on an NLP task and have written the following code. Executing it raises the error shown below. Any suggestion for resolving the error would be helpful. I am using a Python 3 environment in Google Colab.
# Pytextrank
import pytextrank
import json
# Sample text
sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'
# Create dictionary to feed into json file
file_dic = {"id": 0, "text": sample_text}
file_dic = json.dumps(file_dic)
loaded_file_dic = json.loads(file_dic)
# Create test.json and feed file_dic into it.
with open('test.json', 'w') as outfile:
    json.dump(loaded_file_dic, outfile)
path_stage0 = "test.json"
path_stage1 = "o1.json"
# Extract keyword using pytextrank
with open(path_stage1, 'w') as f:
    for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
        f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
        print(pytextrank.pretty_print(graf._asdict()))
I am getting the following error:
AttributeError                            Traceback (most recent call last)
<ipython-input-33-286ce104df34> in <module>()
     20 # Extract keyword using pytextrank
     21 with open(path_stage1, 'w') as f:
---> 22     for graf in pytextrank.parse_doc(pytextrank.json_iter(path_stage0)):
     23         f.write("%s\n" % pytextrank.pretty_print(graf._asdict()))
     24         print(pytextrank.pretty_print(graf._asdict()))
AttributeError: module 'pytextrank' has no attribute 'parse_doc'
Implementation of TextRank in Python for use in spaCy pipelines
import spacy
import pytextrank
nlp = spacy.load('en_core_web_sm')
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name='textrank', last=True)
# Sample text
sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'
# Run the pipeline on the sample text
doc = nlp(sample_text)
# Print the ranked phrases
for p in doc._.phrases:
    print(p.text)
AttributeError: module 'pytextrank' has no attribute 'TextRank'
reproduce err:
run:
def summarize_text_returns_expected_summary(nlp, text):
    doc = process_text(nlp, text)
    if 'textrank' not in nlp.pipe_names:
        tr = pytextrank.TextRank()
        nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]
error:
AttributeError: module 'pytextrank' has no attribute 'TextRank'
fix:
step_1
check the pytextrank installation:
pip list | grep pytextrank
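The same check can be run from inside Python, which is convenient in a Colab notebook. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8 or later); installed_version is a hypothetical helper name, not part of pytextrank or spacy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the versions of the two packages involved in this error.
for name in ("pytextrank", "spacy"):
    print(name, installed_version(name))
```

A result of None means the package is not installed in the environment the notebook is actually running in.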
step_2
replace:
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
with:
nlp.add_pipe("textrank")
updated code:
def summarize_text_returns_expected_summary(nlp, text):
    doc = process_text(nlp, text)
    if 'textrank' not in nlp.pipe_names:
        nlp.add_pipe("textrank")
    doc = nlp(text)
    return [str(sent) for sent in doc._.textrank.summary(limit_phrases=15, limit_sentences=5)]
omitting the if statement risks errors: the script won't check whether textrank is already present in the pipeline before adding it, and spacy raises an error when a component with the same name is added twice.
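The membership check can also be wrapped in a small reusable helper. This is a minimal sketch: ensure_pipe is a hypothetical name, not part of spacy or pytextrank, and it assumes nlp is a spaCy 3.x Language object and that pytextrank has already been imported so the "textrank" factory is registered.

```python
def ensure_pipe(nlp, name="textrank"):
    """Add the named component only if the pipeline does not already have it.

    Assumption: `nlp` behaves like a spaCy 3.x Language object. Only its
    `pipe_names` attribute and `add_pipe` method are used here.
    Returns True if the component was added, False if it was already present.
    """
    if name in nlp.pipe_names:
        return False
    nlp.add_pipe(name)
    return True
```

Calling ensure_pipe(nlp) repeatedly is safe, so it can sit at the top of any function that needs doc._.textrank.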
why?
spacy pipeline: sequence of processing steps (tokenization, POS tagging, NER).
the incorrect code manually instantiates pytextrank.TextRank(), then attempts to add it to the pipeline:
tr = pytextrank.TextRank()
nlp.add_pipe(tr.PipelineComponent, name="textrank", last=True)
correct code:
nlp.add_pipe("textrank")
this automatically adds the textrank component, ensuring proper registration and accessibility.
adding textrank to the spacy pipeline registers its methods and attributes, and allows access via ._ on documents (e.g., doc._.textrank.summary()).
notes on module 'pytextrank' has no attribute 'parse_doc'
a parser is often a necessary component in an NLP pipeline.
it can be added to the pipeline alongside pytextrank.
since:
the error msg indicates that the parse_doc function is not found in the pytextrank module, potentially due to changes in the pytextrank library: some functions may have been removed, or may simply never have existed.
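One way to confirm this is to ask the installed module which of the expected names it actually defines. This is a minimal sketch; missing_attributes is a hypothetical helper, and the list of names is just the set of calls used by the older snippets in this thread.

```python
import importlib


def missing_attributes(module_name, names):
    """Return the names from `names` that the imported module does not define."""
    module = importlib.import_module(module_name)
    return [n for n in names if not hasattr(module, n)]


# Names called by the older snippets in this thread.
legacy_names = ["parse_doc", "json_iter", "pretty_print", "TextRank"]
try:
    print(missing_attributes("pytextrank", legacy_names))
except ImportError:
    print("pytextrank is not installed")
```

Any name printed in the resulting list will raise the same AttributeError when called on the installed version.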
do instead:
load a spacy model that includes a parser, and add pytextrank to its pipeline.
e.g. the spacy small english model en_core_web_sm tokenizes the text before parsing it.
example:
import spacy
import pytextrank
import json
def get_top_ranked_phrases(text):
    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("textrank")
    doc = nlp(text)
    top_phrases = []
    for phrase in doc._.phrases:
        top_phrases.append({
            "text": phrase.text,
            "rank": phrase.rank,
            "count": phrase.count,
            "chunks": phrase.chunks
        })
    return top_phrases
sample_text = 'I Like Flipkart. He likes Amazone. she likes Snapdeal. Flipkart and amazone is on top of google search.'
top_phrases = get_top_ranked_phrases(sample_text)
for phrase in top_phrases:
    print(phrase["text"], phrase["rank"], phrase["count"], phrase["chunks"])
output:
(image: output_of_sample.py)
code notes:
✔︎ load spacy small english model
✔︎ add pytextrank to pipeline
✔︎ store the top-ranked phrases
✔︎ examine the top-ranked phrases in the document
✔︎ print the top-ranked phrases
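The original question wrote its results to o1.json, so it may help to show how the ranked phrases could be saved the same way. This is a minimal sketch under stated assumptions: phrases_to_records and write_json_lines are hypothetical helpers, and each phrase is assumed to expose text, rank, count and chunks as in the example above. The chunks are converted with str() because spacy spans are not JSON-serializable.

```python
import json


def phrases_to_records(phrases):
    """Convert phrase-like objects into JSON-serializable dictionaries.

    Assumption: each phrase has `text`, `rank`, `count` and `chunks`
    attributes; every chunk is converted to a plain string.
    """
    return [
        {
            "text": p.text,
            "rank": p.rank,
            "count": p.count,
            "chunks": [str(c) for c in p.chunks],
        }
        for p in phrases
    ]


def write_json_lines(records, path):
    """Write one JSON object per line to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

With a processed document, write_json_lines(phrases_to_records(doc._.phrases), "o1.json") would produce one line per ranked phrase.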
references:
- DerwenAI
- https://spacy.io/universe/project/spacy-pytextrank
- textrank: bringing order into text
- keywords and sentence extraction with textrank (pytextrank)
- Module 'pytextrank' has no attribute 'parse_doc'
- scattertext/issues/92
- AttributeError: module 'pytextrank' has no attribute 'TextRank' #2
There’s a newer release of PyTextRank which simplifies the calling code, and makes these steps unnecessary:
https://spacy.io/universe/project/spacy-pytextrank