I would like to split a text using a list of key-phrases (such as “Chapter 1” or “Section 2”, defined as a regular expression) and then perform sentiment analysis on each resulting section.
The problem: when I use re.split as in my function below, the extracted texts (“sections” in my function) do not contain the corresponding “phrase”, so I cannot print which “phrase” was used to split off each text. For example, I would like to print something like “Chapter 1: 0.5”, “Chapter 2: 0.2”, “Section 1: 0.7”, but with this function I cannot tell which phrase corresponds to which sentiment result.
Could you let me know how to split the text while keeping track of the key-phrases that were used?
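To illustrate the behaviour outside my function, here is a minimal example (the sample text and pattern are made up):

import re

text = "Intro. Chapter 1 is positive. Chapter 2 is negative."
print(re.split(r'\bChapter \d\b', text))
# ['Intro. ', ' is positive. ', ' is negative.']
# The matched key-phrases are discarded, so each section can no
# longer be paired with the phrase that introduced it.

Here is my function: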
import re
from textblob import TextBlob

def analyze_sentiment(phrase, pdf_path):
    # STEP 1: Extract the entire text of a PDF file
    # (extract_text_from_pdf and normalize_text are already defined elsewhere)
    full_text = normalize_text(extract_text_from_pdf(pdf_path))

    # STEP 2: Split the text by the key-phrase pattern;
    # "phrase" is a regular expression covering several key-phrases
    # (such as "Chapter 1", "Section 2", and so on)
    sections = re.split(r'\b' + phrase + r'\b', full_text)

    # STEP 3: Compute a sentiment polarity score for each section
    results = []
    for section in sections:
        sentiment_result = TextBlob(section).sentiment
        results.append(sentiment_result.polarity)
    return results
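I call it roughly like this (“report.pdf” and the pattern below are just placeholders for my real inputs):

polarities = analyze_sentiment(r'(?:Chapter|Section) \d+', 'report.pdf')
print(polarities)  # one polarity per section, but no phrases to label them with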
I tried re.split as shown above, but it does not keep the information about which “phrase” delimited each section.
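From the re documentation, re.split includes the text of capturing groups in the result list, so perhaps that is the direction to take; a minimal sketch of what I mean (made-up text again):

import re

text = "Intro. Chapter 1 is positive. Chapter 2 is negative."
parts = re.split(r'(\bChapter \d\b)', text)
# ['Intro. ', 'Chapter 1', ' is positive. ', 'Chapter 2', ' is negative.']
# The capturing group keeps "Chapter 1" / "Chapter 2" in the list,
# but I am unsure how to fold this into analyze_sentiment cleanly.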