I have an Excel file (input.xlsx) that contains two columns (id and url).
I scraped every URL and ran text analysis on the scraped text.
I have functions that calculate the positive score, negative score, polarity, etc.
I want to create an output file (output.xlsx) that contains all of these results. However, my script writes the same value to every row, even though the correct value is printed when I print inside the function.
Example:
Columns: Id, url, positive score, negative score, polarity etc
Rows: Rows will contain the output of each function.
Expected Output:
Positive Score(column) : 23, 70, 43, 35 (rows)
Actual Output:
Positive Score(column) : 35, 35, 35, 35 (rows)
MY FUNCTIONS:
# CALCULATING POSITIVE SCORES
import codecs
import os

# Cleaned texts
os.getcwd()
new_texts_folder = os.path.join(os.getcwd(), 'new_texts')
for root, folders, files in os.walk(new_texts_folder):
    for file in files:
        path = os.path.join(root, file)
        with codecs.open(path, encoding='utf-8', errors='ignore') as info:
            new_content = eval(info.read())  # Convert string to list

def positive_score(content):
    # tokens = tokenz(text)
    pos_score = 0
    for token in content:
        if token in filtered_positive_dictionary:
            pos_score += 1
    return pos_score

# positive_result = positive_score(new_content)
The code above prints the correct score for each file only when I print inside the function. Outside the function, it prints only one value.
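One possible explanation is that `new_content` is reassigned on every pass of the loop, so only the last file's list is left once the loop finishes. The following is a minimal sketch of collecting one score per file instead. It makes several assumptions: `raw_files` and the small `filtered_positive_dictionary` set are hypothetical in-memory stand-ins for the files in `new_texts` and the real word list, `positive_scores` is a name introduced here, and `ast.literal_eval` is swapped in for `eval` on the assumption that each file holds a plain Python list literal.

```python
import ast

# Hypothetical stand-ins for the cleaned text files and the positive-word list.
raw_files = {
    "1.txt": "['good', 'bad', 'great']",
    "2.txt": "['poor', 'happy']",
    "3.txt": "['sad', 'awful']",
}
filtered_positive_dictionary = {"good", "great", "happy"}


def positive_score(content):
    """Count the tokens that appear in the positive-word set."""
    return sum(1 for token in content if token in filtered_positive_dictionary)


positive_scores = []  # one entry per file, in the order the files are read
for name, raw in raw_files.items():
    new_content = ast.literal_eval(raw)  # parse the stored list literal
    positive_scores.append(positive_score(new_content))  # score inside the loop

print(positive_scores)  # [2, 1, 0]
```

Because the score is computed inside the loop, each file contributes its own value before `new_content` is overwritten by the next file.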
My Excel export code:
data_collection = {
    'URL_ID': url_ids,  # (this is working as expected)
    'URL': urls,  # (this is working as expected)
    'POSITIVE SCORE': positive_score(new_content)  # (this is not working as expected)
}
excel_data_df = pd.DataFrame(data_collection)
excel_data_df.to_excel("output.xlsx", index=False)
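The repeated value may come from how pandas treats a single number in a column dictionary: when the other columns are lists, a scalar is repeated in every row. The sketch below illustrates the difference under assumed data. The URLs are placeholders, the scores are the expected values listed earlier, and `broadcast_df` and `per_row_df` are names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical values standing in for url_ids, urls and the per-file scores.
url_ids = [1, 2, 3, 4]
urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
    "https://example.com/d",
]
positive_scores = [23, 70, 43, 35]

# Passing one number: pandas repeats it in every row.
broadcast_df = pd.DataFrame({
    "URL_ID": url_ids,
    "URL": urls,
    "POSITIVE SCORE": positive_scores[-1],  # only the last score
})

# Passing a list with one score per row: each row keeps its own value.
per_row_df = pd.DataFrame({
    "URL_ID": url_ids,
    "URL": urls,
    "POSITIVE SCORE": positive_scores,
})

print(broadcast_df["POSITIVE SCORE"].tolist())  # [35, 35, 35, 35]
print(per_row_df["POSITIVE SCORE"].tolist())    # [23, 70, 43, 35]
```

The list must have the same length as the other columns, otherwise `pd.DataFrame` raises a `ValueError`. Writing the result with `per_row_df.to_excel("output.xlsx", index=False)` additionally needs an Excel writer engine such as openpyxl installed.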