I'm using this function to stem the sentences in my reviews:
import string
from nltk.stem import WordNetLemmatizer, PorterStemmer
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# one-time NLTK data downloads needed for word_tokenize and stopwords:
# nltk.download('punkt'); nltk.download('stopwords')

punctuation = set(string.punctuation)
english_stopwords = set(stopwords.words('english'))
porter_stemmer = PorterStemmer()
def clean_text(text):
    text = text.lower()
    tokens = word_tokenize(text)
    # each step must filter the PREVIOUS result, not `tokens`,
    # otherwise only the last filter takes effect
    cleaned_tokens = [token for token in tokens if token not in english_stopwords]
    cleaned_tokens = [token for token in cleaned_tokens if token not in punctuation]
    cleaned_tokens = [token for token in cleaned_tokens if token.isalnum()]
    cleaned_tokens = [porter_stemmer.stem(token) for token in cleaned_tokens if len(token) > 0]
    return ' '.join(cleaned_tokens)
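To illustrate the intended chained filtering, here is a simplified, self-contained sketch that uses a toy stopword set and truncation as a stand-in for NLTK's tokenizer and stemmer (both are assumptions for illustration only):

```python
# Simplified sketch of chained token filtering, without NLTK.
# `toy_stopwords` and the 5-character "stemming" are toy stand-ins.
toy_stopwords = {'the', 'is', 'a'}

def toy_clean(text):
    tokens = text.lower().split()
    # filter stopwords and non-alphanumeric tokens in one pass,
    # then "stem" each survivor by truncating it
    kept = [t for t in tokens if t not in toy_stopwords and t.isalnum()]
    return ' '.join(t[:5] for t in kept)

print(toy_clean("The game is a great experience"))  # -> 'game great exper'
```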
and I'm simply running it over my CSV:
import pandas as pd
currData = pd.read_csv('../Steam dataset/clean_steam_database(english)_133.csv')
currData['review'] = [clean_text(review) for review in currData['review']]
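As an aside, a more defensive way to apply the function row by row (assuming, hypothetically, that some `review` cells could be missing) would be:

```python
import pandas as pd

# Hypothetical stand-in for clean_text so this sketch is self-contained.
def clean_text(text):
    return ' '.join(text.lower().split())

df = pd.DataFrame({'review': ['Great GAME', None, 'Buggy   mess']})
# Coerce missing values to empty strings first, then apply per row.
df['review'] = df['review'].fillna('').map(clean_text)
print(df['review'].tolist())  # -> ['great game', '', 'buggy mess']
```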
But running it fails with:
“The Kernel crashed while executing code in the current cell or a previous cell.
Please review the code in the cell(s) to identify a possible cause of the failure.
Click here for more info.
View Jupyter log for further details.”
and the Jupyter log gives this error:
“16:23:09.679 [error] Disposing session as kernel process died ExitCode: 3221225725, Reason:
16:23:09.706 [info] Cell 2 completed in -1716369788.31s (start: 1716369788310, end: undefined)”
What could be causing this?
I acquired the dataset from:
https://www.kaggle.com/datasets/najzeko/steam-reviews-2021?resource=download
and I'm running this code on that dataset, which I've split into 1,000 parts.
I printed the index on every call of the function to see which row is the problem, but when I ran the function directly on that index, it completed without any issue.
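The debugging loop I used looks roughly like this (a sketch, with a hypothetical `process_row` standing in for `clean_text`):

```python
# Sketch of the per-index debugging loop described above.
# `process_row` is a hypothetical stand-in for clean_text.
def process_row(text):
    return text.lower()

rows = ['First REVIEW', 'Second Review']
cleaned = []
for i, text in enumerate(rows):
    # the last index printed before a crash points at the suspect row
    print(f'processing row {i}')
    cleaned.append(process_row(text))
print(cleaned)  # -> ['first review', 'second review']
```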