I took a sample() from a large dataset so I could calculate blob.polarity on it, but when I run the code on the sample I get ValueError: [E088] Text of length 2044880 exceeds maximum of 1000000:
Getting the Sample
example = sentiment_analysis_df['Cleaned_Comments'].sample(10)
example
Output:
19379 love reading paperwhite battery lasts weeks li…
33685 old fire tv better updates
29863 read bit confusing echo control low behold ale…
24675 live ease use recommended anybody amazon fan g…
2395 great buy price great games
2345 recently upgraded kindle huge improvements spe…
19246 easy use great traveling device makes purchasi…
22389 worth penny amazing functions alexa loud crisp
24158 love playing music radio kids ask questions en…
10266 great price tablet plus great success kindle
Name: Cleaned_Comments, dtype: object
Converting to String (otherwise I get the ValueError: [E1041] Expected a string, Doc, or bytes as input, but got: <class 'pandas.core.series.Series'>)
example = sentiment_analysis_df['Cleaned_Comments'].to_string()
Using the polarity attribute
doc = nlp(example)
polarity = doc._.blob.polarity
polarity
ValueError: [E088] Text of length 2044880 exceeds maximum of 1000000. The parser and NER models require roughly 1GB of temporary memory per 100,000 characters in the input. This means long texts may cause memory allocation errors. If you're not using the parser or NER, it's probably safe to increase the nlp.max_length limit. The limit is in number of characters, so you can check whether your inputs are too long by checking len(text).
I’d appreciate any help. Thank you.
Convert the <class 'pandas.core.series.Series'> to a string:
example = sentiment_analysis_df['Cleaned_Comments'].to_string()
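One thing worth checking: a length of 2044880 looks far too large for ten short comments. The to_string() call is made on the whole Cleaned_Comments column rather than on the 10-row sample, so the full column is probably what reaches nlp. to_string() also keeps the index labels in the text. Below is a minimal sketch of an alternative, assuming each comment can be scored on its own. The names polarity_per_comment and toy_score are hypothetical. toy_score is only a stand-in so the snippet runs without spaCy; with the pipeline from the question you would pass lambda text: nlp(text)._.blob.polarity instead.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Limit quoted in the E088 message from the question.
MAX_LENGTH = 1_000_000


def polarity_per_comment(
    rows: Iterable[Tuple[Hashable, str]],
    score: Callable[[str], float],
    max_length: int = MAX_LENGTH,
) -> Dict[Hashable, float]:
    """Score every comment separately instead of one joined string."""
    results: Dict[Hashable, float] = {}
    for label, text in rows:
        # Same check the error message suggests: len(text) against the limit.
        if len(text) > max_length:
            raise ValueError(f"row {label!r}: {len(text)} characters exceeds {max_length}")
        results[label] = score(text)
    return results


def toy_score(text: str) -> float:
    """Stand-in scorer (hypothetical, NOT TextBlob): share of 'positive' words."""
    positive = {"love", "great", "amazing"}
    words = text.split()
    return sum(word in positive for word in words) / len(words) if words else 0.0


# Two rows copied from the sample output in the question.
# With pandas, example.items() yields the same (index label, text) pairs.
rows = [
    (33685, "old fire tv better updates"),
    (2395, "great buy price great games"),
]
scores = polarity_per_comment(rows, toy_score)
```

Each comment stays far below the character limit this way, and you get one polarity value per row rather than a single value for the whole column.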