Dear StackOverflow Community,
I want to do sentiment analysis on two datasets of tweets, one with 9k strings and one with 30k strings. I imported germansentiment and the demo code from GitHub ran just fine, but when I applied it to either the 9k or the 30k set, my CPU spiked so fast that I had to put my laptop into standby and then kill the command line to regain control.
Suspecting a lack of computing power, I tried it on my tower PC, which has a stronger CPU and GPU. There the CPU also spikes to 100 %, and I get
Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
despite NumPy being installed.
Is there some limit to germansentiment? Am I doing something wrong?
I've added my code snippets as best I could. On my PC, I also get this warning:
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.
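For reference, this is the downgrade that the warning itself suggests (I have not applied it yet, since I'd rather understand the underlying problem first):

```shell
# Pin NumPy below 2.0, as the warning above recommends,
# so that modules compiled against NumPy 1.x can still load.
pip install "numpy<2"
```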
Code:
from germansentiment import SentimentModel
import json
from tqdm import tqdm
#import csv
import re
import argparse
import xmltodict
import mdb_reader
import csvReader
import jsonReader
from pathlib import Path

def check_sentiment(tweet_set_1, tweet_set_2, sm_model):
    model = sm_model
    # NOTE: tweet_set_1 is currently unused; only tweet_set_2 is processed
    for tweet_set in tqdm([tweet_set_2]):
        texts = [tweet_set[tweet]["text"] for tweet in tweet_set]
        #breakpoint()
        result = model.predict_sentiment(tqdm(texts))
        print(result)

### Sentiment analysis ###
model = SentimentModel()
#texts = [
#    "Mit keinem guten Ergebniss", "Das ist gar nicht mal so gut",
#    "Total awesome!", "nicht so schlecht wie erwartet",
#    "Der Test verlief positiv.", "Sie fährt ein grünes Auto."]
#result = model.predict_sentiment(texts)
#print(result)
check_sentiment(tweet_poli, tweet_other, model)
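In case it matters: I suspect passing all 30k texts to predict_sentiment in one call might be what exhausts my machine, so here is the batched variant I am considering. DummyModel is only a stand-in so the snippet runs without the library installed; in my script it would be the SentimentModel instance.

```python
# Hypothetical batching sketch: feed predict_sentiment smaller chunks
# instead of the whole 30k list at once, to bound memory use per call.

class DummyModel:
    """Stand-in for germansentiment.SentimentModel (assumption:
    predict_sentiment takes a list of strings, returns one label each)."""
    def predict_sentiment(self, texts):
        return ["neutral"] * len(texts)

def predict_in_batches(model, texts, batch_size=100):
    results = []
    for i in range(0, len(texts), batch_size):
        # each call only ever sees batch_size texts at a time
        results.extend(model.predict_sentiment(texts[i:i + batch_size]))
    return results

texts = [f"Tweet Nummer {n}" for n in range(250)]
labels = predict_in_batches(DummyModel(), texts, batch_size=100)
print(len(labels))  # one label per tweet
```

Would batching like this actually help here, or is the NumPy error the real culprit?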