I am importing a .csv file with pandas into my Jupyter notebook, splitting it 80/20, and then I want to train a logistic regression.
But training always fails with errors about NaNs, and I can't work out how to handle them.
This part works:
import pandas as pd

df = pd.read_csv(
    "file.csv",
    header=0,
    sep=";",
    encoding="utf-8",
    decimal=",",
    dtype={
        "MM_ERGEBNIS": int,
        "ZEITRAUM": int,
        # ...... (remaining column dtypes elided)
    },
)
df.head()
predictors = df.drop(columns=["MM_ERGEBNIS", "ZEITRAUM"])
target = df["MM_ERGEBNIS"]
names = predictors.columns
from sklearn.model_selection import train_test_split
predictors_train, predictors_test, target_train, target_test = train_test_split(predictors, target, test_size = 0.20, random_state = 0, shuffle = True, stratify = target)
predictors_train.shape, predictors_test.shape
But as soon as I start the training, it stops with an error:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, precision_score, recall_score, f1_score
clf = LogisticRegression(solver = 'lbfgs')
clf.fit(predictors_train, target_train)
train_pred = clf.predict(predictors_train)
test_pred = clf.predict(predictors_test)
TypeError: float() argument must be a string or a real number, not 'NAType'
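For context, `NAType` is the type of `pd.NA`, the missing-value marker pandas uses in its nullable dtypes (for example `"Float64"`, `"Int64"`, `"string"`). So it looks as if at least some columns carry such a dtype, although that is an assumption, since the full dtype mapping is not shown. A minimal sketch on a made-up frame (the names `demo`, `x` and `s` are hypothetical) of how the missing values can be inspected and how the same kind of error arises:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: one nullable float column, one string column.
demo = pd.DataFrame({
    "x": pd.array([1.5, None, 3.0], dtype="Float64"),
    "s": pd.array(["a", None, "c"], dtype="string"),
})

# Which dtypes are in play, and how many values are missing per column?
column_dtypes = demo.dtypes
missing_per_column = demo.isna().sum()

# pd.NA cannot be converted by float(), which is the conversion the error complains about.
try:
    float(demo["x"][1])
    na_error = None
except TypeError as exc:
    na_error = exc

# Converting a nullable column to a plain NumPy float array maps pd.NA to np.nan.
x_as_float = demo["x"].to_numpy(dtype="float64", na_value=np.nan)
```

Running `df.dtypes` and `df.isna().sum()` on the real frame should show which columns are affected.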
I have about 250 float columns (with a lot of empty values) and 4 string columns (also with empty values).
I don't know how to handle this so that the model can train. I always get errors about NaNs, and if I replace them with '' using
df.fillna('', inplace=True)
then it crashes too.
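One likely reason the empty-string fill does not help is that '' is not a number, so a float column filled with it stops being numeric. A small sketch on a made-up frame (the column names `num` and `cat` and the `"missing"` placeholder are hypothetical) that shows this and fills each column type separately instead:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one numeric and one text column, both with gaps.
demo = pd.DataFrame({
    "num": [1.0, np.nan, 3.0],
    "cat": ["a", None, "b"],
})

# Filling everything with '' puts a string into the float column,
# so that column can no longer be a float column.
filled_blank = demo.fillna("")

# Alternative: fill numeric and text columns with values of their own kind.
num_cols = demo.select_dtypes(include="number").columns
cat_cols = demo.columns.difference(num_cols)

cleaned = demo.copy()
cleaned[num_cols] = cleaned[num_cols].fillna(cleaned[num_cols].median())
cleaned[cat_cols] = cleaned[cat_cols].fillna("missing")
```

The median here is taken over the whole toy frame for brevity. With a train/test split it would be safer to compute it on the training part only.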
After the logistic regression, I also want to train a RandomForestClassifier and a neural network on the same data.
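One possible way to share the preprocessing across all three models is a scikit-learn pipeline that imputes and encodes inside the model. The sketch below runs on made-up data. The column names, the helper `make_model`, the imputation strategies and the use of `MLPClassifier` as the neural network are all assumptions, and it assumes missing values are stored as `np.nan` (columns with pandas nullable dtypes may need converting first):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns, one text column, gaps in both kinds.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "num_a": rng.normal(size=40),
    "num_b": rng.normal(size=40),
    "cat_a": rng.choice(["x", "y", "z"], size=40),
})
X.loc[::5, "num_a"] = np.nan
X.loc[::7, "cat_a"] = np.nan
y = np.tile([0, 1], 20)


def make_model(estimator):
    """Wrap an estimator with imputation, scaling and one-hot encoding (hypothetical helper)."""
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    categorical = make_pipeline(
        SimpleImputer(strategy="constant", fill_value="missing"),
        OneHotEncoder(handle_unknown="ignore"),
    )
    preprocess = ColumnTransformer([
        ("num", numeric, make_column_selector(dtype_include=np.number)),
        ("cat", categorical, make_column_selector(dtype_exclude=np.number)),
    ])
    return make_pipeline(preprocess, estimator)


models = {
    "logreg": make_model(LogisticRegression(solver="lbfgs", max_iter=1000)),
    "forest": make_model(RandomForestClassifier(n_estimators=50, random_state=0)),
    "mlp": make_model(MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)),
}

# Each pipeline learns its imputation values from the data passed to fit().
predictions = {name: model.fit(X, y).predict(X) for name, model in models.items()}
```

Because the imputers sit inside the pipeline, fitting on `predictors_train` and predicting on `predictors_test` would keep the test data out of the imputation statistics.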
Can someone please help me with how to handle the data?
Thanks.
I have tried everything I could find here on Stack Overflow.