I’m trying to make an ANN in Python to predict something from a dataset (in this case diabetes), and I’m struggling to figure out how to solve this error.
Here is the full code:
```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn import preprocessing
from keras import Sequential
from keras.layers import Dense
from sklearn.metrics import confusion_matrix, accuracy_score

data = pd.read_csv('C:/Users/<<>>/Downloads/Dataset of Diabetes.csv')

# drop irrelevant columns
dropcols = ['ID', 'No_Pation']
data = data.drop(dropcols, axis=1)
data.info()

X = data.values
Y = data['CLASS'].values

label_encoder = preprocessing.LabelEncoder()
data['CLASS'] = label_encoder.fit_transform(data['CLASS'])
data['Gender'] = label_encoder.fit_transform(data['Gender'])
data['CLASS'].unique()
data['Gender'].unique()
data.info()

X = np.delete(X, 1, axis=1)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
X_train = np.asarray(X_train).astype(np.float32)
Y_train = np.asarray(Y_train).astype(np.float32)

classifier = Sequential()
classifier.add(Dense(units=10, activation='relu', input_dim=X.shape[1]))
classifier.add(Dense(units=10, activation='relu'))
classifier.add(Dense(units=1, activation='sigmoid'))
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, Y_train, epochs=100, batch_size=10)

Y_pred = classifier.predict(X_test)
Y_pred_int = (Y_pred > 0.5).astype(int)
cm = confusion_matrix(Y_test, Y_pred_int)
acc = accuracy_score(Y_test, Y_pred_int)
print("Accuracy:", acc)
print("Confusion Matrix:\n", cm)
```
This is what the last “data.info()” line returns:
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Gender  1000 non-null   int32
 1   AGE     1000 non-null   int64
 2   Urea    1000 non-null   float64
 3   Cr      1000 non-null   int64
 4   HbA1c   1000 non-null   float64
 5   Chol    1000 non-null   float64
 6   TG      1000 non-null   float64
 7   HDL     1000 non-null   float64
 8   LDL     1000 non-null   float64
 9   VLDL    1000 non-null   float64
 10  BMI     1000 non-null   float64
 11  CLASS   1000 non-null   int32
dtypes: float64(8), int32(2), int64(2)
memory usage: 86.1 KB
```
Here is the error message that I am getting:
```
Traceback (most recent call last):
  File "C:Users<<>>PycharmProjectsAI2NeuralNetwork.py", line 32, in <module>
    X_train = np.asarray(X_train).astype(np.float32)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert string to float: 'M'
```
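One possible reading of this traceback, sketched on a made-up three-row frame (the values are hypothetical, not taken from the dataset): `X = data.values` is copied out before the `LabelEncoder` calls run, so `X` still holds the original `'M'`/`'F'` strings and never sees the encoded columns. The sketch swaps in pandas category codes for `LabelEncoder` only to keep it dependency-light; the ordering is the point, not the encoder.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the diabetes CSV (values invented).
data = pd.DataFrame({
    "Gender": ["M", "F", "M"],
    "AGE": [50, 26, 33],
    "CLASS": ["N", "Y", "N"],
})

# Snapshot taken BEFORE encoding: .values copies the strings as they are now.
X_before = data.values

# Encode the string columns (stand-in for LabelEncoder.fit_transform).
for col in ["Gender", "CLASS"]:
    data[col] = data[col].astype("category").cat.codes

# The earlier snapshot is unaffected by the encoding, so the cast still fails.
try:
    X_before.astype(np.float32)
    cast_failed = False
except ValueError:
    cast_failed = True

# Snapshot taken AFTER encoding, with the target column kept out of the features.
X_after = data.drop(columns=["CLASS"]).values.astype(np.float32)
Y_after = data["CLASS"].values.astype(np.float32)
```

Building the features from `data.drop(columns=[...])` after encoding also avoids relying on a positional `np.delete` index, which is easy to point at the wrong column.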
Another error I just realised I have (more of a warning) is:
```
UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
```
What does this mean?
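For reference, the warning appears to be only about where the input shape is declared: rather than `input_dim=` on the first `Dense` layer, the shape goes into a separate `Input` object placed first in the model. A minimal sketch, assuming a recent Keras and a hypothetical 11 feature columns (`n_features` is an illustrative name, not from the script):

```python
from keras import Input, Sequential
from keras.layers import Dense

n_features = 11  # assumption: number of feature columns left in X

classifier = Sequential([
    Input(shape=(n_features,)),          # declares the input shape up front
    Dense(units=10, activation="relu"),  # no input_dim / input_shape here
    Dense(units=10, activation="relu"),
    Dense(units=1, activation="sigmoid"),
])
classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The layers and compile settings are otherwise the same as in the script, so this only changes how the shape is passed.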
Also, I have been getting the recurring error “ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).”
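That Tensor-conversion message typically shows up when an array handed to Keras has dtype `object` rather than a numeric dtype. In the script above only `X_train` and `Y_train` are cast, so `X_test` may still be an object array when it reaches `predict`. A small NumPy-only sketch with invented values, showing how to spot and cast such an array:

```python
import numpy as np

# Hypothetical object-dtype array, like what .values yields for mixed columns.
X_test = np.array([[1, 50, 4.7], [0, 26, 4.5]], dtype=object)

# True here: the array stores generic Python objects, not floats.
is_object = X_test.dtype == object

# The cast works once every entry is numeric; the same cast would be
# needed for every array passed to Keras, not only the training ones.
X_test_float = X_test.astype(np.float32)
```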
If there are also any other issues with what I have done so far, please let me know!
Many Thanks
Link to the dataset: https://data.mendeley.com/datasets/wj9rwkp9c2/1
I’ve already tried converting the X and Y trains to np arrays, but I’m not sure what else I need to do.