I have a simple ML classification problem. I have 8 folders, each representing one class. I first load the images from these folders, assign labels, and save the result as a CSV file (code below):
import os
import numpy as np
from PIL import Image

def load_images_from_folder(root_folder):
    image_paths = []
    images = []
    labels = []
    # Each sub-folder name is used as the class label
    for label in os.listdir(root_folder):
        label_path = os.path.join(root_folder, label)
        if os.path.isdir(label_path):
            for filename in os.listdir(label_path):
                img_path = os.path.join(label_path, filename)
                if os.path.isfile(img_path) and filename.endswith(".jpg"):
                    img = Image.open(img_path)
                    img = img.resize((128, 128))
                    img_array = np.array(img)
                    image_paths.append(img_path)
                    images.append(img_array)
                    labels.append(label)
    return image_paths, images, labels

if __name__ == "__main__":
    root_folder_path = "./Datasets_1"
    image_paths, images, labels = load_images_from_folder(root_folder_path)
I then put the image paths and labels into a DataFrame, save it as a CSV file, and load it back:
data = {"Images": image_paths, "Labels": labels}
df = pd.DataFrame(data)
df.to_csv("original_data.csv", index=False)
csv_file = "original_data.csv"
df = pd.read_csv(csv_file)
I also add a new column 'Encoded_Labels' to the DataFrame with the encoded labels and convert it to integers:
df['Encoded_Labels'] = encoded_labels
df['Encoded_Labels'] = df['Encoded_Labels'].astype(int)
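The code that produces `encoded_labels` is not shown above. One common way to get such a column is scikit-learn's `LabelEncoder`; the snippet below is only a sketch under that assumption, run on a made-up DataFrame (the paths and class names are hypothetical, not the real dataset).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the real DataFrame; paths and class names are made up.
df = pd.DataFrame({
    "Images": ["a/1.jpg", "a/2.jpg", "b/1.jpg", "c/1.jpg"],
    "Labels": ["cat", "cat", "dog", "bird"],
})

# LabelEncoder assigns an integer to each distinct label, in sorted order.
encoder = LabelEncoder()
encoded_labels = encoder.fit_transform(df["Labels"])

df["Encoded_Labels"] = encoded_labels
df["Encoded_Labels"] = df["Encoded_Labels"].astype(int)
```

`encoder.classes_` keeps the original class names, so integer predictions can be mapped back with `encoder.inverse_transform`.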
Finally, I split the dataset into training and testing sets and preprocess the images for training:
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

def load_and_preprocess_images(file_paths, target_size=(128, 128)):
    images = []
    for file_path in file_paths:
        img = Image.open(file_path)
        img = img.resize(target_size)
        img_array = np.array(img) / 255.0  # Normalize pixel values to [0, 1]
        images.append(img_array)
    return np.array(images)

X_train = load_and_preprocess_images(train_df['Images'].values)
y_train = train_df['Encoded_Labels'].values
X_test = load_and_preprocess_images(test_df['Images'].values)
y_test = test_df['Encoded_Labels'].values
The shape of X_train is:
(20624, 128, 128, 3)
Up to this point I have no problems, and I can use this data with DL models without any issue. But when I try to use classical ML models such as KNN, SVM, DT, etc., it fails. For example, with the code below:
from sklearn.svm import SVC
svc = SVC(kernel='linear', gamma='auto')
svc.fit(X_train, y_train)
or
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_train)
y_pred = knn_clf.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)
print("Accuracy of KNN Classifier : %.2f" % (accuracy*100))
I get this error:
“ValueError: Found array with dim 4. SVC expected <= 2.”
How can I fix this error?
Train model using ML
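For context on the error: scikit-learn estimators such as `SVC` and `KNeighborsClassifier` expect `X` as a 2D array of shape `(n_samples, n_features)`, while the image array here is 4D. A minimal sketch of one possible workaround, assuming it is acceptable to treat every pixel value as a separate feature, is to flatten each image into a single row. The arrays below are small synthetic stand-ins (all `*_demo` names are hypothetical), not the real data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Small synthetic stand-in for the real arrays (assumption: same layout,
# (n_samples, height, width, channels), just much smaller).
rng = np.random.default_rng(0)
X_train_demo = rng.random((40, 8, 8, 3))
y_train_demo = np.repeat(np.arange(4), 10)  # 4 classes, 10 samples each
X_test_demo = rng.random((10, 8, 8, 3))

# Flatten each image into one row: (n_samples, height * width * channels).
X_train_flat = X_train_demo.reshape(len(X_train_demo), -1)
X_test_flat = X_test_demo.reshape(len(X_test_demo), -1)

# The 2D arrays are accepted by the classifier.
knn_demo = KNeighborsClassifier()
knn_demo.fit(X_train_flat, y_train_demo)
y_pred_demo = knn_demo.predict(X_test_flat)
```

Applied to the arrays in the question, the same reshape would turn `(20624, 128, 128, 3)` into `(20624, 49152)`. That is a lot of features for KNN or a linear SVM, so downsizing the images or reducing dimensionality first may be worth considering.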