I am trying to train a KMeans model using scikit-learn.
I have been stuck on this issue for 2 days.
Pandas seems to be selecting all columns of a DataFrame even though I specified only 2 columns.
Here is the code:
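import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Column names (the code that loads df is not shown here)
cols = ['area', 'perimeter', 'compactness', 'length', 'width', 'asymmetry', 'groove', 'class']
x = 'perimeter'
y = 'asymmetry'
# Use only the two selected columns as features
z = df[[x, y]].values
kmeans = KMeans(n_clusters=3).fit(z)
clusters = kmeans.labels_
print(clusters)
# Plot the points colored by KMeans cluster label
cluster_df = pd.DataFrame(np.hstack((z, clusters.reshape(-1, 1))), columns=[x, y, "class"])
sns.scatterplot(x=x, y=y, hue='class', data=cluster_df)
plt.show()
# Plot the points colored by the original class label
sns.scatterplot(x=x, y=y, hue='class', data=df)
plt.show()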
This is the plot of the original dataset
This is the predicted plot
I was expecting the predicted plot to look like the original dataset plot.
The predicted plot looks as if seaborn plotted all the different columns together.
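One way to check whether the column selection is really the problem is to look at the shape of the selected array and to cross-tabulate the cluster labels against the known classes. This is only a minimal sketch. The small DataFrame is made up for illustration and stands in for the real data. The `clusters` array is written by hand as a stand-in for `kmeans.labels_`. The column name `cluster` is also just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset (values are made up for illustration).
df = pd.DataFrame({
    "area": [15.3, 14.9, 12.1, 11.8, 18.7, 19.1],
    "perimeter": [14.8, 14.6, 13.2, 13.1, 16.2, 16.5],
    "asymmetry": [2.2, 2.7, 4.8, 5.1, 3.6, 3.3],
    "class": [1, 1, 3, 3, 2, 2],
})

x, y = "perimeter", "asymmetry"
z = df[[x, y]].values

# If pandas had selected every column, the second dimension would be 4, not 2.
print(z.shape)  # (6, 2)

# Hand-written stand-in for kmeans.labels_: cluster ids are arbitrary integers
# starting at 0, so they need not match the numbering in the "class" column.
clusters = np.array([0, 0, 1, 1, 2, 2])

# np.hstack puts the integer labels into the same float array as the features,
# so the labels become floats in the resulting DataFrame.
stacked = np.hstack((z, clusters.reshape(-1, 1)))
print(stacked.dtype)

# Building the frame column by column keeps the labels as integers instead.
cluster_df = df[[x, y]].copy()
cluster_df["cluster"] = clusters

# Cross-tabulate cluster ids against the known classes to see how they line up.
table = pd.crosstab(df["class"], cluster_df["cluster"])
print(table)
```

If the shape check shows two columns, the selection itself is working, and any difference between the two plots comes from how the clusters were assigned or how the labels are typed and numbered.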
PS: I am a beginner, so I might not know some things. Please don't dislike.