I have been assigned a task with a dataset containing many binary (0/1) columns of the following form:
```
       D1_16  D1_17  D1_18  D1_19  D1_20  D1_23  D1_24  D1_25  D1_26  D1_27
1          1      1      1      1      1      1      1      1      1      1
2          1      1      1      1      1      0      1      0      1      1
3          0      0      0      0      0      0      0      0      0      0
4          0      1      0      1      0      0      1      1      1      0
5          0      0      0      0      0      0      0      0      0      0
...      ...    ...    ...    ...    ...    ...    ...    ...    ...    ...
29502      0      0      0      0      0      0      0      0      0      0
29504      0      0      0      0      0      0      0      0      0      0
29505      0      0      0      0      0      0      0      0      0      0
29506      0      0      0      0      0      0      0      0      0      0
29507      0      0      0      0      0      0      0      0      0      0

       ...  D68_29  D68_30  D68_31  D68_32  D68_33  D68_34  D68_35  D68_36
1      ...       0       0       1       0       0       0       0       0
2      ...       0       0       0       0       0       1       0       0
3      ...       0       0       0       0       0       0       0       0
4      ...       0       0       0       0       0       0       0       0
5      ...       0       0       0       0       0       0       0       0
...    ...     ...     ...     ...     ...     ...     ...     ...     ...
29502  ...       0       0       0       0       0       0       0       0
29504  ...       0       0       0       0       0       0       0       0
29505  ...       0       0       0       0       0       0       0       0
29506  ...       0       0       0       0       0       0       0       0
29507  ...       0       0       0       0       0       0       0       1
```
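Since every column looks like a 0/1 indicator and many rows in the printout are entirely zero, it may be worth checking this before clustering: identical all-zero rows will likely end up together in one large k-means cluster. A minimal sketch, using a hypothetical random 0/1 DataFrame (`cols`, `rng` and the probabilities are illustrative assumptions) in place of the real file:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset: a small random 0/1 matrix with
# column names shaped like D1_16, D1_17, ... Replace it with the real df.
rng = np.random.default_rng(0)
cols = [f"D1_{i}" for i in range(16, 21)]
df = pd.DataFrame(rng.binomial(1, 0.2, size=(1000, len(cols))), columns=cols)

# Check that every column really is binary (only 0 and 1 occur).
is_binary = bool(df.isin([0, 1]).all().all())

# Count rows in which every feature is 0.
all_zero = int((df.sum(axis=1) == 0).sum())

print(f"binary: {is_binary}, all-zero rows: {all_zero} of {len(df)}")
```

If a large share of the real rows is all-zero, that group is a likely candidate for the largest cluster.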
Task assignment:
Perform classification with respect to a class obtained by unsupervised learning (hint: there are 5 classes).
Do I understand correctly that I first need to cluster the data, for example with k-means, choose a suitable cluster, use it to split the data into two parts labelled “1” and “0”, and then train a classifier on them, for example a decision tree? Or should the clustering be done differently? Honestly, I don’t really understand what is being asked of me, so I would be grateful for any ideas.
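One plausible reading of the assignment, given the hint of 5 classes, is: cluster the data into 5 groups without labels, then treat each row's cluster id as its class and train a classifier to predict it. Under that reading no 1/0 relabelling is needed, because scikit-learn classifiers handle multiclass targets directly. A minimal sketch under that assumption, with hypothetical random data in place of the real file and an arbitrary `max_depth=5`:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical 0/1 data standing in for the real dataset.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.binomial(1, 0.3, size=(2000, 20)),
                  columns=[f"D1_{i}" for i in range(20)])

# Step 1 (unsupervised): assign every row to one of 5 clusters.
# k = 5 is an assumption based on the hint in the assignment.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
y = kmeans.fit_predict(df)

# Step 2 (supervised): use the cluster ids 0..4 directly as class labels
# for a multiclass decision tree.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(max_depth=5, random_state=42)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {accuracy:.3f}")
```

The accuracy here mainly measures how well a shallow tree can reproduce the k-means partition, so the tree's splits (which columns it uses) may be more informative than the score itself.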
My program:
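```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# df: DataFrame of 0/1 columns loaded from the dataset

# Choose the number of clusters with the best average silhouette score
best_score = -1
best_k = 0
for k in range(2, 10):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(df)
    silhouette_avg = silhouette_score(df, kmeans.labels_)
    if silhouette_avg > best_score:
        best_score = silhouette_avg
        best_k = k

# Refit with the chosen k and store the cluster labels
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=42)
kmeans.fit(df)
df['cluster'] = kmeans.labels_

# Binary target: 0 for the largest cluster, 1 for every other cluster
best_cluster = np.argmax(np.bincount(kmeans.labels_))
df['target'] = np.where(df['cluster'] == best_cluster, 0, 1)

# Train a decision tree on the original features to predict the target
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(['cluster', 'target'], axis=1), df['target'],
    test_size=0.2, random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
```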
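Computing `silhouette_score` on all of the roughly 29,500 rows inside the loop needs distances between every pair of rows, so its cost grows quadratically with the number of rows. scikit-learn's `silhouette_score` accepts `sample_size` and `random_state` to estimate the score on a random subset instead. A minimal sketch, with hypothetical random data and an illustrative `sample_size=2000`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical 0/1 data; the real dataset has roughly 29,500 rows.
rng = np.random.default_rng(0)
X = rng.binomial(1, 0.3, size=(5000, 20))

scores = {}
for k in range(2, 10):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Estimate the silhouette on 2,000 randomly sampled rows.
    scores[k] = silhouette_score(X, labels, sample_size=2000, random_state=0)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Note that the silhouette-optimal `best_k` need not equal the 5 classes mentioned in the assignment.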
Dataset: https://filetransfer.io/data-package/Aiawe648#link
Full program: https://onecompiler.com/python/42cxfy2vz
Advice on clustering and classification