For context, I have a dataset with 50,000 records and need to perform k-means clustering.
I was wondering which scaling I should perform on the dataset before clustering.
This is the question for further context:
Use the following sets of features to perform k-Means clustering with k = 5.
Set 1: GENDER, EDUCATION, AGE, MARRIGAE, CREDIT_LIMIT
Set 2: BALANCE(-2), MIN_PAY(-2), EXPENSE(-2), FAILURE_ATTEMPT(-2), PAY(-2), DEFAULT(-2)
Set 3: BALANCE(-1), MIN_PAY(-1), EXPENSE(-1), FAILURE_ATTEMPT(-1), PAY(-1), DEFAULT(-1)
Set 4: BALANCE(0), MIN_PAY(0), EXPENSE(0), FAILURE_ATTEMPT(0)
i) Explain any scaling method that you used and justify why.
ii) For each set, determine the number of samples and the proportion of positive cases in each cluster.
iii) Identify if the clustering results are useful to show any characteristics of the positive cases and justify your answer.
I have used standardization, but I am unsure whether it is the appropriate choice, since normalization gives vastly different results.
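To make the difference between the two scalings concrete, here is a minimal sketch in plain Python. It assumes "standardization" means z-scoring and "normalization" means min-max scaling to [0, 1]. The helper names `standardize` and `min_max` and the toy columns are hypothetical and not taken from the real dataset. It also assumes no column is constant, since both formulas would otherwise divide by zero.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the column is not constant
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Min-max normalization: map the values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical toy columns, made up for illustration only.
gender = [0, 1, 1, 0, 1, 0]
credit_limit = [10_000, 20_000, 30_000, 50_000, 80_000, 500_000]

for name, col in [("GENDER", gender), ("CREDIT_LIMIT", credit_limit)]:
    z = standardize(col)
    m = min_max(col)
    print(name,
          "| z-score range:", round(max(z) - min(z), 2),
          "| min-max range:", round(max(m) - min(m), 2))
```

After min-max scaling every feature spans exactly 1, so a binary feature such as GENDER separates its two groups by the full range while a skewed feature with one large value is squeezed toward 0. After standardization every feature has unit variance, but the ranges differ, so outliers in a skewed feature stay far from the rest. Because k-means relies on Euclidean distance, these two rescalings weight the features differently, which is one plausible reason the cluster assignments disagree.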