I am conducting data analysis in MATLAB using the kmeans algorithm, like so:
clear
clc
close all
%----------------------------
x1 = 1:0.01:3;
r = -1 + (1+1)*rand(1,201);
x1 = x1 + r;
y1 = 1:0.01:3;
r = -1 + (1+1)*rand(1,201);
y1 = y1 + r;
x1 = x1';
y1 = y1';
label1 = zeros(length(y1),1);
Tclust1 = [label1,x1,y1];
%----------------------------
x2 = 4:0.01:6;
r = -1 + (1+1)*rand(1,201);
x2 = x2 + r;
y2 = 10:0.01:12;
r = -1 + (1+1)*rand(1,201);
y2 = y2 + r;
x2 = x2';
y2 = y2';
label2 = ones(length(y2),1);
Tclust2 = [label2,x2,y2];
%----------------------------
x3 = 8:0.01:10;
r = -1 + (1+1)*rand(1,201);
x3 = x3 + r;
y3 = 12:0.01:14;
r = -1 + (1+1)*rand(1,201);
y3 = y3 + r;
x3 = x3';
y3 = y3';
label3 = label2+1;
Tclust3 = [label3,x3,y3];
%----------------------------
T = [Tclust1;Tclust2;Tclust3];
scatter(x1,y1)
hold on
scatter(x2,y2)
scatter(x3,y3)
title('Test data')
xlabel('x')
ylabel('y')
legend('Data 1','Data 2','Data 3','location','eastoutside')
%------------------------------
data = T(:,2:3);
labels = T(:,1);
xmin = min(T(:,2));
xmax = max(T(:,2));
ymin = min(T(:,3));
ymax = max(T(:,3));
[idx3,C,sumdist3] = kmeans(data,3);
plot(C(:,1),C(:,2),'kx','MarkerSize',15,'LineWidth',3,'DisplayName','Centroids') % named legend entry instead of 'data4'
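For reference, here is a minimal sketch of what kmeans already returns. It assumes the Statistics and Machine Learning Toolbox and the default 'sqeuclidean' distance, and it uses synthetic two-blob data for illustration only. The optional fourth output, D, is the n-by-k matrix of distances from every point to every centroid. With the default distance these are squared Euclidean distances, and the third output (sumdist3 in the code above) is their per-cluster total.

```matlab
% Sketch: what the four outputs of kmeans contain (default 'sqeuclidean' distance).
rng(1);                                       % reproducible random data
data = [randn(100,2); randn(100,2) + 6];      % two synthetic, well-separated blobs
k = 2;
[idx, C, sumd, D] = kmeans(data, k);

% D is n-by-k: distance from every point to every centroid.
% With 'sqeuclidean' these are squared Euclidean distances.
Dcheck = pdist2(data, C).^2;
fprintf('max |D - pdist2(data,C).^2| = %g\n', max(abs(D(:) - Dcheck(:))));

% sumd(j) is the total of D over the points assigned to cluster j
own = D(sub2ind(size(D), (1:numel(idx))', idx));   % each point's distance to its own centroid
sumdCheck = accumarray(idx, own);
fprintf('max |sumd - check| = %g\n', max(abs(sumd - sumdCheck)));

% After convergence, every point's label is its nearest centroid
[~, nearest] = min(D, [], 2);
fprintf('idx equals nearest-centroid assignment: %d\n', isequal(nearest, idx));
```

The last check is the key point for the question: a k-means "model" is fully described by C, because labels are just nearest-centroid assignments.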
Whilst the kmeans algorithm finds the centers, it doesn't produce a model in the same way as the linear regression and linear discriminant analysis functions do, which I could then validate with the “predict” function. How do I build that kind of output from the information the k-means algorithm gives me?
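One way to get a predict-like step is a nearest-centroid classifier built from C. The sketch below is an assumption-laden illustration, not a built-in kmeans feature: it regenerates data resembling the question's three clusters with a hypothetical helper makeCluster, uses a stratified 70/30 hold-out split from cvpartition as the validation scheme, and introduces a hypothetical clusterLabel lookup. Because k-means numbers its clusters arbitrarily, each cluster is mapped to the most common true label among its training points before accuracy is measured.

```matlab
% Sketch: nearest-centroid "predict" for k-means, validated on a hold-out set.
% Requires the Statistics and Machine Learning Toolbox (kmeans, cvpartition, knnsearch).
rng(2);                                    % reproducible results
n = 201;
% Hypothetical helper: noisy diagonal cluster like the question's test data
makeCluster = @(x0, y0) [linspace(x0, x0+2, n)' + 2*rand(n,1) - 1, ...
                         linspace(y0, y0+2, n)' + 2*rand(n,1) - 1];
X = [makeCluster(1,1); makeCluster(4,10); makeCluster(8,12)];
y = [zeros(n,1); ones(n,1); 2*ones(n,1)];  % true labels 0, 1, 2

% Stratified 70/30 hold-out split (assumed validation scheme)
cv = cvpartition(y, 'HoldOut', 0.3);
Xtrain = X(training(cv), :);  ytrain = y(training(cv));
Xtest  = X(test(cv), :);      ytest  = y(test(cv));

% "Fit": cluster the training data only
k = 3;
[idxTrain, C] = kmeans(Xtrain, k, 'Replicates', 5);

% Cluster numbers are arbitrary: map each cluster to its most common true label
clusterLabel = zeros(k, 1);
for j = 1:k
    clusterLabel(j) = mode(ytrain(idxTrain == j));
end

% "Predict": assign each test point to its nearest centroid
idxTest = knnsearch(C, Xtest);
yPred = clusterLabel(idxTest);

accuracy = mean(yPred == ytest);
fprintf('Hold-out accuracy: %.3f\n', accuracy);
disp(confusionmat(ytest, yPred))
```

knnsearch(C, Xtest) gives the same assignments as [~, idxTest] = min(pdist2(Xtest, C), [], 2); ranking centroids by Euclidean distance is the same as ranking them by the squared distance kmeans minimizes. A few points near the overlapping ends of clusters 2 and 3 may be misassigned, since these clusters are elongated rather than round.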
Intuitively this seems simple: we have the centers and the distances to the centers, so we could classify a data point as belonging to a group if it is less than 3 standard deviations from that group's center. Is that the right idea, and how do I actually code it?
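The 3-standard-deviation rule can be sketched on top of the same nearest-centroid step. Here σ_j is assumed to be the RMS distance of cluster j's members to its centroid, sqrt(sumd(j)/n_j), which reuses the third kmeans output; a new point is assigned to its nearest centroid only if it lies within 3σ_j of it, and is otherwise marked NaN. The points in Xnew and the helper makeCluster are hypothetical. Note that the question's test data uses uniform noise along a line rather than a Gaussian, so 3σ here does not correspond to any particular coverage probability.

```matlab
% Sketch: nearest-centroid assignment with a 3-sigma rejection rule
% (sigma_j = RMS distance to centroid j is an assumed definition).
rng(3);
n = 201;
% Hypothetical helper: noisy diagonal cluster like the question's test data
makeCluster = @(x0, y0) [linspace(x0, x0+2, n)' + 2*rand(n,1) - 1, ...
                         linspace(y0, y0+2, n)' + 2*rand(n,1) - 1];
data = [makeCluster(1,1); makeCluster(4,10); makeCluster(8,12)];

k = 3;
[idx, C, sumd] = kmeans(data, k, 'Replicates', 5);

% With the default 'sqeuclidean' distance, sumd(j) is a sum of squared distances,
% so sqrt(sumd ./ counts) is the RMS distance of each cluster's members.
counts = accumarray(idx, 1);
sigma = sqrt(sumd ./ counts);

% Hypothetical new points: one inside cluster 1, one far from every cluster
Xnew = [2 2; 30 -5];

dist = pdist2(Xnew, C);               % Euclidean distance to every centroid
[dmin, nearest] = min(dist, [], 2);   % nearest centroid for each new point
inCluster = dmin <= 3 * sigma(nearest);

assigned = nearest;
assigned(~inCluster) = NaN;           % NaN marks "not in any cluster"
disp(table(Xnew, nearest, dmin, inCluster, assigned))
```

If the clusters were clearly elongated or had different shapes, a per-cluster covariance (Mahalanobis distance) would be a closer match to "standard deviations" than a single radius, at the cost of estimating more parameters.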
https://au.mathworks.com/help/stats/k-means-clustering.html
https://au.mathworks.com/help/stats/discriminant-analysis.html
https://au.mathworks.com/help/stats/linearmodel.predict.html