I have data that resembles this:
<code>import pandas as pd
import random
random.seed(901)
rand_list1= []
rand_list2= []
rand_list3= []
rand_list4= []
rand_list5= []
for i in range(20):
    x = random.randint(80,1000)
    rand_list1.append(x/100)
    y1 = random.randint(-200,200)
    rand_list2.append(y1/10)
    y2 = random.randint(-200,200)
    rand_list3.append(y2/10)
    y3 = random.randint(-200,200)
    rand_list4.append(y3/10)
    y4 = random.randint(-200,200)
    rand_list5.append(y4/10)
df = pd.DataFrame({'Rainfall Recorded':rand_list1, 'TAXI A':rand_list2, 'TAXI B':rand_list3, 'TAXI C':rand_list4, 'TAXI D':rand_list5})
df.head()
   Rainfall Recorded  TAXI A  TAXI B  TAXI C  TAXI D
0               5.21    13.7    -5.0   -14.2     9.8
1               2.39    -0.3    18.8     4.8    -6.4
2               8.09    15.0    -3.6    18.6    12.7
3               5.79    -0.2    14.6     0.9     3.8
4               7.48    10.9     9.0    15.4   -16.5
</code>
Each row gives the rainfall recorded in our region (in centimeters) together with the percentage change in earnings reported by each of the taxi drivers surveyed. Can I use k-means clustering to determine whether each taxi operated in our locality or not? Assume there is a relationship between the rainfall recorded and the change in earnings.
I have some simple code that I found on the web:
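<code>from sklearn.cluster import KMeans

km = KMeans(n_clusters=2)
# Note: df has no 'TAXI' column (only 'TAXI A' to 'TAXI D'), so this line raises a KeyError as written
y_predicted = km.fit_predict(df[['TAXI','Rainfall Recorded']])
y_predicted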
</code>
But I am unsure what transformations need to be applied to the data before I can use this code.
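One possible preparation, as a minimal sketch rather than a confirmed approach: reshape the wide table into long format so that each row is one (rainfall, earnings change) pair, standardize both numeric columns, and then fit `KMeans`. The column names `Taxi` and `Earnings Change` are made up for this sketch, and the frame is rebuilt with fresh random values in the same layout as above. Note that this clusters individual observations, not taxis, so whether it answers the "operated locally or not" question is exactly what remains open.

```python
import random

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Rebuild a frame with the same layout as the question's df (values are random).
random.seed(901)
df = pd.DataFrame({
    'Rainfall Recorded': [random.randint(80, 1000) / 100 for _ in range(20)],
    **{f'TAXI {k}': [random.randint(-200, 200) / 10 for _ in range(20)] for k in 'ABCD'},
})

# Wide -> long: one row per (rainfall reading, taxi) pair.
# 'Taxi' and 'Earnings Change' are hypothetical column names for this sketch.
long_df = df.melt(
    id_vars=['Rainfall Recorded'],
    var_name='Taxi',
    value_name='Earnings Change',
)

# Standardize both numeric features so neither dominates the Euclidean distance.
features = StandardScaler().fit_transform(
    long_df[['Earnings Change', 'Rainfall Recorded']]
)

# n_init and random_state are set explicitly so the run is reproducible.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
long_df['cluster'] = km.fit_predict(features)

print(long_df.head())
```

Scaling matters here because k-means relies on Euclidean distance, and the earnings change spans a much wider numeric range (-20 to 20) than the rainfall (0.8 to 10).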