I have a dataframe of county-level socioeconomic variables for the USA. For each variable, there is also a column with the corresponding coefficient of variation (CV).
County | Var1 | CV Var1 | Var2 | CV Var2 |
---|---|---|---|---|
A | 3 | 24 | 45 | 21 |
B | 6 | 18 | 34 | 18 |
I want to cluster the counties based on their socioeconomic conditions. Because data quality varies between counties (as measured by the CV), I would like to give more weight in the clustering process to counties with a lower CV (meaning higher data quality).
I tried scikit-learn's k-means with sample weights, as in this example:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
data = {
"Variable1": [10, 20, 30, 40, 50],
"Variable2": [5, 15, 25, 35, 45],
"CV_Variable1": [0.1, 0.2, 0.1, 0.3, 0.2], # Coefficients of Variation
"CV_Variable2": [0.05, 0.15, 0.05, 0.25, 0.15] # Coefficients of Variation
}
# Create DataFrame
df = pd.DataFrame(data)
# Extract variables and weights
variables = ["Variable1", "Variable2"]
weights = df[["CV_Variable1", "CV_Variable2"]].values
# Normalize the data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[variables])
# Perform K-Means clustering
kmeans = KMeans(n_clusters=2, random_state=0, max_iter=1000)
kmeans.fit(X_scaled, sample_weight=weights)
df['Cluster'] = kmeans.labels_
# Print cluster centroids
print("Cluster Centroids:")
print(kmeans.cluster_centers_)
# Plot the K-Means results
plt.scatter(df['Variable1'], df['Variable2'], c=df['Cluster'], cmap='viridis')
plt.xlabel('Variable1')
plt.ylabel('Variable2')
plt.title('K-Means Clustering of Toy Data')
plt.colorbar(label='Cluster')
plt.show()
But I get ValueError: Sample weights must be 1D array or scalar.
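As far as I can tell, the error occurs because `sample_weight` expects one weight per row (shape `(n_samples,)`), while my `weights` array has shape `(5, 2)`. Below is a minimal sketch of a fallback that does run, assuming it is acceptable to collapse the CVs into a single weight per county. The inverse of the mean CV is my own assumption for illustration (so that a lower CV gives a higher weight), and `row_weights` is just a name I made up. This is not the per-variable weighting I am actually after.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same toy data as in the example above
df = pd.DataFrame({
    "Variable1": [10, 20, 30, 40, 50],
    "Variable2": [5, 15, 25, 35, 45],
    "CV_Variable1": [0.1, 0.2, 0.1, 0.3, 0.2],
    "CV_Variable2": [0.05, 0.15, 0.05, 0.25, 0.15],
})

X_scaled = StandardScaler().fit_transform(df[["Variable1", "Variable2"]])

# Assumption: one weight per county, taken as the inverse of its mean CV,
# so counties with lower CV (higher data quality) count more.
mean_cv = df[["CV_Variable1", "CV_Variable2"]].mean(axis=1).to_numpy()
row_weights = 1.0 / mean_cv  # 1D array, shape (n_samples,)

kmeans = KMeans(n_clusters=2, random_state=0, n_init=10)
kmeans.fit(X_scaled, sample_weight=row_weights)
labels = kmeans.labels_
```

This loses the distinction between a county whose Variable1 is reliable and whose Variable2 is not, which is exactly the information I would like to keep.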
Is there a workaround, or another clustering algorithm, that makes it possible to assign a weight to each variable value for each county?
I would be grateful for any advice or guidance. Thanks!