I am working on a problem which, from what I can tell from prior research, is troubling many people in my industry. In civil engineering, particularly fibre-optic cabling, house connections (HCs) need to be grouped into clusters that are hooked up to network distributors (NDs). Intuitively, one might assume this is a job for a simple clustering algorithm such as k-means; the issue, however, is the infrastructure-imposed limit of 20 HCs per ND. If HCs are too dense in one area, k-means and similar algorithms construct clusters that exceed this limit.
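To illustrate, here is a toy example (using scikit-learn and random points rather than my real HC coordinates, so the numbers are purely illustrative):

import numpy as np
from sklearn.cluster import KMeans

# 60 synthetic "house connections": one dense neighbourhood and two small ones
rng = np.random.default_rng(0)
hcs = np.vstack([
    rng.normal(0, 1, (40, 2)),    # dense area with 40 HCs
    rng.normal(10, 1, (10, 2)),
    rng.normal(20, 1, (10, 2)),
])

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(hcs)
print(np.bincount(labels))  # one cluster ends up with ~40 HCs, far above the 20-per-ND limit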
From what I can tell, the only commercially available solution to this is ArcGIS’ Build Balanced Zones tool, but for the sake of FOSS I’d personally like to avoid ArcGIS. Thankfully, ESRI does provide background information on how Build Balanced Zones works, which is why I’ve been reading up on genetic algorithms and ended up with PyGAD by Gad, A.F. His tutorial helped me out greatly; however, I am struggling to adapt the fitness function to my needs, i.e. punishing solutions that result in clusters of more than 20 HCs.
The original function is as follows. It scores solutions based on the density (compactness) of the resulting clusters, i.e. it is equivalent to k-means:
def fitness_func(solution, solution_idx):
    # cluster_data() comes from the tutorial: it treats the chromosome as a set of
    # cluster centres and assigns every data point to its nearest centre.
    cluster_centers, all_clusters_dists, cluster_indices, clusters, clusters_sum_dist = cluster_data(solution, solution_idx)
    # Reward compact clusters: the smaller the total point-to-centre distance,
    # the higher the fitness (the tiny constant avoids division by zero).
    fitness = 1.0 / (np.sum(clusters_sum_dist) + 0.00000001)
    return fitness
Best solution is [ 6.76857432 87.25666069 82.82788371 84.71676014 17.01672128 41.12036676
6.71771791 9.91987939 83.17725224 28.87028453 87.64197362 53.89675135
29.77371507 64.36472386 58.77172181 59.66950147 40.07732671 83.47739475
59.29050269 13.28871487]
Fitness of the best solution is 0.0004500494908839888
Best solution found after 100 generations
[Plot: k-means clusters generated according to Gad, A.F.]
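For completeness, this is roughly how I wire the fitness function into PyGAD; the concrete parameter values below are just the ones I am currently experimenting with, not anything prescribed by the tutorial:

import pygad

# The chromosome holds the cluster-centre coordinates (20 genes in my case).
ga_instance = pygad.GA(num_generations=100,
                       num_parents_mating=5,
                       fitness_func=fitness_func,   # later swapped for capped_clusters
                       sol_per_pop=10,
                       num_genes=20,
                       init_range_low=0.0,
                       init_range_high=100.0,
                       mutation_percent_genes=10)
ga_instance.run()

solution, solution_fitness, _ = ga_instance.best_solution()
print("Best solution is", solution)
print("Fitness of the best solution is", solution_fitness)
print("Best solution found after", ga_instance.best_solution_generation, "generations")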
I thought I’d add a punishment term for exceeding cluster sizes of 20; however, the way I did it seems to punish all chromosomes (I hope I’m using the term properly, I’m still a bit new to genetic algorithms) equally.
def capped_clusters(solution, solution_idx):
    cluster_centers, all_clusters_dists, cluster_indices, clusters, clusters_sum_dist = cluster_data(solution, solution_idx)
    fitness = 1.0 / (np.sum(clusters_sum_dist) + 0.00000001)  # 1 is the optimum
    # Hard cut-off: any solution containing a cluster of more than 20 HCs
    # gets a fitness of zero.
    if any(len(cluster) > 20 for cluster in clusters):
        fitness = 0
    return fitness
Best solution is [66.16381384 94.42791405 34.78040731 64.66917082 90.50249179 24.68926304
7.54327271 66.90232116 55.22703763 88.59661014 1.87161791 93.35067003
39.58200881 57.61364249 65.91865205 16.62963165 4.22783108 67.18573653
15.57759318 9.46596451]
Fitness of the best solution is 0
Best solution found after 0 generations
How can I selectively punish cluster sizes exceeding 20, so that the algorithm is pushed towards evolving solutions in which every cluster has at most 20 members?
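The only alternative I have come up with so far is a graded penalty instead of the hard zero, along the lines of the sketch below (soft_capped_clusters and the penalty_weight value are my own untested guesses), but I don’t know whether this is the right way to handle such a constraint in a GA:

def soft_capped_clusters(solution, solution_idx):
    cluster_centers, all_clusters_dists, cluster_indices, clusters, clusters_sum_dist = cluster_data(solution, solution_idx)
    base_fitness = 1.0 / (np.sum(clusters_sum_dist) + 0.00000001)
    # Count how many HCs sit above the 20-per-ND limit across all clusters, so a
    # solution with one slightly oversized cluster still beats one that overloads
    # several NDs.
    overflow = sum(max(0, len(cluster) - 20) for cluster in clusters)
    penalty_weight = 0.1  # arbitrary; would need tuning against the distance term
    return base_fitness / (1.0 + penalty_weight * overflow)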
Note: I have downgraded to PyGAD 2.10.0 for compatibility with the tutorial code from the link above. This required me to manually remove any mentions of the obsolete numpy.int and numpy.float from the source code.
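In case anyone runs into the same problem, the substitutions looked roughly like this (a simplified illustration, not actual PyGAD source lines):

import numpy as np

value = 19.7
# Before (breaks on NumPy >= 1.24, where the deprecated aliases were removed):
#     gene = np.int(value)
#     arr = np.array([value], dtype=np.float)
# After:
gene = int(value)                          # plain built-in
arr = np.array([value], dtype=np.float64)  # or an explicit dtype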
Cheers and thank you very much!