I have created a program that learns to play Connect 4 using a neural network and a genetic algorithm.
It starts with neural networks with random weights, lets every network play one game against each of the other networks, picks the top networks (the ones that won the most games), then breeds and mutates them to create the population for the next generation.
It doesn’t seem to learn to play the game well. I suspect the problem is that the fitness score depends too much on the other networks (on the ‘stupidity’ of some of them). For example, take network A, which plays against B, C and D: if B, C and D are really bad, network A gets a really good fitness score. It also makes the fitness of my early generations look quite high, because a network X that always drops its piece into the same column wins most of its games against other ‘stupid’ networks that spread their pieces across several columns.
How would you tackle this problem? Am I on the right track? Should I add another, ‘real’ fitness score that will actually improve gradually?
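By a ‘real’ fitness score I mean something like scoring each network against a fixed baseline opponent, so the fitness doesn’t depend on how weak the rest of the population happens to be. A rough sketch (play_game and random_opponent are hypothetical helpers, not actual code from my program; play_game returns 1 if the model wins, -1 if it loses, 0 for a draw):

def baseline_fitness(model, games=20):
    # Fraction of games won against a fixed random-move opponent (hypothetical helpers)
    wins = sum(play_game(model, random_opponent) == 1 for _ in range(games))
    return wins / games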
Here is a simplified version of my key functions:
import random

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

ROW_COUNT, COLUMN_COUNT = 6, 7  # standard Connect 4 board

def create_model(row_count, column_count):
    model = Sequential()
    model.add(Flatten(input_shape=(row_count, column_count)))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(column_count, activation='softmax'))
    return model
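(For reference, a column can be picked from the softmax output along these lines; choose_column is just an illustrative helper, not my exact game code, and it assumes the board is a NumPy array with 0 for empty cells and row 0 as the top row, so a column is playable while its top cell is empty.)

def choose_column(model, board):
    probs = model.predict(board[np.newaxis, ...], verbose=0)[0]
    legal = board[0] == 0                 # a column is playable if its top cell is empty
    probs = np.where(legal, probs, 0.0)   # mask out full columns
    return int(np.argmax(probs))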
def crossover(model1, model2):
    weights1 = model1.get_weights()
    weights2 = model2.get_weights()
    new_weights = [np.where(np.random.rand(*w1.shape) < 0.5, w1, w2)
                   for w1, w2 in zip(weights1, weights2)]
    offspring = create_model(ROW_COUNT, COLUMN_COUNT)
    offspring.set_weights(new_weights)
    return offspring
def mutate(model, mutation_rate):
    weights = model.get_weights()
    # With probability mutation_rate, add Gaussian noise (std 0.2) to every element of a
    # weight array; otherwise leave that array unchanged
    new_weights = [w + np.random.randn(*w.shape) * 0.2 if np.random.rand() < mutation_rate else w
                   for w in weights]
    model.set_weights(new_weights)
def evaluate_models(models):
    # Round robin: every model plays one game against every other model.
    # Returns (win counts per model, game records).
    pass
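(Roughly, that round robin looks like the sketch below; round_robin_scores and play_game are illustrative helpers, with play_game returning 1 if the first model wins, -1 if it loses, and 0 for a draw.)

def round_robin_scores(models):
    scores = [0] * len(models)
    results = []
    for i in range(len(models)):
        for j in range(i + 1, len(models)):
            outcome = play_game(models[i], models[j])  # hypothetical game helper
            if outcome == 1:
                scores[i] += 1
            elif outcome == -1:
                scores[j] += 1
            results.append((i, j, outcome))
    return scores, results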
def next_generation(population, top_k):
    scores, _ = evaluate_models(population)
    # Keep the top_k highest-scoring models unchanged (elitism)
    sorted_indices = np.argsort(scores)[::-1]
    best_indices = sorted_indices[:top_k]
    new_generation = [population[i] for i in best_indices]
    # Fill the rest of the population with mutated offspring of the survivors
    while len(new_generation) < len(population):
        parents = random.sample(new_generation[:top_k], 2)
        offspring = crossover(parents[0], parents[1])
        mutate(offspring, 0.2)
        new_generation.append(offspring)
    return new_generation
def genetic_algorithm():
    population = [create_model(ROW_COUNT, COLUMN_COUNT) for _ in range(50)]
    for generation in range(100):
        population = next_generation(population, 5)
    # The survivors come first, so population[0] is the best model from the last evaluation
    return population[0]
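After training I take the returned model and play with it, e.g. (using the illustrative choose_column helper from above):

best = genetic_algorithm()
empty_board = np.zeros((ROW_COUNT, COLUMN_COUNT))
print(choose_column(best, empty_board))  # column the trained model opens with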