I have created the following function that can generate a list of 0s and 1s (basically a bitstring) using randomization.
```python
import numpy as np

def generate_genome(length: int, max_individuals: int = 0) -> Genome:
    bits = None
    if max_individuals > 0:
        num_individuals = np.random.randint(1, max_individuals + 1)
        print(f'Will have maximum of {num_individuals} individuals')
        print('Controlled randomization')
        bits = np.full(length, 0)
        bits_flipped_ids = np.random.choice(range(0, length), size=num_individuals, replace=False)
        print(f'Following indices will be flipped: {sorted(bits_flipped_ids)}')
        np.put(a=bits, ind=bits_flipped_ids, v=1)
    else:
        print('Standard randomization')
        bits = np.random.choice([0, 1], size=length)
    genome = Genome(bits=bits.tolist())
    print(f'Genome bits: {genome.bits}')
    return genome
```
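The function returns a `Genome` object whose definition is not shown here (in a later snippet `generate_genome()` is also called as a method of that class). A minimal, hypothetical container that is compatible with the snippets in this question, including the set-based de-duplication further down, could look like this:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Genome:
    bits: tuple = ()

    def __post_init__(self):
        # Accept any sequence (e.g. a plain list from bits.tolist()) and store it
        # as a tuple so that Genome instances are hashable and can live in a set
        object.__setattr__(self, 'bits', tuple(self.bits))
```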
It supports two modes:
- without `max_individuals` – creates a bit-list of a specific length
- with `max_individuals` – creates a bit-list of a specific length, but also ensures that the number of `1`s does not exceed `max_individuals`. The placement of the `1`s is at random indices (restricted by the allowed length of the list, of course)
This is part of a genetic algorithm I am working on. This would produce samples similar to the ones below:
- without `max_individuals`, e.g. `generate_genome(10, 0)`:

  ```
  [0 1 1 0 0 1 1 1 1 0]
  [1 0 1 1 1 1 0 1 1 1]
  [0 1 0 1 0 1 1 0 0 0]
  [1 1 0 0 0 1 1 0 1 1]
  [0 1 0 0 1 0 0 1 0 0]
  ```

- with `max_individuals`, e.g. `generate_genome(10, 3)`:

  ```
  [0 0 0 0 0 0 0 0 0 1]  with bits flipped at [9]
  [0 0 0 1 0 1 0 1 0 0]  with bits flipped at [3 5 7]
  [0 1 0 0 0 1 0 1 0 0]  with bits flipped at [1 5 7]
  [0 1 0 1 0 0 0 0 0 0]  with bits flipped at [1 3]
  ```
My problem is that there is the possibility of generating the same bit-list more than once. This possibility increases the smaller `length` and `max_individuals` are. I would like to have control over that to see how it affects my algorithm, so I am looking for an efficient way to create a set of unique bit-lists using my function or, even better, a set of bit-lists where the number of unique lists can be controlled by a parameter.
I managed to simplify the `generate_genome()` function based on a suggestion:
```python
def generate_genome(length: int, max_individuals: int = 0):
    # For the given length (number of bits) the maximum (where all are 1s)
    # is (2^length - 1). The numpy.arange() stops at (stop - 1)
    bits_all_possible = np.arange(2**length)
    # Shuffle to introduce randomness
    np.random.shuffle(bits_all_possible)
    if max_individuals > 0:
        bits_all_possible = np.array([b for b in bits_all_possible if np.bitwise_count(b) <= max_individuals])
    # Pick a random index between 0 and the length of all possible bit-fields
    bits_selected = np.random.randint(0, len(bits_all_possible))
    # Use the index to select the number
    bits_number = bits_all_possible[bits_selected]
    # Convert the number to a bit-field, zero-padded to the requested length
    bits = [int(b) for b in bin(bits_number)[2:].zfill(length)]
    genome = Genome(bits=bits)  # bits is already a plain list here
    print(f'Genome bits: {genome.bits}')
    return genome
```
However, this is not a feasible solution in my case because we are talking about an encoding and not plain old binary numbers. What this means is that I can have e.g. 3121200 individuals. Accordingly, a genome can take any of pow(2, 3121200) possible values – a number far too large to enumerate.
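Just to put the scale into perspective (an illustration, not part of the original code):

```python
length = 64  # still tiny compared to length = 3121200
print(2 ** length)  # 18446744073709551616 possible genomes already
# Enumerating them all (e.g. with numpy.arange) would need on the order of 10**20 bytes
```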
Even if it worked, this simplification does not fix the issue of creating a unique list of bit-fields with `generate_genome()`. The straightforward (though not necessarily efficient) solution would be to create a list and iteratively call `generate_genome()`: every time a new genome is created, check whether it is already present in the list; if so, discard it, otherwise add it.
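A rough sketch of that list-based approach (reusing the `size`, `genome_length` and `max_individuals` names from the snippet below, and assuming `Genome` instances compare equal when their bits are equal) could look like this:

```python
genomes = []
while len(genomes) < size:
    genome = Genome.generate_genome(genome_length, max_individuals)
    # Linear scan over the list; the new genome is discarded if it already exists
    if genome not in genomes:
        genomes.append(genome)
```

The membership test on a list is linear in the number of genomes collected so far, which is one reason to prefer the `set`-based variant shown next.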
Currently I am testing Python's `set` data structure:
```python
if unique_genomes:
    # Using a set allows ignoring newly inserted elements if these are already present
    genomes = set()
    # Note: a `for genome_counter in range(size)` loop with `genome_counter -= 1`
    # does not retry in Python (the loop variable is reset on every iteration),
    # so a while loop is used to keep going until `size` unique genomes exist
    while len(genomes) < size:
        genome = Genome.generate_genome(genome_length, max_individuals)
        # If the genome is already in the set, the number of elements in the set
        # will not change and the while condition simply triggers another attempt
        genomes.add(genome)
    genomes = list(genomes)
```
The problem with any solution that revolves around repeatedly trying to insert a new genome until a certain total number of genomes is reached is that it may turn into an endless struggle to find the next genome that fits into the already existing gene pool: every time `generate_genome()` is called, there is an unknown probability that the resulting genome already exists, so in the worst case this becomes an endless loop. In the past I have added a termination criterion, namely how many attempts there should be before breaking. Here, this is not an option.
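For reference, a capped-attempts variant (with a hypothetical `max_attempts` parameter) would look roughly like this:

```python
genomes = set()
attempts = 0
max_attempts = 1000  # hypothetical cap on consecutive failed attempts
while len(genomes) < size and attempts < max_attempts:
    genome = Genome.generate_genome(genome_length, max_individuals)
    genomes_len_old = len(genomes)
    genomes.add(genome)
    if len(genomes) == genomes_len_old:
        # Duplicate genome: count the failed attempt and try again
        attempts += 1
    else:
        # A new unique genome was found: reset the failure counter
        attempts = 0
genomes = list(genomes)
```

The cap avoids an endless loop, but it may stop before `size` unique genomes have been found, which is exactly why it is not an option here.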
Answer (score 10):
Let `m` be the number of bits, `k` be the number of bitstrings you want to generate, and `r` be the maximum number of 1 bits in each bitstring (your `max_individuals` parameter).
Some cases to consider:
- No maximum limit on the number of 1 bits. In this case, the problem boils down to choosing `k` unique bitstrings of length `m`. You can do this by sampling `k` random integers from 0 up to `2^m - 1`. Here's the algorithm to do that very efficiently:

  ```python
  import random
  import numpy as np

  def int2bs(x, m):
      """ Convert an integer to a bitstring of length m """
      # round m up to a multiple of 8
      mbytes = (m + 7) & ~7
      xbytes = x.to_bytes(mbytes >> 3, "big")
      res = np.unpackbits(np.frombuffer(xbytes, np.uint8))
      return res[mbytes - m:]

  def rand_bitstrings(m, k):
      n = 2 ** m
      if k > n:
          raise ValueError(f"Can't choose more than {n} bitstrings")
      if k >= n // 3 or n <= 32:
          # Selecting a large fraction of the possible bitstrings:
          # faster to shuffle the whole list and take a subset
          all_bs = list(range(n))
          random.shuffle(all_bs)
          res = all_bs[:k]
      else:
          # Rejection sampling with a guaranteed success rate of >= 2/3 per trial
          seen = set()
          res = []
          while len(res) < k:
              bs = random.getrandbits(m)
              while bs in seen:
                  bs = random.getrandbits(m)
              seen.add(bs)
              res.append(bs)
      # Convert integers to bitstrings
      return [int2bs(x, m) for x in res]
  ```
  This is quite fast even for very large values of `m`; for `m = 3121200, k = 3`, this takes 19 milliseconds on my machine.
- Generating an `m`-length bitstring with exactly `r` 1 bits. In this case, we can do the following:

  ```python
  return tuple(random.sample([0, 1], m, counts=[m - r, r]))
  ```

  Then, we can put the returned `tuple` in a set to check for duplicates.
- Generating a bitstring with at most `r` 1-bits. I don't know of a good way to do this efficiently for general `r` and a uniform distribution amongst all bitstrings. However, here's an approach that will efficiently generate random bitstrings of length `m` with at most `r` 1-bits, albeit with a bias towards bitstrings with around `r/2` 1-bits:

  ```python
  return tuple(random.sample([0, 1], m, counts=[m, r]))
  ```

  As with #2, you can put the returned tuples in a set to check for duplicates.
For #2 and #3, to avoid the possibility of a long rejection-sampling delay, you can check if the total number of possible outputs is less than `3*k`; if it is, just shuffle all the possibilities and output the first `k` (as is done in #1). A sketch of this combined approach is shown below.
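As a minimal sketch of how #2 and this fallback could be combined (the function and parameter names here are illustrative, not from the answer above):

```python
import math
import random
from itertools import combinations

def unique_bitstrings_exact(m, k, r):
    """Return k unique length-m bitstrings (as tuples), each with exactly r ones."""
    total = math.comb(m, r)  # number of distinct bitstrings with exactly r ones
    if k > total:
        raise ValueError(f"Only {total} bitstrings with exactly {r} ones exist")
    if total < 3 * k:
        # Small output space: enumerate every possibility, shuffle, take the first k
        all_bs = []
        for ones in combinations(range(m), r):
            bs = [0] * m
            for i in ones:
                bs[i] = 1
            all_bs.append(tuple(bs))
        random.shuffle(all_bs)
        return all_bs[:k]
    # Large output space: rejection sampling, with a set filtering out duplicates
    seen = set()
    while len(seen) < k:
        seen.add(tuple(random.sample([0, 1], m, counts=[m - r, r])))
    return list(seen)
```

For #3 ("at most `r` ones") the same structure applies, with `counts=[m, r]` in the sampling call and `sum(math.comb(m, i) for i in range(r + 1))` as the size of the output space, keeping in mind the bias mentioned above.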