Question

Let me start with the version of genetic algorithm I am implementing. I apologize in advance for any terminology errors that I make here. Please feel free to correct me.

The chromosome for my problem is two dimensional. Three rows and thirty two columns. Essentially the alleles (values) are indexes that are contained by this chromosome.

How an Index is formulated
Each row and column (together) of the chromosome refer to a single gene. Each gene contains an integer value (0 - 30). A single column (I believe referred to as a gnome) therefore refers to an index of a four dimensional array containing user data on which the fitness function operates.

This is how a chromosome would look like

11 22 33 14 27 15 16 ...
3  29  1  7 18 24 22 ...
29  9 16 10 14 21  3 ...

e.g. column 0 ==> data[11][3][29]
where
     11 -> (0, 0); 0th row, 0th column 
     3  -> (1, 0); 1st row, 0th column
     29 -> (2, 0); 2nd row, 0th column

For completeness, the fitness function works as follows: (for a single chromosome)

for first 10 iterations: (user 0 to 9)
    for each column (genome)
         consider gene value for first row as the first index of data array
         consider gene value for the second row as the second index of data array
         consider gene value for the third row as the third index of data array

         so if the first column contains [11][3][29] user = 0 
         it refers to data[0][11][3][29]

     SUM the data array value for all columns and save it
Do the same for all iterations (users)

 for second 10 iterations: (user 10 to 19)
    for each column (genome)
         consider gene value for the SECOND row as the FIRST index of data array
         consider gene value for the THIRD row as the SECOND index of data array
         consider gene value for FIRST row as the THIRD index of data array

     SUM the data array value for all columns and save it
Do the same for all iterations (users)


for third 10 iterations: (user 20 to 29)
    for each column (genome)
         consider gene value for the THIRD row as the FIRST index of data array
         consider gene value for FIRST row as the SECOND index of data array
         consider gene value for the SECOND row as the THIRD index of data array

     SUM the data array value for all columns and save it
Do the same for all iterations (users)

Out of the 30 (sum) values calculated so far, assign the minimum value as fitness value
to this chromosome.

The point to explain the fitness function here is to explain the optimization problem I am dealing with. I am sorry I could not formulate it in Mathematical notation. Anyone who could do it, his/her comment is more than welcome. Essentially it is maximizing the minimum X. Where X refers to data contained in data array. (maximizing is done over generation where the highest fitness chromosomes are selected for next generations)

Q1) I am using a single random number generator for crossover and mutation probabilities. Generally speaking, is this correct was to implement it with a single generator? I ask this question because the crossover rate I chose is 0.7 and mutation to be 0.01. My random number generator generates a uniformly distributed integer number. The number are between 0 to (2^31 - 1). If a number generated by the random function lies under the border where it satisfies mutation, the same number also satisfies crossover. Does this effect the evolution process?

NOTE: the highest number that the random number generates is 2147483647. 1% of this value is 21474836. so whenever a number less than 21474836, it suggests that this gene can be mutated. this number also suggest that crossover must be done. Shouldn't there be different generators?

Q2) Although I see that there is a relation between genes is a column when calculating fitness. But while performing mutation, all the genes should be considered independent from each other or all the rows for a genome (column) should be effected by mutation.

Explanation As I learned in a binary string of e.g. 1000 bits where each bit corresponds to a gene, with a mutation rate of 1% would mean 1 out of 100 bits might get flipped. in my case however I have chromosome which is 2D (3 rows, 32 columns). Should I consider all 96 genes independent of each other or simply consider 32 genes. And whenever I need a flip, flip the column all together. How does mutation work in 2D chromosome?

Q3) Do I really have a correlation between rows here. I am a bit confused?

Explanation I have 2D chromosome, whose column values altogether points to the data i have to use to calculate fitness of this chromosome. Genetic algorithm manipulates chromosomes where as fitness is assigned by the data that is associated with this chromosome. My question is how would genetic algorithm should treat 2D chromosome. Should there be a relation between the genes in a column. Can I get a reference to some paper/code where a 2D chromosome is manipulated?

Was it helpful?

Solution

I'm not sure if i understood the chromosome structure, but it doesn't matter, the concepts are the same:

1 - You have a chromosome object, which you can access the individual genes

2 - You have a fitness function, which takes a chromosome and outputs a value

3 - You have a selection function, which selects chromosomes to mate

4 - You have a crossover function, which generally takes 2 chromosomes, exchange genes between them and outputs two new chromosomes

5 - You have a mutation operator, which acts randomly on the genes of a chromosome

So

Q1) You can use a single random generator, there's no problem at all. But why are you using integer numbers? It's much easier to generate a random between [0, 1).

Q2) This is up to you, but generally the genes are mutated randomly, independent of each other (mutation happens after the crossover, but i think you already know that).

EDIT: Yes, you should consider all the 96 genes independent of each other. For each mutation, you'll select one 'row' and one 'column' and modify (mutate) that gene with some probability p, so:

for row in chromosome.row
    for col in row
        val = random_between_0_and_1
        if val < p
            chromosome[row][col] = noise

Q4) It's up to you to decide what the fitness function will do. If this chromosome is 'good' or 'bad' at solving your problem, then you should return a value that reflects that.

OTHER TIPS

All the random numbers you use would typically be independently generated, so use one RNG or many, it doesn't matter. You should generate new numbers for each gene for crossover and mutation step, if you use the same single random number for multiple purposes you will limit the explorable solution space.

To make your algorithm easier to understand, generate uniformly distributed floats in [0..1) as r()=rand()/(2^32-1), then you can express things simply as, for example,

if r() < 0.3
   mutate()

I don't understand your other questions. Please rewrite them.

An improvement you can do relatively to mutation and crossover probabilities is built a GA that choose these probabilities by itself. Because the use of given probabilities (or a function that evolves with the number of runs for probabilities) is always arbitrary, codify your operators inside chromosomes.

For example, you have two operators. Add a bit to the end of chromosome where 1 codify for mutation and 0 for crossover. When you apply operators on parents, you will obtain childs that will have the code for the operator to apply. In this way, the GA makes a double search: in the space of solutions and in the space of operators. The choose of operators is given by the nature of your problem a by the concrete conditions of the run. During the calculation, probabilites of both operators will change automatically to maximize you objective function.

Same thing for an arbitrary number of operators. You will need simply more bits to codify. I use generally three operators (three for crossover and one for mutation) and this mechanism works fine.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top