Trying to find a good data representation for a genetic (evolutionary) algorithm, but I just can't imagine one

Question 1

You seem to be having a problem with Winston Ewert's answer. I'm making this a separate answer for reasons of length and formatting, but what Winston is suggesting is a reasonable approach.

You ask about the "array of characters" and what your parents and offspring are, how you'd do crossover, mutation, etc. I think maybe you're confused by that array of characters bit. Don't think of it as an array of characters -- it's an array of words. This is important, because you don't want to do crossover between word boundaries or change letters inside a word via mutation. You want to shuffle the words around. So instead of dealing with the words, let's take a slightly different (but equivalent) approach.

Let's number each of your 2n words from 1 to 2n. To generate a candidate solution, we just pick n random numbers between 1 and 2n (without replacement). The randomly selected words become the rows to your matrix. So if the words are AGE,AGO,BEG,CAB,CAD,DOG, we pick three at random and end up with, for instance, 2, 3, and 5, (giving a chromosome of 235) or AGO,BEG,CAD.

This yields a matrix of

AGO
BEG
CAD

Now we count how many of the 2n total input words appear in the matrix. In this case, it's only three, as none of the columns make valid words. Your fitness for the input 235 is 3. The input 416 works though -- its fitness is 6.

To generate a population, you just generate N random sets of three numbers (without repetition) between 1 and 2n. With a population size of 4, you might get

These give you four different potential solutions:

142 = AGE
      CAB
      AGO

354 = BEG
      CAD
      CAB

624 = DOG
      AGO
      CAB

241 = AGO
      CAB
      AGE

To do crossover, you'd need to design a method that avoided duplicated numbers in the offspring. I've in the past adapted the Cycle Crossover method designed for permutations to handle situations similar to this, but you can design any method that seems reasonable. For instance, you could do a simple uniform crossover and then just fix any duplicate values by changing one of them to something else.

Mutation is simply changing one number into another or swapping the positions of two numbers in the encoding.

You say you must use a GA, so you could stop here and try something like that. It's not likely to work extremely well, for a couple of reasons. First, there are going to be a lot of plateaus in the fitness function. Almost every possible answer is equally wrong, so there's not much of a gradient for the GA to follow. It's a needle in a haystack search problem. You could try to modify the way you score the fitness values to alleviate this somewhat. For instance, I know that only one word starts with D in your example set, so any solution with DOG in the first row is automatically wrong. I could award points to solutions that gave more "plausible" wrong answers. However, in general, this is going to be a tough problem for a GA to solve efficiently.

Second, constraint satisfaction problems like this one can be much more efficiently solved with specialized techniques, no matter how well you build your GA. You could easily solve this with backtracking.

Question 2

Some options:

Just have the matrix of letters as your representation. So your optimal solution from the example would be:

CABAGEDOG

Then for your fitness, just give on point for each requested word that's actually in the solution.

Or have the representation consist of three words, one for each row.

   CAB
   AGE
   DOG

Your fitness then rewards words that are correct in the columns. Mutations swap out one word for another.