Traditional genetic programming is not well suited to variable-length input.
It occurs to me some model of evaluation is presupposed in the question.
Consider, for example, encoding your variable-length input as a single arbitrary-precision value. For an alphabet of 10 symbols:
ABCD = 1234; ABCDEF = 123456
or
ABCD = 0.1234; ABCDEF = 0.123456
However, if this encoding is not natural for the problem domain, it will be quite hard to evolve a program that deals with such input well.
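A minimal sketch of both encodings, in case it helps to see them concretely. The names `SYMBOLS`, `encode_int` and `encode_fraction` are made up for illustration, and I am assuming the symbols map to digits 1..9 with 0 left unused, so that strings of different lengths never collide:

```python
from fractions import Fraction

# Assumption: A=1 ... I=9; digit 0 is left unused to avoid leading-zero collisions.
SYMBOLS = "ABCDEFGHI"
DIGIT = {s: i for i, s in enumerate(SYMBOLS, start=1)}


def encode_int(text):
    """Encode a string as one arbitrary-precision integer (ABCD -> 1234)."""
    value = 0
    for ch in text:
        value = value * 10 + DIGIT[ch]
    return value


def encode_fraction(text):
    """Encode a string as an exact fraction in [0, 1) (ABCD -> 0.1234)."""
    return Fraction(encode_int(text), 10 ** len(text))


print(encode_int("ABCD"))       # 1234
print(encode_int("ABCDEF"))     # 123456
print(encode_fraction("ABCD"))  # an exact Fraction equal to 0.1234
```

Python integers and `Fraction` are arbitrary precision, so the input length is not limited by a machine word.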
You could also suppose that the problem can be adequately represented by a genetically derived finite state machine:
F(F(F(F(init(), A), B), C), D) = 1234
That is a separate field of study from genetic programming. Search around and read the research papers; perhaps you can find a package that already does what you want.
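The nested F(...) calls above are just a left fold over the input. Here is a rough sketch; `init`, `step` and `run` are hypothetical names, and the hand-written `step` only stands in for the transition function that would actually be evolved (it also keeps an unbounded integer as state, which a true finite state machine would not):

```python
from functools import reduce


def init():
    """Initial state of the machine."""
    return 0


def step(state, symbol):
    """Hand-written stand-in for an evolved transition function F(state, symbol)."""
    return state * 10 + (ord(symbol) - ord("A") + 1)


def run(text):
    """Fold the transition over the input: F(F(F(init(), A), B), C) ..."""
    return reduce(step, text, init())


print(run("ABCD"))  # 1234
```

The point is that the machine consumes one symbol at a time, so the input can be any length while the evolved part has a fixed signature.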
Then again, your problem may be best represented by yet another transformation, e.g. the frequency of bigrams. Such a transform has a fixed, finite length:
# bigrams
# ABCDE => 1
"AA": 0
"AB": 0.25
"AC": 0
"AD": 0
"AE": 0
"BA": 0
"BC": 0.25
"BD": 0
#... up to end of alphabet ...
(0, 0.25, 0, 0, 0, 0, 0.25, 0, ...., 0, ...) => 1 # ABCDE
(0, 0.20, 0, 0, 0, 0, 0.20, 0, ...., 0.20, ...) => 10 # ABCDEF
# feature vector length is N^2 for an alphabet of N symbols
# trigrams
(0, 0.33, 0, 0, ..., 0, ...) => 1 # ABCDE
(0, 0.25, 0, 0, ..., 0.25, ...) => 10 # ABCDEF
# feature vector length is N^3 for an alphabet of N symbols
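For what it's worth, an n-gram frequency transform like the one above could be computed roughly as follows. `ngram_features` is a name I made up, and the alphabet "ABCDEF" is assumed only for the example:

```python
from collections import Counter
from itertools import product


def ngram_features(text, alphabet, n=2):
    """Map a string of any length to a fixed set of len(alphabet)**n frequencies."""
    keys = ["".join(p) for p in product(alphabet, repeat=n)]
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    # Relative frequency of each possible n-gram; all zeros if the text is too short.
    return {k: (counts[k] / total if total else 0.0) for k in keys}


features = ngram_features("ABCDE", "ABCDEF")
print(len(features))   # 36
print(features["AB"])  # 0.25
print(features["AA"])  # 0.0
```

The size of the result depends only on the alphabet and on n, never on the length of the text, which is what makes it usable as fixed-length input.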
Bigrams, trigrams, etc. are surprisingly good predictors:
- capture markov information ("ab" vs "ac")
- capture relative position ("ab" && "bc" vs "ed" && "bc")
- capture non-linear semantics ("abab" != "ab" * 2)
- resistant to shuffled input ("buy new spam" vs "buy spam it's new")
These are often used in natural language problems, such as text topic detection, author detection, and spam protection, and in biotech, e.g. for DNA and RNA sequences.
However, there is no guarantee this approach is applicable to your problem. It truly depends on your problem domain. For example, consider the alphabet {1, 0, +}
in the arithmetic domain: the following two inputs have identical bigram frequencies and so become indistinguishable, yet they yield different results:
10000+10000 = 20000
1000+100000 = 101000
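You can check the collision yourself with a few lines. This is only a quick sketch, and `bigram_counts` is a hypothetical helper name:

```python
from collections import Counter


def bigram_counts(text):
    """Count every pair of adjacent characters in the text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


a = "10000+10000"
b = "1000+100000"

print(bigram_counts(a) == bigram_counts(b))  # True
print(len(a) == len(b))                      # True, so the frequencies match too
```

Both strings contain "10" twice, "00" six times, and "0+" and "+1" once each, so any model that sees only the bigram vector cannot tell them apart.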
In this case you need something like a register machine:
init: tmp = 0; res = 0
"0": tmp *= 10
"1": tmp *= 10; tmp += 1
"+": res += tmp; tmp = 0
end: res += tmp
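The rules above translate almost line for line into a small interpreter. This is a sketch with a made-up function name `evaluate`; in a genetic setting the per-symbol actions would be what gets evolved, here they are written by hand:

```python
def evaluate(text):
    """Run the register machine over an expression made of '0', '1' and '+'."""
    tmp = 0  # number currently being read
    res = 0  # running sum
    for ch in text:
        if ch == "0":
            tmp *= 10
        elif ch == "1":
            tmp = tmp * 10 + 1
        elif ch == "+":
            res += tmp
            tmp = 0
        else:
            raise ValueError(f"symbol not in alphabet: {ch!r}")
    return res + tmp  # "end" rule: flush the last number


print(evaluate("10000+10000"))  # 20000
print(evaluate("1000+100000"))  # 101000
```

Unlike the bigram vector, this keeps positional state in `tmp`, so the two inputs that collided earlier now give different results.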