Best practice is to do a one-hot (one-of-K) encoding: for each value that A
can take on, define a separate indicator feature. So with five "types", A = type1
would be
[1, 0, 0, 0, 0]
and A = type3
is
[0, 0, 1, 0, 0]
Then concatenate these vectors with your other features so that your hypothesis becomes
H = w[Atype1] * [A=type1] + ... + w[Atype5] * [A=type5] + w[B] * B + ...
using [] to denote indicator functions.
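As a minimal sketch of the encoding and the resulting hypothesis, assuming NumPy is available: the type names, the helper functions one_hot and featurize, and the weight values are all made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical category names, chosen only for illustration.
TYPES = ["type1", "type2", "type3", "type4", "type5"]

def one_hot(value, categories=TYPES):
    """Return an indicator vector with a 1 at the position of `value`."""
    vec = np.zeros(len(categories))
    vec[categories.index(value)] = 1.0
    return vec

def featurize(a, b):
    """Concatenate the one-hot encoding of A with the numeric feature B."""
    return np.concatenate([one_hot(a), [b]])

# Hypothetical weights: one per type (w[Atype1] .. w[Atype5]), then w[B].
w = np.array([0.5, -1.0, 2.0, 0.0, 3.0, 0.1])

x = featurize("type3", 10.0)  # [0, 0, 1, 0, 0, 10]
H = w @ x                     # only w[Atype3] and w[B] * B contribute
```

Because exactly one indicator is active at a time, each type gets its own independently learned weight.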
This avoids the main problem with your approach, which is that it introduces a number of (probably incorrect) biases, e.g. that type5 = type2 + type3. For further intuition on why this is better than your encoding, see this answer of mine.
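To make the bias concrete, here is a small sketch. It assumes the encoding in question maps the types to the integers 1 through 5 and feeds them to a linear model through a single weight; that mapping and all weight values are assumptions for illustration.

```python
import numpy as np

# Assumed integer coding: type1 -> 1, ..., type5 -> 5, with one shared weight.
w_A = 0.7  # hypothetical weight
contribution = {k: w_A * k for k in range(1, 6)}

# With a single weight, the effect of type5 is forced to equal the
# effect of type2 plus the effect of type3, whatever w_A is.
forced = np.isclose(contribution[5], contribution[2] + contribution[3])

# With one-hot coding there is one weight per type, so no such tie exists.
w_onehot = np.array([0.5, -1.0, 2.0, 0.0, 3.0])  # hypothetical weights
effects = np.eye(5) @ w_onehot                   # effect of each type
tied = np.isclose(effects[4], effects[1] + effects[2])
```

Here forced is true for any choice of w_A, while tied is true only if the learned weights happen to satisfy that relationship.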