Question

I am dealing with training and test data sets made up of labeled arrays of unequal length, like these:

a: {true, [1, 3, 4, 5, 5, 8, 10, 10]}
b: {true, [1, 3, 25, 18, 1, 10]}
c: {false, [1, 8, 10]}
d: {false, [1, 3, 10, 10]}

I am new to machine learning, and I am stuck on how to turn these unequal-length input arrays into equal-length ones so that I can use existing machine learning algorithms easily.

Currently, my idea is to use the Longest Common Subsequence (LCS) to measure the similarity among input arrays of different lengths.

But once I have the LCS information, how can I transform the input arrays into equal-length ones?

Am I on the right track? Could anyone give me some help?


Solution

There is no general solution for this task. Everything depends on what your data actually represents. There are dozens of feature extraction techniques that work well with variable-length data, but the right choice depends on the particular task. There is not, and cannot be, one universal method for turning a variable-length representation into a fixed-length one. LCS seems like a very odd approach that would give poor results in the general case, although it may be meaningful for this particular problem. If it is meaningful, it still yields only one dimension of the new representation: a single feature of your new vector. You need many more to apply any reasonable machine learning technique.
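As a rough illustration of the point above, here is a minimal sketch that maps each variable-length array to a fixed-length feature vector, with an LCS length as just one feature among several. The particular features (length, min, max, mean), the helper names `lcs_length` and `extract_features`, and the choice of array `a` as the LCS reference are all assumptions made for the example; whether any of them is meaningful depends on what the data represents.

```python
from statistics import mean


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def extract_features(seq, reference):
    """Map a variable-length sequence to a fixed-length vector of 5 features.

    These features are illustrative assumptions, not a recommendation.
    """
    return [len(seq), min(seq), max(seq), mean(seq), lcs_length(seq, reference)]


# The sample data from the question.
data = {
    "a": (True, [1, 3, 4, 5, 5, 8, 10, 10]),
    "b": (True, [1, 3, 25, 18, 1, 10]),
    "c": (False, [1, 8, 10]),
    "d": (False, [1, 3, 10, 10]),
}

# Arbitrary choice for the sketch: compare every array against array "a".
reference = data["a"][1]

X = [extract_features(seq, reference) for _, seq in data.values()]
y = [label for label, _ in data.values()]
```

Every row of `X` now has the same length regardless of the original array length, so it can be passed to a standard classifier together with the labels in `y`.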

Other tips

Most ML algorithms require fixed-size input and output. Fully convolutional neural networks and recurrent neural networks can work with variable-length input and output.

But in your situation I would recommend simply taking the maximum array length and appending zeros to the arrays that have fewer items than that maximum.
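The zero-padding suggestion can be sketched in a few lines of plain Python. The helper name `pad_sequences` and its `pad_value` parameter are hypothetical names chosen for this example, and the sketch assumes that 0 never occurs as a real value in the data, so that padding cannot be confused with actual content.

```python
def pad_sequences(sequences, pad_value=0):
    """Right-pad every sequence with pad_value up to the longest length."""
    max_len = max(len(s) for s in sequences)
    return [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]


# The arrays from the question, without their labels.
arrays = [
    [1, 3, 4, 5, 5, 8, 10, 10],
    [1, 3, 25, 18, 1, 10],
    [1, 8, 10],
    [1, 3, 10, 10],
]

padded = pad_sequences(arrays)
for row in padded:
    print(row)
```

Note that the maximum length is taken from whatever list is passed in, so the training and test sets would need to be padded to the same length, for example by computing it once over both.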

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow