Question

All the examples I have seen of neural networks are for a fixed set of inputs, which works well for images and fixed-length data. How do you deal with variable-length data such as sentences, queries or source code? Is there a way to encode variable-length data into fixed-length inputs and still get the generalization properties of neural networks?

Solution

You would usually extract features from the data and feed those to the network. It is not advisable to feed raw, unprocessed data straight into the net. In practice, pre-processing and choosing the right features will determine your success and the performance of the neural net. Unfortunately, IMHO it takes experience to develop a sense for that, and it's not something one can learn from a book.
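As one illustration of turning variable-length input into fixed-length features, here is a minimal sketch of a hashed bag-of-words. This is just one possible feature extractor, not necessarily what the answer has in mind; the function name `hashed_bag_of_words` and the default `n_features=16` are assumptions made for the example.

```python
import zlib


def hashed_bag_of_words(text, n_features=16):
    """Map a text of any length to a count vector with n_features entries."""
    vector = [0] * n_features
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash() for str.
        index = zlib.crc32(token.encode("utf-8")) % n_features
        vector[index] += 1
    return vector


short = hashed_bag_of_words("select name from users")
long = hashed_bag_of_words("select name from users where age > 30 order by name")
print(len(short), len(long))  # both vectors have n_features entries
```

Word order is lost and different tokens can collide in the same slot, so this is a trade-off: the network gets a fixed-size input, but the quality of the features still decides how well it can learn.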

Summing up: "Garbage in, garbage out"

OTHER TIPS

I have been there and faced this problem. ANNs are designed for a fixed feature-vector length, and so are many other classifiers such as KNN, SVM, and Bayesian classifiers; i.e., the input layer must be well defined and cannot vary. This is a design problem. Some researchers opt for padding with zeros to fill the gap, but I personally think this is not a good solution, because those zeros (unreal values) will affect the weights that the net converges to. In addition, a real signal might legitimately end with zeros.

An ANN is not the only classifier; there are others that may work even better, such as the random forest. This classifier is widely regarded by researchers as one of the best. It builds hundreds of decision trees using bootstrapping and bagging, each using a small random subset of the features; the number of features chosen is normally the square root of the feature-vector size. Each decision tree converges to a solution, and the most likely class is then chosen by majority vote. This might work well.

Another solution is to use dynamic time warping (DTW) or, even better, hidden Markov models (HMMs).
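To make the DTW idea concrete, here is a minimal sketch of the classic dynamic-programming formulation for two numeric sequences of different lengths. The function name `dtw_distance` and the use of absolute difference as the local cost are assumptions for the example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a, advance in b, or advance in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: same shape, different length
```

Because the distance is defined directly on sequences of unequal length, it can be plugged into a distance-based classifier such as KNN without forcing the inputs to a fixed size first.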

Another solution is interpolation: interpolate all the shorter signals (compensating for the missing values along them) so that they have the same size as the longest signal. Interpolation methods include, but are not limited to, averaging, B-spline, and cubic interpolation.
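A minimal sketch of the interpolation approach, using plain linear interpolation rather than the B-spline or cubic methods mentioned above. The function name `resample` is an assumption for the example.

```python
def resample(signal, target_length):
    """Linearly interpolate a numeric sequence to exactly target_length samples."""
    if target_length < 1 or not signal:
        raise ValueError("need a non-empty signal and target_length >= 1")
    if len(signal) == 1 or target_length == 1:
        return [float(signal[0])] * target_length
    # Map output index k onto a fractional position in the input signal.
    scale = (len(signal) - 1) / (target_length - 1)
    out = []
    for k in range(target_length):
        pos = k * scale
        left = min(int(pos), len(signal) - 2)
        frac = pos - left
        out.append(signal[left] * (1 - frac) + signal[left + 1] * frac)
    return out


print(resample([0, 10], 5))  # [0.0, 2.5, 5.0, 7.5, 10.0]
```

The same function also shrinks a longer signal, so every signal in a data set could be brought to the length of the longest one (or to any other common length) before it is fed to the network.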

Another solution is to use a feature extraction method to obtain the best (most distinctive) features, this time making them a fixed size. Such methods include PCA, LDA, etc.

Another solution is to use feature selection (normally after feature extraction), an easy way to select the features that give the best accuracy.

That's all for now. If none of these works for you, please contact me.

Some problems could be solved by a recurrent neural network. For example, it is good for calculating parity over a sequence of inputs.

The recurrent neural network for calculating parity would have just one input feature. The bits are fed into it over time, and its output is also fed back to the hidden layer. That allows it to learn parity with just two hidden units.

A normal feed-forward two-layer neural network would require 2**sequence_length hidden units to represent the parity. This limitation holds for any architecture with just 2 layers (e.g., SVM).
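To show why two hidden units are enough in the recurrent case, here is a minimal sketch with hand-set weights and threshold units. Nothing is trained here; the weights, the `step` activation and the function names are assumptions chosen so the network computes XOR of the current bit and the previous output.

```python
def step(value):
    """Threshold activation: 1 if the weighted input is positive, else 0."""
    return 1 if value > 0 else 0


def parity_rnn(bits):
    """Run a tiny recurrent net over a bit sequence of any length."""
    y = 0  # previous output, fed back into the hidden layer
    for x in bits:
        h_or = step(x + y - 0.5)    # hidden unit 1: fires if x OR y
        h_and = step(x + y - 1.5)   # hidden unit 2: fires if x AND y
        y = step(h_or - h_and - 0.5)  # output: OR but not AND, i.e. XOR
    return y


print(parity_rnn([1, 0, 1, 1]))  # 1, because the sequence contains three ones
```

The same three units are reused at every time step, so the size of the network does not depend on the sequence length.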

I guess one way to do it is to add a temporal component to the input (a recurrent neural net) and stream the input to the net a chunk at a time, basically creating the neural network equivalent of a lexer and parser. This would allow the input to be quite large, but it would have the disadvantage that there would not necessarily be a stop symbol to separate different sequences of input from each other (the equivalent of a period in sentences).
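One common way around the missing stop symbol is to add one explicitly when the input is prepared. The sketch below assumes whitespace tokenization and a hypothetical marker string `"<eos>"`; both are illustrative choices, not part of the answer above.

```python
EOS = "<eos>"  # hypothetical end-of-sequence marker, assumed not to occur in the data


def stream_tokens(sentences):
    """Yield tokens one at a time, emitting EOS after each input sequence."""
    for sentence in sentences:
        for token in sentence.split():
            yield token
        yield EOS


tokens = list(stream_tokens(["the cat sat", "it slept"]))
print(tokens)  # ['the', 'cat', 'sat', '<eos>', 'it', 'slept', '<eos>']
```

A recurrent net fed this stream sees the marker as just another input symbol and can learn to treat it as a sequence boundary.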

To use a neural net on images of different sizes, the images themselves are often cropped and scaled up or down to better fit the input of the network. I know that doesn't really answer your question, but perhaps something similar would be possible with other types of input, using some sort of transformation function on the input?

I'm not entirely sure, but I'd say use the maximum number of inputs. For words, for example, let's say no word will be longer than 45 characters (the longest word found in a dictionary, according to Wikipedia), and if a shorter word is encountered, set the other inputs to a whitespace character.

Or, with binary data, set them to 0. The only problem with this approach is if an input padded with whitespace characters/zeros/whatever collides with a valid full-length input (not so much a problem with words as it is with numbers).
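A minimal sketch of this padding idea, with one addition that is not in the answer: a mask that records which positions hold real data, so a padded input can still be told apart from a genuine full-length one. The function name `pad_with_mask` and the mask convention (1 = real, 0 = padding) are assumptions for the example.

```python
def pad_with_mask(values, max_length, pad_value=0):
    """Pad a sequence to max_length and return (padded, mask)."""
    if len(values) > max_length:
        raise ValueError("sequence longer than max_length")
    n_pad = max_length - len(values)
    padded = list(values) + [pad_value] * n_pad
    # mask is 1 where the value came from the input and 0 where it is padding.
    mask = [1] * len(values) + [0] * n_pad
    return padded, mask


print(pad_with_mask([5, 0, 0], 5))  # ([5, 0, 0, 0, 0], [1, 1, 1, 0, 0])
print(pad_with_mask([5], 5))        # ([5, 0, 0, 0, 0], [1, 0, 0, 0, 0])
```

Both inputs pad to the same vector, which is exactly the collision described above; the mask (or simply the original length passed as an extra input) is what keeps them distinguishable.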

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow