Implementing neural network for vowel recognition in matlab - input layer units and the structure?

https://stackoverflow.com/questions/18810095

28-06-2022

Question

I am doing a project on vowel recognition and I need to implement a neural network. I am new to this field so I am not entirely sure about how to do it right. I have a training set of 800 words with 8 types of vowels, and my first step was to check if I am able to classify them with logistic regression for multiple classes:

- Using MATLAB, I ran wavread on each sample and stored the resulting vectors in an 800 x 48117 matrix, 48117 being the length of the longest wav vector. That is, at this step I have 800 examples and 48117 "features", which are the values of the frequency rates for each sound file. When I run logistic regression, it iterates over the set and classifies it into 8 classes with an accuracy of ~99.8%. I then also generate spectrograms for the resulting classes, to visualise each class and to compare them with the spectrograms of the original samples.

- To distinguish the resonating frequencies of the vowels, we have 3 formants - F1, F2, F3, which one can see on the spectrograms. (for example F1 is 500 Hz, and we can see that the spectrogram has the darkest colors in that area on the plot).

- I am at the step of creating a neural network and I am pretty much at a loss about how to start. I am not sure how many input layer units and hidden layer units to have. Firstly, I think that having 48117 features and having that amount of input units is not right, so I have to minimize the amount of features somehow. I am thinking that the right way would be to somehow split them up into 3 groups corresponding to the 3 formants. This is the main question - on the basis of what can I generalize over long vectors to be able to have 3 input units?

- Another question, which seems a bit more trivial, is how many hidden units should I have. I understand that there are no particular rules for how many to have, but based on my training set, how many would an experienced neural network person recommend?

Solution

I am not sure I understand your input correctly. From what I gather, wavread reads a .wav file as a "vector of amplitudes".

First of all, having 48117 inputs, a hidden layer of size k, and 8 classes gives this network 48117*k + 8*k weights (ignoring biases), which can be huge: far too many for 800 training examples. It is often agreed (though this is more art than science) that the hidden layer shouldn't be much smaller than the input layer.
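To make the size of that number concrete, here is a minimal sketch of the arithmetic, using the 48117 features and 800 examples from the question. It is written in Python purely for illustration; `count_weights` is a hypothetical helper name, and bias terms are ignored, as in the formula above.

```python
def count_weights(n_inputs, n_hidden, n_outputs):
    """Weights in a network with one hidden layer, ignoring bias terms."""
    return n_inputs * n_hidden + n_hidden * n_outputs


n_examples = 800  # training set size from the question

for k in (10, 100, 1000):
    w = count_weights(48117, k, 8)
    print(f"k={k}: {w} weights, about {w / n_examples:.0f} per training example")
```

Even a hidden layer of only 10 units leaves several hundred weights per training example, which is why reducing the input size comes first.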

I am also not sure why you need a neural network if logistic regression performed well.

Having those doubts, I am not sure I am answering your question, but I will try. You need to decrease the input size. This can be done in many ways. One is wavelet/Fourier analysis, which maps the signal into a frequency representation; after the Fourier analysis you can "bucket" the frequencies into a small number of bands, which gives a much lower-dimensional input. A simpler way out is dimensionality reduction (a single function in MATLAB, something like PCA). It is motivated by the fact that nearby values are very highly correlated. In image analysis a closely related step is called "whitening".
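The "bucketing" idea can be sketched as follows. This is an illustration under assumptions, not the answerer's code: it is in Python, it uses a naive O(n^2) DFT so that it needs no libraries (on real 48117-sample recordings you would use a fast transform such as MATLAB's `fft`), and the function names and the choice of equal-width bands are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0 .. n//2 of a real-valued signal."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, s in enumerate(signal))
        mags.append(abs(acc))
    return mags


def band_energies(signal, n_bands):
    """Sum the squared magnitudes inside n_bands equal-width frequency bands."""
    mags = magnitude_spectrum(signal)
    edges = [i * len(mags) // n_bands for i in range(n_bands + 1)]
    return [sum(m * m for m in mags[lo:hi]) for lo, hi in zip(edges, edges[1:])]


# Toy signal: 64 samples of a sine with 20 cycles, so its energy sits in DFT bin 20.
n = 64
tone = [math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
features = band_energies(tone, n_bands=4)
print(features)  # 4 numbers instead of 64 samples
```

Each recording is then described by `n_bands` numbers rather than tens of thousands of samples. The number of bands and their edges are choices to experiment with; bands placed around the expected formant ranges would be one option.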

The size of the hidden layer is very hard to estimate. The best way is to run experiments with different hidden layer sizes and pick the best one (run a loop overnight and look at the results).
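Such a loop might look like the sketch below. Everything in it is an assumption made for illustration: it is in Python, the data is a synthetic two-class toy problem rather than vowel recordings, the network is a tiny one-hidden-layer model trained with plain stochastic gradient descent, and all function names are hypothetical. The point is only the shape of the experiment: train one network per candidate size and compare them on held-out data.

```python
import math
import random


def forward(W1, W2, x):
    """Return (input with bias, hidden activations with bias, output probability)."""
    xb = x + [1.0]
    h = [math.tanh(sum(w * v for w, v in zip(row, xb))) for row in W1]
    hb = h + [1.0]
    z = sum(w * v for w, v in zip(W2, hb))
    return xb, hb, 1.0 / (1.0 + math.exp(-z))


def train_mlp(X, y, n_hidden, epochs=50, lr=0.1, seed=0):
    """Train a one-hidden-layer binary classifier with per-example gradient steps."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, t in zip(X, y):
            xb, hb, out = forward(W1, W2, x)
            d_out = out - t  # cross-entropy gradient at the output pre-activation
            for j in range(n_hidden):
                d_h = d_out * W2[j] * (1.0 - hb[j] ** 2)  # back through tanh
                for i in range(n_in + 1):
                    W1[j][i] -= lr * d_h * xb[i]
            for j in range(n_hidden + 1):
                W2[j] -= lr * d_out * hb[j]
    return W1, W2


def accuracy(W1, W2, X, y):
    hits = sum((forward(W1, W2, x)[2] >= 0.5) == (t == 1) for x, t in zip(X, y))
    return hits / len(X)


def make_toy_data(n, seed):
    """Synthetic stand-in data: label 1 if the point lies inside a circle."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    y = [1 if a * a + b * b < 0.5 else 0 for a, b in X]
    return X, y


def sweep_hidden_sizes(sizes, X_tr, y_tr, X_val, y_val):
    """Train one network per hidden size and score each on held-out data."""
    scores = {}
    for k in sizes:
        W1, W2 = train_mlp(X_tr, y_tr, k)
        scores[k] = accuracy(W1, W2, X_val, y_val)
    return max(scores, key=scores.get), scores


X_tr, y_tr = make_toy_data(120, seed=1)
X_val, y_val = make_toy_data(60, seed=2)
best, scores = sweep_hidden_sizes([1, 4, 8], X_tr, y_tr, X_val, y_val)
print(best, scores)
```

Scoring on a validation set that was not used for training is what makes the comparison meaningful; with only 800 examples, cross-validation would be a reasonable variant of the same loop.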

Other tips

The number of hidden units depends on what kind of performance you want: with more hidden units you can get more accurate results at the cost of slower training and prediction, while fewer hidden units give faster performance but less accurate results.

So my suggestion is to try different numbers of hidden units and choose the one that fits your application.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow