Question

I will try to explain my issue at a high level, in the hope of getting a better understanding of the ML behind it all. I am working with aggregated features extracted from audio files, so each feature vector has size (1xN). The output is a single sentiment label: Positive, Neutral, or Negative, which I mapped to 2, 1, and 0 respectively (the labels are discrete by design, but maybe I could make the target continuous?).

The dataset I am using is 90% neutral, 6% negative, and 4% positive, and I split it into train/dev/test sets. I wrote a basic DNN in PyTorch and have been training it with CrossEntropyLoss and SGD with Nesterov momentum (a simplified sketch of the setup is included below, after the architecture). The issue I am running into is that, after seeing only ~10% of the data, the network starts to predict only neutral labels: its output logits converge to something like

tensor([[-0.9255],
        [ 1.9352],
        [-1.1473]])

no matter what 1xN feature vectors you feed in. I would appreciate guidance on how to address this issue. For reference, the architecture is

DNNModel(
  (in_layer): Linear(in_features=89, out_features=1024, bias=True)
  (fcs): Sequential(
    (0): Linear(in_features=1024, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): Linear(in_features=256, out_features=128, bias=True)
  )
  (out_layer): Sequential(
    (0): SequenceWise(
      Linear(in_features=128, out_features=3, bias=True)
    )
  )
)

def forward(self, x):
    x = F.relu(self.in_layer(x))   # input projection + ReLU
    for fc in self.fcs:
        x = F.relu(fc(x))          # hidden layers, each followed by ReLU
    x = self.out_layer(x)          # raw logits; CrossEntropyLoss applies log-softmax internally
    return x
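
For completeness, the model and training objective are set up roughly as below. The SequenceWise wrapper is simplified to a pass-through here, and the learning rate, momentum, and other hyperparameters are placeholders rather than my exact values:

import torch
import torch.nn as nn
import torch.nn.functional as F


class SequenceWise(nn.Module):
    """Simplified stand-in for the custom wrapper: just applies the wrapped module."""
    def __init__(self, module):
        super().__init__()
        self.module = module

    def forward(self, x):
        return self.module(x)


class DNNModel(nn.Module):
    def __init__(self, n_features=89, n_classes=3):
        super().__init__()
        self.in_layer = nn.Linear(n_features, 1024)
        self.fcs = nn.Sequential(
            nn.Linear(1024, 512),
            nn.Linear(512, 256),
            nn.Linear(256, 128),
        )
        self.out_layer = nn.Sequential(SequenceWise(nn.Linear(128, n_classes)))

    def forward(self, x):                  # same forward pass as shown above
        x = F.relu(self.in_layer(x))
        for fc in self.fcs:
            x = F.relu(fc(x))
        return self.out_layer(x)


model = DNNModel()
criterion = nn.CrossEntropyLoss()                  # targets are integer labels 0 / 1 / 2
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01, momentum=0.9, # placeholder hyperparameters
                            nesterov=True)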

I am not sure this network actually makes sense for the problem -- could the issue be the ReLUs between the hidden layers, or the bias terms? Or something else?
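
As a quick sanity check on those constant outputs: taking the softmax of the logits above gives roughly the training-set class frequencies, so it looks like the network has collapsed to predicting the class prior regardless of the input features:

import torch
import torch.nn.functional as F

logits = torch.tensor([-0.9255, 1.9352, -1.1473])  # the constant outputs for classes 0, 1, 2
probs = F.softmax(logits, dim=0)
print(probs)  # ~[0.05, 0.91, 0.04], close to the 6% negative / 90% neutral / 4% positive split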

Reposted from Stack Overflow here, since this forum is more appropriate: link
