Question

I am using a dataset from Kaggle to train a feed-forward neural network with no convolutional layers. I wanted to try it this way as a learning exercise with PyTorch, without transfer learning or convolutional layers. Here is the code with its output.

Network Architecture

import torch
from torch import nn, optim

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()

        self.h0 = nn.Linear(99*99*3,1024)  # input: a flattened 99x99 RGB image
        self.h1 = nn.Linear(1024,512)
        self.h2 = nn.Linear(512,256)
        self.h3 = nn.Linear(256,128)
        self.h4 = nn.Linear(128,1)

        self.dropout = nn.Dropout(p=0.2)

    def forward(self,x):
        x = x.view(x.shape[0],-1)
        x = torch.tanh(self.dropout(self.h0(x)))
        x = torch.tanh(self.dropout(self.h1(x)))
        x = torch.tanh(self.dropout(self.h2(x)))
        x = torch.tanh(self.dropout(self.h3(x)))
        x = torch.sigmoid(self.h4(x))

        return x

Parameters

model = Classifier()
model.to(device)
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(),lr=0.00000001,momentum=0.9)

images,labels = next(iter(trainloader))
images,labels = make_gpu(images,labels)
print("Labels:{} ".format(labels.shape),"Images:{} ".format(images.shape))
probs = model(images)
print("Probabilities:{}".format(probs.shape))
loss = criterion(probs,labels)
loss.backward()
optimizer.step()
print(loss.item())
print(len(trainloader))
print(len(testloader))
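
make_gpu is a small helper that is not shown here; presumably it moves the batch onto the GPU and shapes the labels so BCELoss accepts them. A rough sketch of such a helper (the float cast and the reshape to (batch, 1) are assumptions):

def make_gpu(images, labels):
    # assumed implementation, not from the original post: move the batch to
    # the GPU and give the labels shape (batch, 1) to match the model output
    images = images.to(device)
    labels = labels.float().view(-1, 1).to(device)
    return images, labels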

Training and Validation

training_losses, testing_losses, test_acc, train_acc = [],[],[],[]

epochs = 10

for e in range(epochs):
    running_loss = 0
    tr_acc = 0
    for images,labels in trainloader:
        optimizer.zero_grad()
        images,labels = make_gpu(images,labels)
        probs = model(images)
        loss = criterion(probs,labels)
        loss.backward()
        optimizer.step()
        running_loss+=loss.item()
        probs = torch.round(probs)
        equals = probs == labels.view(*probs.shape)
        tr_acc += torch.mean(equals.type(torch.cuda.FloatTensor))

    # for-else: this block runs once per epoch, after the loop over trainloader finishes
    else:
        testing_loss = 0
        acc = 0

        with torch.no_grad():
            model.eval()
            for images,labels in testloader:
                images,labels = make_gpu(images,labels)
                probs = model(images)
                loss = criterion(probs,labels)
                testing_loss+=loss
                probs = torch.round(probs)
                equals = probs == labels.view(*probs.shape)
                acc += torch.mean(equals.type(torch.cuda.FloatTensor))

            model.train()

            training_losses.append(running_loss/len(trainloader))
            testing_losses.append(testing_loss/len(testloader))
            test_acc.append(acc/len(testloader))
            train_acc.append(tr_acc/len(trainloader))

            print("Epoch: {}/{}.. ".format(e+1, epochs),
              "Training Loss: {:.3f}.. ".format(training_losses[-1]),
              "Test Loss: {:.3f}.. ".format(testing_losses[-1]),
              "Test Accuracy: {:.3f}..".format(test_acc[-1]),
              "Train Accuracy: {:.3f}".format(train_acc[-1]))

Output

Epoch: 1/10..  Training Loss: 0.694..  Test Loss: 0.694..  Test Accuracy: 0.505.. Train Accuracy: 0.503
Epoch: 2/10..  Training Loss: 0.694..  Test Loss: 0.694..  Test Accuracy: 0.505.. Train Accuracy: 0.502
Epoch: 3/10..  Training Loss: 0.694..  Test Loss: 0.694..  Test Accuracy: 0.505.. Train Accuracy: 0.501
Epoch: 4/10..  Training Loss: 0.694..  Test Loss: 0.694..  Test Accuracy: 0.505.. Train Accuracy: 0.503
Epoch: 5/10..  Training Loss: 0.694..  Test Loss: 0.694..  Test Accuracy: 0.505.. Train Accuracy: 0.503
Epoch: 6/10..  Training Loss: 0.695..  Test Loss: 0.694..  Test Accuracy: 0.505.. Train Accuracy: 0.497
Epoch: 7/10..  Training Loss: 0.695..  Test Loss: 0.694..  Test Accuracy: 0.505.. Train Accuracy: 0.499
Epoch: 8/10..  Training Loss: 0.695..  Test Loss: 0.694..  Test Accuracy: 0.505.. Train Accuracy: 0.499
Epoch: 9/10..  Training Loss: 0.695..  Test Loss: 0.694..  Test Accuracy: 0.505.. Train Accuracy: 0.499
Epoch: 10/10..  Training Loss: 0.695..  Test Loss: 0.694..  Test Accuracy: 0.505.. Train Accuracy: 0.501

As you can see from the output, the classifier never gets past chance level: the loss is stuck around 0.694 (roughly ln 2, the loss of always predicting 0.5) and accuracy hovers around 50%, with the training loss even creeping up slightly over the epochs. Can someone please tell me why this is happening and how I can improve the model?

I have tried both SGD and Adam as the optimiser, and both give a similar result. I have also tried learning rates from 0.01 down to 0.00000001, to no avail. Please help!

Solution

That could be due to many reasons. One of them is vanishing/exploding gradients: the network stacks four saturating tanh layers, so by the time the gradient reaches the first layer it can be close to zero and the weights barely move. Changing your nonlinear function could be a solution; for example, you can use the ReLU function instead of tanh.
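
A minimal sketch of the same classifier with ReLU activations (dropout is applied after the activation here, which is the more common ordering; everything else is kept as in the question):

import torch
from torch import nn
import torch.nn.functional as F

class ReLUClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.h0 = nn.Linear(99*99*3, 1024)
        self.h1 = nn.Linear(1024, 512)
        self.h2 = nn.Linear(512, 256)
        self.h3 = nn.Linear(256, 128)
        self.h4 = nn.Linear(128, 1)
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        x = x.view(x.shape[0], -1)
        # ReLU does not saturate for positive inputs, so gradients flow
        # through the stack more easily than with tanh
        x = self.dropout(F.relu(self.h0(x)))
        x = self.dropout(F.relu(self.h1(x)))
        x = self.dropout(F.relu(self.h2(x)))
        x = self.dropout(F.relu(self.h3(x)))
        x = torch.sigmoid(self.h4(x))
        return x

You can also check directly whether gradients are vanishing by printing, say, model.h0.weight.grad.norm() after a backward pass: if it is orders of magnitude smaller than the norms in the later layers, the early layers are effectively not learning.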

Also, using an early stopping rule could prevent this problem: monitor the validation loss and stop training once it stops improving.
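
A minimal early-stopping sketch, assuming hypothetical train_one_epoch and evaluate helpers that wrap the loops from the question (the patience value of 3 is an arbitrary choice):

import torch

best_loss = float("inf")
patience = 3                      # assumed threshold, tune as needed
epochs_without_improvement = 0

for e in range(epochs):
    train_one_epoch(model, trainloader)      # hypothetical helper
    val_loss = evaluate(model, testloader)   # hypothetical helper

    if val_loss < best_loss:
        best_loss = val_loss
        epochs_without_improvement = 0
        torch.save(model.state_dict(), "best.pt")  # keep the best weights
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print("Stopping early at epoch", e + 1)
            break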

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange