Question

Specifically, I'm working on a modeling project, and I came across someone else's code that looks like this:

def forward(self, x):
    x = self.fc1(x)
    x = self.activation1(x)
    x = self.fc2(x)
    x = self.activation2(x)
    x = self.fc3(x)
    x = self.activation3(x)
    # use log softmax + NLLLoss in training; softmax to make predictions
    if self.training:
        x = self.log_softmax(x)
    else:
        x = self.softmax(x)
    return x

For context, this is PyTorch code for a classification problem, and the training criterion is NLLLoss. What's the rationale for using log_softmax during training but softmax when making actual predictions?
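
In case a runnable reproduction helps, here is a minimal self-contained sketch of the setup. The layer dimensions, ReLU activations, and class count are my own placeholders (the original code doesn't show them); only the training/inference switch and the NLLLoss criterion mirror the snippet above.

# Hypothetical dimensions and ReLU activations -- the original code doesn't
# specify them; only the log_softmax/softmax switch mirrors the snippet above.
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, in_dim=20, hidden=64, n_classes=5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, n_classes)
        self.activation1 = nn.ReLU()
        self.activation2 = nn.ReLU()
        self.activation3 = nn.ReLU()
        self.log_softmax = nn.LogSoftmax(dim=1)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.activation1(self.fc1(x))
        x = self.activation2(self.fc2(x))
        x = self.activation3(self.fc3(x))
        # log-probabilities for NLLLoss during training,
        # plain probabilities at inference time
        return self.log_softmax(x) if self.training else self.softmax(x)

model = Net()
criterion = nn.NLLLoss()  # expects log-probabilities as input

# training step: model.train() makes forward() return the log_softmax output
model.train()
x = torch.randn(8, 20)
y = torch.randint(0, 5, (8,))
loss = criterion(model(x), y)
loss.backward()

# prediction: model.eval() makes forward() return softmax probabilities
model.eval()
with torch.no_grad():
    probs = model(x)             # each row sums to 1
    preds = probs.argmax(dim=1)  # predicted class indices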
