Question

I am trying to understand a simple neural-net case using Theano. The deeplearning.net tutorial gives the following code, which applies logistic regression to a small synthetic dataset:

import numpy
import theano
import theano.tensor as T
rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")
print("Initial model:")
print(w.get_value())
print(b.get_value())

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum()  # The cost to minimize
gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost
                                          # (we shall return to this in a
                                          # following section of this tutorial)

# Compile
train = theano.function(
          inputs=[x,y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)

# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])

print("Final model:")
print(w.get_value())
print(b.get_value())
print("target values for D:")
print(D[1])
print("prediction on D:")
print(predict(D[0]))

I understand most of it: p_1 is the logistic function, prediction is whether an example falls in class 0 or class 1, and xent is the loss function, i.e. how far our prediction is from the correct answer. What I do not understand is the next line, the cost. Shouldn't the cost be equal to xent, i.e. the loss? What is the cost function representing here? Also, why is the bias initially set to 0 and not to a random number like the weights?

Solution

I do not understand the next line, the cost. Shouldn't the cost be equal to the xent, i.e. the loss? What is the cost function representing here?

The cost is the mean error (xent.mean()) plus an L2 regularization term (0.01 * (w ** 2).sum()), often called weight decay. The penalty on large weights discourages overfitting, so minimizing the cost trades off fitting the training data against keeping the weights small.
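
To make the two terms concrete, here is a minimal NumPy sketch (mine, not part of the original tutorial) that computes the same cost for a given w, b, x and y; lam is the regularization strength, 0.01 in the code above:

import numpy as np

def cost(w, b, x, y, lam=0.01):
    p_1 = 1 / (1 + np.exp(-x.dot(w) - b))                # sigmoid, as in the Theano graph
    xent = -y * np.log(p_1) - (1 - y) * np.log(1 - p_1)  # per-example cross-entropy
    return xent.mean() + lam * (w ** 2).sum()            # mean loss + L2 penalty on w only

Setting lam=0 recovers the plain cross-entropy loss; larger values push the weights toward zero more strongly. Note that, as in the tutorial, the bias b is not regularized.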

Why is the bias initially set to 0 and not a random number like the weights?

It is possible, and common, to initialize the biases to zero, since the asymmetry breaking is provided by the small random numbers in the weights; the bias then picks up a nonzero gradient from the very first update and moves away from zero on its own.
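
As a quick illustration (a sketch of mine, not from the original answer), a few gradient steps on the logistic-regression model above show the bias leaving zero immediately; the shapes mirror the tutorial (N=400, feats=784):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((400, 784))
y = rng.integers(0, 2, size=400).astype(float)
w = rng.standard_normal(784)
b = 0.0                                    # zero initialization, as in the tutorial
for _ in range(3):
    p_1 = 1 / (1 + np.exp(-x.dot(w) - b))
    gb = (p_1 - y).mean()                  # gradient of the mean cross-entropy w.r.t. b
    b -= 0.1 * gb
    print(b)                               # already nonzero after the first step

The regularization term does not involve b, so its gradient is simply the mean of p_1 - y, which is generically nonzero, and training updates the bias regardless of how it was initialized.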

More details here.

Licensed under: CC-BY-SA with attribution