I found a good article about this problem.
Cutting through a lot of words, we can simplify the argument to stating that the original expression
log(1 + exp(s))
can be rewritten as
log(exp(s)*(exp(-s) + 1))
= log(exp(s)) + log(exp(-s) + 1)
= s + log(exp(-s) + 1)
This stops overflow from occurring - it doesn't prevent underflow, but by the time that occurs, you have your answer (namely, s
). You can't just use this instead of the original, since it will still give you problems. However, we now have the basis for a function that can be written that will be accurate and won't produce over/underflow:
function LL = logistic(s)
if s<0
LL = log(1 + exp(s));
else
LL = s + logistic(-s);
I think this maintains reasonably good accuracy.
EDIT now to the meat of your question - making this vectorized, and allowing the calculation of the slope as well. Let's take these one at a time:
function LL = logisticVec(s)
LL = zeros(size(s));
LL(s<0) = log(1 + exp(s(s<0)));
LL(s>=0) = s(s>=0) + log(1 + exp(-s(s>=0)));
To obtain the average you wanted:
L = logisticVec(X*beta) / N;
The slope is a little bit trickier; note I believe you may have a typo in your expression (missing a multiplication sign).
dL/dbeta = sum(X * exp(X*beta) ./ (1 + exp(X*beta))) / N;
If we divide top and bottom by exp(X*beta)
we get
dL = sum(X ./ (exp(-X*beta) + 1)) / N;
Once again, the overflow has gone away and we are left with underflow - but since the underflowed value has 1
added to it, the error this creates is insignificant.