As @pv. pointed out in a comment, I made a mistake in computing the gradient. First of all, the correct (mathematical) expression for the gradient of my objective function is:
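With $c_{ij}$ denoting the number of comparisons won by item $i$ over item $j$, this is the expression the code below implements:

$$\frac{\partial f}{\partial x_i} = \sum_{j} \left[ \frac{c_{ij} + c_{ji}}{1 + e^{x_j - x_i}} - c_{ij} \right]$$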
(notice the minus sign.) Furthermore, my Python implementation was completely wrong, beyond the sign mistake. Here's my updated gradient:
def gradient(x):
    # cijs is the matrix of pairwise comparison counts (defined elsewhere):
    # cijs[i, j] = number of comparisons won by item i over item j.
    nb_comparisons = cijs + cijs.T
    # The first score is pinned to 0 to remove the translation invariance.
    x = np.insert(x, 0, 0.0)
    tiles = np.tile(x, (len(x), 1))
    combs = tiles - tiles.T  # combs[i, j] = x[j] - x[i]
    probs = 1.0 / (np.exp(combs) + 1)  # probs[i, j] = 1 / (1 + exp(x[j] - x[i]))
    mat = (nb_comparisons * probs) - cijs
    grad = np.sum(mat, axis=1)
    return grad[1:]  # Don't return the first element.
To debug it, I used:

- `scipy.optimize.check_grad`: showed that my gradient function was producing results very far away from an approximated (finite-difference) gradient.
- `scipy.optimize.approx_fprime` to get an idea of what the values should look like.
- a few hand-picked simple examples that could be analyzed by hand if needed, and a few Wolfram Alpha queries for sanity-checking.
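As a sketch of that workflow (the `cijs` matrix here is made-up toy data, and the objective is assumed to be the pairwise logistic negative log-likelihood that this gradient differentiates):

```python
import numpy as np
from scipy.optimize import approx_fprime, check_grad

# Toy comparison counts (made up for illustration):
# cijs[i, j] = number of comparisons won by item i over item j.
cijs = np.array([[0., 3., 1.],
                 [2., 0., 4.],
                 [5., 1., 0.]])

def objective(x):
    # Assumed objective: sum_ij c_ij * log(1 + exp(x_j - x_i)),
    # with the first score pinned to 0.
    x = np.insert(x, 0, 0.0)
    combs = x[None, :] - x[:, None]  # combs[i, j] = x[j] - x[i]
    return np.sum(cijs * np.log1p(np.exp(combs)))

def gradient(x):
    nb_comparisons = cijs + cijs.T
    x = np.insert(x, 0, 0.0)
    tiles = np.tile(x, (len(x), 1))
    combs = tiles - tiles.T
    probs = 1.0 / (np.exp(combs) + 1)
    mat = (nb_comparisons * probs) - cijs
    return np.sum(mat, axis=1)[1:]

x0 = np.array([0.5, -0.3])
print(check_grad(objective, gradient, x0))   # should be close to zero
print(approx_fprime(x0, objective, 1e-8))    # finite-difference gradient
print(gradient(x0))                          # should closely match the line above
```

`check_grad` returns the norm of the difference between the analytic and finite-difference gradients, so a value near machine-precision scale is what confirms the fix.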