Question

I was implementing logistic regression in Python. To find theta, I was struggling to decide which algorithm is best, i.e. one that always guarantees the global optimum without my having to worry about the initial value of the parameter theta.

import numpy as np
import scipy.optimize as op

def Sigmoid(z):
    return 1/(1 + np.exp(-z));

def Gradient(theta,x,y):
    # vectorized gradient of the logistic cost: (1/m) * x.T.dot(sigmoid(x.dot(theta)) - y)
    m , n = x.shape
    theta = theta.reshape((n,1));
    y = y.reshape((m,1))
    sigmoid_x_theta = Sigmoid(x.dot(theta));
    grad = ((x.T).dot(sigmoid_x_theta-y))/m;
    return grad.flatten();

def CostFunc(theta,x,y):
    # average negative log-likelihood: -(1/m) * sum(y*log(h) + (1-y)*log(1-h)), h = sigmoid(x.dot(theta))
    m,n = x.shape; 
    theta = theta.reshape((n,1));
    y = y.reshape((m,1));
    term1 = np.log(Sigmoid(x.dot(theta)));
    term2 = np.log(1-Sigmoid(x.dot(theta)));
    term1 = term1.reshape((m,1))
    term2 = term2.reshape((m,1))
    term = y * term1 + (1 - y) * term2;
    J = -((np.sum(term))/m);
    return J;

data = np.loadtxt('ex2data1.txt',delimiter=',');

# m training samples and n attributes
m , n = data.shape          
X = data[:,0:n-1]
y = data[:,n-1:]
X = np.concatenate((np.ones((m,1)), X),axis = 1)
initial_theta = np.zeros((n,1))
m , n = X.shape;

Result = op.minimize(fun = CostFunc, 
                     x0 = initial_theta,
                     args = (X,y), 
                     method = 'TNC',
                     jac = Gradient);
theta = Result.x;

where the content of ex2data1.txt is:

34.62365962451697,78.0246928153624,0
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
60.18259938620976,86.30855209546826,1
79.0327360507101,75.3443764369103,1
45.08327747668339,56.3163717815305,0
61.10666453684766,96.51142588489624,1
75.02474556738889,46.55401354116538,1
76.09878670226257,87.42056971926803,1
84.43281996120035,43.53339331072109,1
95.86155507093572,38.22527805795094,0
75.01365838958247,30.60326323428011,0
82.30705337399482,76.48196330235604,1
69.36458875970939,97.71869196188608,1
39.53833914367223,76.03681085115882,0
53.9710521485623,89.20735013750205,1
69.07014406283025,52.74046973016765,1
67.94685547711617,46.67857410673128,0
70.66150955499435,92.92713789364831,1
76.97878372747498,47.57596364975532,1
67.37202754570876,42.83843832029179,0
89.67677575072079,65.79936592745237,1
50.534788289883,48.85581152764205,0
34.21206097786789,44.20952859866288,0
77.9240914545704,68.9723599933059,1
62.27101367004632,69.95445795447587,1
80.1901807509566,44.82162893218353,1
93.114388797442,38.80067033713209,0
61.83020602312595,50.25610789244621,0
38.78580379679423,64.99568095539578,0
61.379289447425,72.80788731317097,1
85.40451939411645,57.05198397627122,1
52.10797973193984,63.12762376881715,0
52.04540476831827,69.43286012045222,1
40.23689373545111,71.16774802184875,0
54.63510555424817,52.21388588061123,0
33.91550010906887,98.86943574220611,0
64.17698887494485,80.90806058670817,1
74.78925295941542,41.57341522824434,0
34.1836400264419,75.2377203360134,0
83.90239366249155,56.30804621605327,1
51.54772026906181,46.85629026349976,0
94.44336776917852,65.56892160559052,1
82.36875375713919,40.61825515970618,0
51.04775177128865,45.82270145776001,0
62.22267576120188,52.06099194836679,0
77.19303492601364,70.45820000180959,1
97.77159928000232,86.7278223300282,1
62.07306379667647,96.76882412413983,1
91.56497449807442,88.69629254546599,1
79.94481794066932,74.16311935043758,1
99.2725269292572,60.99903099844988,1
90.54671411399852,43.39060180650027,1
34.52451385320009,60.39634245837173,0
50.2864961189907,49.80453881323059,0
49.58667721632031,59.80895099453265,0
97.64563396007767,68.86157272420604,1
32.57720016809309,95.59854761387875,0
74.24869136721598,69.82457122657193,1
71.79646205863379,78.45356224515052,1
75.3956114656803,85.75993667331619,1
35.28611281526193,47.02051394723416,0
56.25381749711624,39.26147251058019,0
30.05882244669796,49.59297386723685,0
44.66826172480893,66.45008614558913,0
66.56089447242954,41.09209807936973,0
40.45755098375164,97.53518548909936,1
49.07256321908844,51.88321182073966,0
80.27957401466998,92.11606081344084,1
66.74671856944039,60.99139402740988,1
32.72283304060323,43.30717306430063,0
64.0393204150601,78.03168802018232,1
72.34649422579923,96.22759296761404,1
60.45788573918959,73.09499809758037,1
58.84095621726802,75.85844831279042,1
99.82785779692128,72.36925193383885,1
47.26426910848174,88.47586499559782,1
50.45815980285988,75.80985952982456,1
60.45555629271532,42.50840943572217,0
82.22666157785568,42.71987853716458,0
88.9138964166533,69.80378889835472,1
94.83450672430196,45.69430680250754,1
67.31925746917527,66.58935317747915,1
57.23870631569862,59.51428198012956,1
80.36675600171273,90.96014789746954,1
68.46852178591112,85.59430710452014,1
42.0754545384731,78.84478600148043,0
75.47770200533905,90.42453899753964,1
78.63542434898018,96.64742716885644,1
52.34800398794107,60.76950525602592,0
94.09433112516793,77.15910509073893,1
90.44855097096364,87.50879176484702,1
55.48216114069585,35.57070347228866,0
74.49269241843041,84.84513684930135,1
89.84580670720979,45.35828361091658,1
83.48916274498238,48.38028579728175,1
42.2617008099817,87.10385094025457,1
99.31500880510394,68.77540947206617,1
55.34001756003703,64.9319380069486,1
74.77589300092767,89.52981289513276,1

The above code gives theta = Result.x as [-25.87282405 0.21193078 0.20722013]. This is the global minimum when initial_theta = np.zeros((n,1)), but if initial_theta = np.ones((n,1)) it gives an error. So in this case the result depends on the initial value of the parameter theta. Can this be automated in some way to avoid the issue?

I also tried using the 'BFGS' method instead of 'TNC' in the minimize call, as shown below, but then I get a RuntimeWarning.

initial_theta = np.zeros((n,1))
result = op.minimize(fun = CostFunc, 
                     x0 = initial_theta,
                     args = (X,y),
                     method = 'BFGS', 
                     jac = Gradient);
optimal_theta =  result.x

I have called the above function several times with different values of initial_theta and found that BFGS converges to a local minimum most of the time. When I called BFGS with

initial_theta = np.array([-25,0.2,0.2])

which is near the global minimum, it converged. So it seems that TNC is better than BFGS, because with the same initial_theta in both cases, TNC converges to the global minimum while BFGS converges to a local minimum. So:

  1. Is this true in all cases, or does it depend on the particular problem?
  2. Which is better, BFGS or TNC?
  3. Is there any difference between scipy.optimize.fmin_bfgs and scipy.optimize.minimize with method='BFGS', or are they the same?

Any help or insight would be appreciated. Thank you.

Solution

There is no practical algorithm that is guaranteed to find a global optimum in general. However, there are heuristics like DIRECT (see e.g. here) that work very well in practice within given bounds. These can be used to find a good initialization for a local optimizer, which then finds the optimum in the vicinity of that initialization much more efficiently.
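
For example, recent SciPy releases (1.9 and later) ship a DIRECT implementation as scipy.optimize.direct. A minimal sketch of that two-stage idea, assuming such a SciPy version and an illustrative, hand-picked search box for theta (the bounds below are not tuned to this dataset):

from scipy.optimize import direct, minimize

bounds = [(-50, 50), (-1, 1), (-1, 1)]            # hypothetical bounds on theta
rough = direct(CostFunc, bounds, args=(X, y))     # DIRECT: coarse global search inside the box
Result = minimize(fun=CostFunc, x0=rough.x, args=(X, y),
                  method='TNC', jac=Gradient)     # local refinement from that starting point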

  1. However, logistic regression is a convex optimization problem. That means the objective (error) function has only one minimum, i.e. any local minimum is also the global minimum. Hence, you can use any local optimizer (Gradient Descent, L-BFGS, Conjugate Gradient, ...). The only problem is that you cannot compute the minimum in closed form because of the nonlinear logistic function. The closely related problem of linear regression drops that logistic function; there the global minimum of the error function can be computed directly, via the normal equations, without any iterative optimization algorithm (see the linear-regression sketch after this list).

  2. A comparison of optimizers for logistic regression can be found in Fabian Pedregosa's blog. My first guess would be that you have an error in your gradient computation. Maybe you should compare it against a numerical approximation of the gradient with scipy.optimize.check_grad (see the check_grad sketch after this list).

  3. scipy.optimize.minimize with method='BFGS' and scipy.optimize.fmin_bfgs use the same underlying BFGS implementation; minimize is simply the newer, unified interface, so the results should be the same.
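
To illustrate the contrast in point 1: for plain linear regression the minimum of the squared-error function has a closed form, so no iterative optimizer and no starting point are needed. A minimal sketch using np.linalg.lstsq on the same design matrix X (purely to show the mechanics; regressing the 0/1 labels with least squares is only a toy example):

w, *_ = np.linalg.lstsq(X, y, rcond=None)   # closed-form least-squares solution
# w minimizes ||X.dot(w) - y||^2 globally; no initial guess and no iterations required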

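For the gradient check in point 2, a quick sketch: scipy.optimize.check_grad returns the norm of the difference between the analytic gradient and a finite-difference approximation of it, so a value near zero means that Gradient is consistent with CostFunc.

from scipy.optimize import check_grad

theta0 = np.zeros(X.shape[1])
err = check_grad(CostFunc, Gradient, theta0, X, y)   # compare Gradient to finite differences
print(err)   # should be very small (e.g. < 1e-5) if the analytic gradient is correct
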
OTHER TIPS

This isn't possible with an efficient, general algorithm. You'll never really know what the cost function looked like on the inputs you didn't try. Perhaps there was some miracle trench running through a high plateau you ignored. Perhaps the cost function starts with if arg1 == secret: return -1e100. Who can say? If you really, absolutely need a global minimum, you either need to take advantage of extra knowledge about the cost function, or you need to try each and every single possible input.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow