How to determine the learning rate and the variance in a gradient descent algorithm？

Question 1

Plotting is the best way to see how your algorithm is performing. To see if you have achieved convergence you can plot the evolution of the cost function after each iteration, after a certain given of iteration you will see that it does not improve much you can assume convergence, take a look to the following code:

cost_f = []
while (abs(theta1_guess-theta1_last) > variance or abs(theta0_guess - theta0_last) > variance):

    theta1_last = theta1_guess
    theta0_last = theta0_guess

    hypothesis = create_hypothesis(theta1_guess, theta0_guess)
    cost_f.append((1./(2*m))*sum([ pow(hypothesis(point[0]) - point[1], 2) for point in data]))

    theta0_guess = theta0_guess - learning_rate * (1./m) * sum([hypothesis(point[0]) - point[1] for point in data])
    theta1_guess = theta1_guess - learning_rate * (1./m) * sum([ (hypothesis(point[0]) - point[1]) * point[0] for point in data])   

import pylab
pylab.plot(range(len(cost_f)), cost_f)
pylab.show()

Which will plot the following graphic (execution with learning_rate=0.01, variance=0.00001)

As you can see, after a thousand iteration you don't get much improvement. I normally declare convergence if the cost function decreases less than 0.001 in one iteration, but this just based on my own experience.

For choosing learning rate, the best thing you can do is also plot the cost function and see how it is performing, and always remember these two things:

if the learning rate is too small you will get slow convergence
if the learning rate is too large your cost function may not decrease in every iteration and therefore it will not converge

If you run your code choosing learning_rate > 0.029 and variance=0.001 you will be in the second case, gradient descent doesn't converge, while if you choose values learning_rate < 0.0001, variance=0.001 you will see that your algorithm takes a lot iteration to converge.

Not convergence example with learning_rate=0.03

Slow convergence example with learning_rate=0.0001 enter image description here

Question 2

There are a bunch of ways to guarantee convergence of a gradient descent algorithm. There is line search, a fixed step size related to the Lipschitz constant of the gradient (that is in case of a function. In case of a table such as yours, you can make the difference between consecutive values), a decreasing step size for each iteration and some others. Some of them can be found here.