After the suggestions to use numpy I did a bit of research and came up with this solution for the first part of the soft-max implementation:
import numpy as np
prob_t = [0] * nActions  # initialise one slot per action
for a in range(nActions):
    prob_t[a] = np.exp(Q[state][a] / temperature)  # calculate numerators
# numpy element-wise division by the denominator (sum of numerators)
prob_t = np.true_divide(prob_t, sum(prob_t))
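That's one for loop fewer than in my initial solution. The only downside I can see is an apparent loss of precision in the printed output. As far as I can tell this is only display formatting: numpy prints arrays with 8 digits of precision by default, while the values themselves are still stored as 64-bit floats.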
using numpy:
[ 0.02645082 0.02645082 0.94709836]
initial two-loops solution:
[0.02645082063629476, 0.02645082063629476, 0.9470983587274104]
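For what it's worth, the remaining loop can probably be dropped as well. Here is a minimal sketch, assuming the Q-values for the current state can be passed in as a plain sequence. The names `softmax_probs` and `example_q` are made up for illustration, the Q-values are hypothetical, and subtracting the maximum before exponentiating is a common numerical-stability trick that is not part of the code above (it leaves the resulting probabilities unchanged).

```python
import numpy as np

def softmax_probs(q_values, temperature):
    """Return soft-max (Boltzmann) action probabilities for one state."""
    q = np.asarray(q_values, dtype=float)
    # Shifting by the max does not change the result, but keeps exp() from overflowing.
    numerators = np.exp((q - q.max()) / temperature)
    # Element-wise division by the sum of the numerators.
    return numerators / numerators.sum()

# Hypothetical Q-values for a single state with three actions.
example_q = [1.0, 1.0, 2.0]
probs = softmax_probs(example_q, temperature=0.5)

print(probs)           # numpy's default array formatting (shortened display)
print(probs.tolist())  # the same values as plain Python floats, full precision
```

Printing `probs.tolist()` (or calling `np.set_printoptions(precision=17)` beforehand) should show the same digits as the pure-Python version, which suggests the shorter numpy output is a display setting rather than a real loss of precision.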