Is there a better way than this to implement Softmax Action Selection for Reinforcement Learning?

Question 1

After the suggestions to use numpy I did a bit of research and came with this solution for the first part of the soft-max implementation.

prob_t = [0,0,0]       #initialise
for a in range(nActions):
    prob_t[a] = np.exp(Q[state][a]/temperature)  #calculate numerators

#numpy matrix element-wise division for denominator (sum of numerators)
prob_t = np.true_divide(prob_t,sum(prob_t))

There's a for loop less than my initial solution. The only downside I can appreciate is a reduced precision.

using numpy:

[ 0.02645082  0.02645082  0.94709836]

initial two-loops solution:

[0.02645082063629476, 0.02645082063629476, 0.9470983587274104]

Question 2

Action selection based on probabilities can be easily done using numpy library.

q_values = [] #array of q_values
action = np.random.choice(q_values,p=q_values)

Question 3

After you evaluate the probabilities for each action, if you have a function to return you weighted random selection, you can get your desired action like this:

action = weighted_choice(prob_t)

Though I am not sure whether this is what you call "better way".

The weighted_choice can be something like this:

import random
def weighted_choice(weights):
    totals = []
    running_total = 0

    for w in weights:
        running_total += w
        totals.append(running_total)

    rnd = random.random() * running_total
    for i, total in enumerate(totals):
        if rnd < total:
            return i

Definitely check the binary search implementation in the article instead of the linear search one above if you have a lot of available actions.

Or if you have access to numpy:

import numpy as np
def weighted_choice(weights):
    totals = np.cumsum(weights)
    norm = totals[-1]
    throw = np.random.rand()*norm
    return np.searchsorted(totals, throw)