Does selecting the same arm give the same reward?

https://cs.stackexchange.com/questions/92584

machine-learning
learning-theory

05-11-2019

Question

In the multi-armed bandit problem, we have a set of $K$ arms. In each round $t$, the learner selects an arm $k$ and receives a reward $r_{kt}$. The objective is to maximize the total reward collected over $T$ rounds.
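To make the setup concrete, here is a minimal sketch of the interaction protocol described above. It assumes the stochastic variant of the problem, in which each arm has a fixed but unknown reward distribution; the question does not say which variant is meant. The names `BernoulliBandit`, `pull`, and `play`, and the arm means `[0.2, 0.5, 0.8]`, are hypothetical choices made only for illustration.

```python
import random


class BernoulliBandit:
    """Hypothetical K-armed bandit: arm k pays 1 with probability means[k], else 0."""

    def __init__(self, means, seed=0):
        self.means = list(means)
        self.rng = random.Random(seed)

    def pull(self, k):
        # Reward r_{kt}: a fresh, independent draw from arm k's distribution
        # (assumption: stochastic rewards with a fixed mean per arm).
        return 1 if self.rng.random() < self.means[k] else 0


def play(bandit, choose_arm, T):
    """Run T rounds; choose_arm(t, history) returns the arm index for round t."""
    history = []  # list of (arm, reward) pairs seen so far
    total = 0
    for t in range(T):
        k = choose_arm(t, history)
        r = bandit.pull(k)
        history.append((k, r))
        total += r
    return total, history


# Pull the same arm (index 2) in every round and collect its rewards.
bandit = BernoulliBandit(means=[0.2, 0.5, 0.8])
total, history = play(bandit, lambda t, h: 2, T=1000)
rewards = [r for _, r in history]
```

Under this assumption, two pulls of the same arm are independent draws from the same distribution, so the individual values in `rewards` can differ from round to round even though their expected value is the same.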

My question: does selecting the same arm in two different rounds lead to the same reward, or can the rewards be completely different?

It would surprise me if one could select the same arm, receive a different reward each time, and still achieve sublinear regret.

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with cs.stackexchange