Converting Pseudocode to C++ [closed]

Question 1

I found this article quite readily: Value iteration and policy iteration algorithms for Markov decision problem [PDF file]. It explains quite a bit more what's going on.

Conceptually, you have a system that can be in a number of states, rewards for transitions from one state to another, and actions that sometimes can result in state transitions. The basic idea is to keep iterating until you have arrived at a utility matrix that doesn't change That's what that final test max_(s_i) | U(s_i) - U'(s_i)| < e looks for. (Here, e is short for epsilon, a small number, and probably should be an additional input.)

For each iteration, you want to take the best action for each state. The best action is the one that yields the greatest reward, weighted by probability. That's what max_a Summation_j P^a_ij*U(s_j) does: Find the action that yields the best reward, weighted by probability.

Question 2

I can translate bits and pieces, but there's a lot of information in your code that only makes sense in context, and there's no way for us to know that context.
~~Also, it appears that some formatting was lost along the way, since P^a_ij looks like it was at one point P to the power of a_i times j.~~ David seems to know how to interpret the crazy bit.
Also the condition loop uses | in the pseudo-code which is wierd, but I took it literally.

utility_matrix VALUE_ITERATION(const probability_matrix& P,
                               const reward_matrix& R)
{
    utility_matrix U(R);
    utility_matrix UP(R);
    do {
        U = UP;
        for(int s_i : ????) //for each state in what?
            UP[s_i] = R[s_i] + ???? //max_a Summation_j P^a_ij*U(s_j)
    while(max(s_i) ???? std::abs(U[s_i] - UP[s_i])<e);
    return U;
}

As akira said, the understandable bits are straightforward, if you couldn't do those, you might need to learn more about C++ before you tackle this.

As per your comment, I found C code that looks vaguely like your algorithm here. (Lines 62-76)