Matrix Approximation and Predicting Timeseries in Python/R with SVD


I will use the golf course example data you linked to set the stage:

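import numpy as np
# 27 scores, grouped per player: 9 holes for player 1, then player 2, then player 3
A=np.matrix((4,4,3,4,4,3,4,2,5,4,5,3,5,4,5,4,4,5,5,5,2,4,4,4,3,4,5))
# reshape to 3 players x 9 holes, then transpose to 9 holes (rows) x 3 players (columns)
A=A.reshape((3,9)).T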

This gives the original 9 x 3 table, with one row per hole and one column per player:

matrix([[4, 4, 5],
        [4, 5, 5],
        [3, 3, 2],
        [4, 5, 4],
        [4, 4, 4],
        [3, 5, 4],
        [4, 4, 3],
        [2, 4, 4],
        [5, 5, 5]])
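NumPy's documentation no longer recommends np.matrix, even for linear algebra, and suggests regular arrays instead. As a minimal sketch, assuming you prefer a plain ndarray, the same table can be built like this (the name `scores` is only illustrative). One consequence: with an ndarray, `*` is elementwise, so a rank-1 product has to use `@` or np.outer instead of `*`.

```python
import numpy as np

# The same 27 scores as above, grouped per player: player 1, player 2, player 3.
scores = [4, 4, 3, 4, 4, 3, 4, 2, 5,
          4, 5, 3, 5, 4, 5, 4, 4, 5,
          5, 5, 2, 4, 4, 4, 3, 4, 5]

# Plain ndarray: shape as 3 players x 9 holes, then transpose to 9 holes x 3 players.
A = np.array(scores, dtype=float).reshape(3, 9).T

print(A.shape)   # (9, 3)
print(A[0])      # hole 1: [4. 4. 5.]
```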

Now the singular value decomposition:

U, s, V = np.linalg.svd(A)
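A quick sanity check of what np.linalg.svd returns, sketched with a plain ndarray and the thin SVD (`full_matrices=False`; with the default `full_matrices=True`, as in the call above, U is 9 x 9 instead of 9 x 3). The final check confirms that the three factors multiply back to A.

```python
import numpy as np

A = np.array([4, 4, 3, 4, 4, 3, 4, 2, 5,
              4, 5, 3, 5, 4, 5, 4, 4, 5,
              5, 5, 2, 4, 4, 4, 3, 4, 5], dtype=float).reshape(3, 9).T

# Thin SVD: U is 9x3, s holds the 3 singular values (descending), Vt is 3x3.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)   # (9, 3) (3,) (3, 3)

# The right singular vectors are the ROWS of Vt (V transposed),
# so A is recovered as U @ diag(s) @ Vt.
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))    # True
```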

The most important thing to investigate is the vector s of singular values:

array([ 21.11673273,   2.0140035 ,   1.423864  ])
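The claim that the first singular value dominates can be quantified. A minimal sketch, assuming the ndarray version of A: the squared singular values add up to the sum of all squared entries of A (452 here), and each value's share of that sum shows how much of the table it captures.

```python
import numpy as np

A = np.array([4, 4, 3, 4, 4, 3, 4, 2, 5,
              4, 5, 3, 5, 4, 5, 4, 4, 5,
              5, 5, 2, 4, 4, 4, 3, 4, 5], dtype=float).reshape(3, 9).T

# Singular values only, in descending order.
s = np.linalg.svd(A, compute_uv=False)

# Sum of squared singular values == squared Frobenius norm == sum of squared scores.
print(np.isclose(np.sum(s**2), np.sum(A**2)))   # True

# Share of the total captured by each singular value (the first is about 0.987).
energy = s**2 / np.sum(s**2)
print(energy.round(3))
```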

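The first value is about ten times larger than the other two. Because the squared singular values sum to the squared Frobenius norm of A (the sum of all squared scores, here 452), the first one alone accounts for 21.117^2 ≈ 445.9, i.e. about 98.7% of the total. This indicates that a truncated SVD keeping only this one singular value (a rank-1 approximation) represents the original matrix A quite well. To calculate this representation, take the first column of U, multiply it by the first singular value and then by the first row of V. This is what the last cited command in R does. Note that NumPy's svd returns the right singular vectors already transposed (V^T), which is why a row is used here, whereas R's svd() returns the untransposed v, from which you would take the first column. Here is the same in Python: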

U[:,0]*s[0]*V[0,:]

And here is the result of this product:

matrix([[ 3.95411864,  4.64939923,  4.34718814],
        [ 4.28153222,  5.03438425,  4.70714912],
        [ 2.42985854,  2.85711772,  2.67140498],
        [ 3.97540054,  4.67442327,  4.37058562],
        [ 3.64798696,  4.28943826,  4.01062464],
        [ 3.69694905,  4.3470097 ,  4.06445393],
        [ 3.34185528,  3.92947728,  3.67406114],
        [ 3.09108399,  3.63461111,  3.39836128],
        [ 4.5599837 ,  5.36179782,  5.0132808 ]])
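The same computation generalizes to any number k of kept singular values. The sketch below uses a hypothetical helper `rank_k_approx` with plain ndarrays, where `@` and np.outer replace the np.matrix `*`. By the Eckart-Young theorem, this truncation is the best rank-k approximation in the Frobenius norm, and its error equals the square root of the sum of the discarded squared singular values, which the last check confirms.

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation of A (Frobenius norm) via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the first k columns of U, k singular values and k rows of Vt.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.array([4, 4, 3, 4, 4, 3, 4, 2, 5,
              4, 5, 3, 5, 4, 5, 4, 4, 5,
              5, 5, 2, 4, 4, 4, 3, 4, 5], dtype=float).reshape(3, 9).T

A1 = rank_k_approx(A, 1)

# For k = 1 this equals s[0] times the outer product of U[:, 0] and Vt[0, :].
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.allclose(A1, s[0] * np.outer(U[:, 0], Vt[0, :])))   # True

# Eckart-Young: Frobenius error == norm of the discarded singular values.
err = np.linalg.norm(A - A1)   # Frobenius norm is the default for 2-D input
print(np.isclose(err, np.sqrt(s[1]**2 + s[2]**2)))            # True
```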

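Concerning the vector factors U[:,0] and V[0,:]: U[:,0] has one entry per hole and V[0,:] one entry per player. Figuratively speaking, U[:,0] can be seen as a representation of each hole's difficulty, while V[0,:] encodes each player's scoring level; since a higher golf score is worse, a larger entry in V[0,:] means a weaker player. Keep in mind that singular vectors are only determined up to sign: NumPy may return both vectors with all-negative entries, which leaves their product unchanged.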
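To read the two factors as hole difficulty and player scoring level, it helps to fix their sign first, because singular vectors are only determined up to a common sign flip. A minimal sketch of that reading, treating it as a heuristic interpretation rather than a formal model; since a lower golf score is better, a larger player factor means a weaker player.

```python
import numpy as np

A = np.array([4, 4, 3, 4, 4, 3, 4, 2, 5,
              4, 5, 3, 5, 4, 5, 4, 4, 5,
              5, 5, 2, 4, 4, 4, 3, 4, 5], dtype=float).reshape(3, 9).T

U, s, Vt = np.linalg.svd(A, full_matrices=False)
u, v = U[:, 0], Vt[0, :]

# Flipping both u and v leaves s[0] * outer(u, v) unchanged. For an all-positive
# matrix like A the leading singular vectors have entries of one sign,
# so this makes both vectors positive.
if u.sum() < 0:
    u, v = -u, -v

# Larger hole factor means a higher expected score, i.e. a harder hole.
print("holes, hardest first:", np.argsort(-u) + 1)
# Larger player factor means a higher expected score, i.e. a weaker player.
print("players, strongest first:", np.argsort(v) + 1)
```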