Question

A simple matrix benchmark has indicated that LU decomposition in Revolution Analytics R 2.13.2 is nearly 5 times slower than matrix multiplication. Theory and many years of practice say that LU should take roughly 1/3 to 2/3 the time of A*A (an LU factorization costs about (2/3)n^3 flops, versus 2n^3 for a dense n x n matrix product).

Revo R and Matlab use Intel's Math Kernel Library (MKL) for this test; R 2.14.1 does not use MKL. Everything is 64-bit.

The anomaly shows up in Table 2 below, which is Table 1 normalized by the A*A time. There are other (apparent) anomalies, but LU is the most glaring one.

                        Table 1 (secs)

                    A*A     LU     A\b    Det   Inv
----------------------------------------------------
R 2.14.1           0.757   0.43   0.45   0.20  1.11
Revo R 2.13.2      0.063   0.35   0.11   0.03  0.14
Matlab 2011b       0.062   0.08   0.10   0.07  0.16
----------------------------------------------------
Averaged over 20 runs on a 1000x1000 random matrix


                       Table 2 (normalized)

                    A*A     LU     A\b    Det   Inv
----------------------------------------------------
R 2.14.1             1     0.57   0.19   0.26  1.47
Revol R 2.13.2       1     4.67*  1.58   1.33  2.17
Matlab 2011b         1     0.67   1.72   0.61  1.68
----------------------------------------------------
Note: x = A\b in Matlab is x <- solve(A,b) in R.
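
For reference, here is a minimal sketch of how timings of this kind can be produced in R. It is not the actual benchmark code (that is in the linked report); the matrix size, the number of runs, and the use of Matrix::lu for LU are assumptions.

library(Matrix)                           # provides lu() for the LU decomposition

set.seed(1)
n <- 1000
A <- matrix(rnorm(n * n), n, n)           # 1000x1000 random matrix
b <- rnorm(n)

runs <- 20
ops <- list(
  "A*A"  = function() A %*% A,
  "LU"   = function() lu(A),
  "A\\b" = function() solve(A, b),
  "Det"  = function() det(A),
  "Inv"  = function() solve(A)
)

# average elapsed seconds over 'runs' repetitions of each operation
times <- sapply(ops, function(f)
  system.time(for (i in seq_len(runs)) f())[["elapsed"]] / runs)
print(round(times, 3))                    # Table 1 style (seconds)
print(round(times / times["A*A"], 2))     # Table 2 style, normalized by A*A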

UPDATE: I have followed Simon Urbanek's advice and replaced LUP = expand(lu(Matrix(A))) with lu(A). The Revo R rows are now:

                    Revol R 2.13.2

              A*A    LU     A\b    Det   Inv
            ---------------------------------
  time       0.104  0.107  0.110  0.042  0.231  
  norm time  1.000  1.034  1.060  0.401  2.232 

Time in seconds on a

Dell Precision 690, 2 x Intel® Xeon® E5345 CPU @ 2.33GHz,
16GB RAM, 2 processors, 8 cores and 8 threads,
Windows 7 Prof., 64-bit

A work-in-progress report containing the tables and the code used is here.


UPDATE 2:

I have modified the matrix benchmark to test Matrix Decompositions only. These are the foundations on which all other matrix algorithms are built, and if these are shaky then all other algorithms will be shaky too.

I have changed to a brand new

Lenovo ThinkPad X220, Intel Core i7-2640M CPU @ 2.80GHz, 
8GB ram, 1 Processor, 2 Cores and 4 Threads
Windows 7 Professional, 64-bit.

Note: The Core i7 processor has Intel's Turbo Boost which increases the clock rate up to 3.5GHz if it senses a high demand. As far as I know, Turbo Boost is not under program(mer) control in any of the three systems.

These changes will, I hope, make the results more useful.
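
As a rough illustration, a minimal sketch of a decomposition-only benchmark of this kind in R follows. Again, the actual code is in the linked report; the matrix size, the number of runs, the use of Matrix::lu, and the use of crossprod(A) to get a symmetric positive definite input for chol() are all assumptions.

library(Matrix)                           # for lu(); chol/qr/svd/eigen are in base R

set.seed(1)
n <- 1000
A <- matrix(rnorm(n * n), n, n)
S <- crossprod(A)                         # symmetric positive definite input for chol()

runs <- 20
ops <- list(
  "A*A"     = function() A %*% A,
  "chol(A)" = function() chol(S),
  "lu(A)"   = function() lu(A),
  "qr(A)"   = function() qr(A),
  "svd(A)"  = function() svd(A),
  "eig(A)"  = function() eigen(A)
)

times <- sapply(ops, function(f)
  system.time(for (i in seq_len(runs)) f())[["elapsed"]] / runs)
print(round(times, 3))                    # Table 3 style (seconds)
print(round(times / times["A*A"], 3))     # Table 4 style, normalized by A*A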

                          Table 3. Times(secs)

                  A*A    chol(A)   lu(A)    qr(A)    svd(A)   eig(A)   Total
-----------------------------------------------------------------------------
R 2.14.1         0.904    0.157    0.260    0.568    4.260    6.967    13.11
Revol R 2.13.2   0.121    0.029    0.062    0.411    1.623    3.265     5.51   
Matlab 2011b     0.061    0.014    0.033    0.056    0.342    0.963     1.47       
-----------------------------------------------------------------------------
                    Times(secs) averaged over 20  runs



                          Table 4. Times(normalized)

                  A*A    chol(A)   lu(A)    qr(A)    svd(A)  eig(A)   Total
----------------------------------------------------------------------------
R 2.14.1         1.000    0.174    0.288    0.628    4.714    7.711   14.52
Revol R 2.13.2   1.000    0.237    0.515    3.411   13.469   27.095   45.73
Matlab 2011b     1.000    0.260    0.610    0.967    5.768   16.774   25.38
----------------------------------------------------------------------------
                 Normalized by the A*A time (times averaged over 20 runs)

We can see from Table 4 that the spurious anomaly has disappeared and that all systems behave as the theory predicts.

                          Table 5. Times/Matlab Times

                  A*A    chol(A)   lu(A)    qr(A)    svd(A)  eig(A)   Total
----------------------------------------------------------------------------
R 2.14.1          15      11         8       10       12       7        9
Revol R 2.13.2     2       2         2        7        5       3        4
----------------------------------------------------------------------------
                         Rounded to the nearest integer


Solution

FWIW you're not measuring the time of the decomposition, but the overhead you create by converting the matrices all the time. In regular R:

# actual lu call
> system.time(lu(A))
   user  system elapsed 
  0.136   0.000   0.139 
# your code
> system.time(expand(lu(Matrix(A))))
   user  system elapsed 
  0.536   0.000   0.537 

so chances are you're creating the overhead yourself. This matters in particular when the decomposition is fast. Also note that talking about the performance of "R" is ambiguous, since the performance of R itself varies strongly depending on which BLAS you use.

(As a side note, you may want to use system.time in your benchmarks; the microbenchmark R package could also be of interest.)
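
To illustrate both points, here is a small sketch, assuming the Matrix and microbenchmark packages are installed: microbenchmark() runs each expression repeatedly and reports the distribution of timings, and recent versions of R (3.4 and later) report the BLAS/LAPACK libraries in use as part of sessionInfo().

library(Matrix)
library(microbenchmark)

set.seed(1)
A <- matrix(rnorm(1000 * 1000), 1000, 1000)

# compare the bare decomposition with the version that converts to a
# Matrix object and expands the factors
microbenchmark(
  bare     = lu(A),
  expanded = expand(lu(Matrix(A))),
  times    = 20
)

# on R >= 3.4 this also shows which BLAS/LAPACK libraries R is linked against
sessionInfo()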

Licensed under: CC-BY-SA with attribution