Eigenvalues computed in R differ from other statistical packages and literature results

StackOverflow https://stackoverflow.com/questions/17101267

  •  31-05-2022

Question

I am trying to detect multicollinearity in the longley data using eigenvalues and eigenvectors. The eigenvalues I get from SPSS differ from the ones I get in R, and I don't know why. I computed them for both the standardized X matrix and the original X matrix, but the results still do not match.

data(longley)
x<-as.matrix(longley[,-7])
e<-eigen(t(x)%*%x)

This is the result from R:

$values
[1] 6.665299e+07 2.090730e+05 1.053550e+05 1.803976e+04 2.455730e+01
[6] 2.015117e+00

This is the result from SPSS:

6.861392768154346
0.08210250361264278
0.04568078445788493
0.01068846567618869
1.29228130384155E-4
6.2463047077443345E-6
3.663846498908749E-9

What might be wrong with my commands? I would also like to know how to compute the proportion of variation explained.


Solution

For collinearity diagnostics based on eigenvalues, the X matrix, including the intercept column, should first be rescaled. Each column is "obtained by dividing each original value by the square root of the sum of squared original values for that column in the original matrix, including those for the intercept". The eigenvalues are then computed from the rescaled matrix. Including the intercept also explains why SPSS reports seven eigenvalues for six predictors.

The R code is:

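data(longley)
# Prepend an intercept column of ones to the six predictors
X <- as.matrix(cbind(1, longley[, -7]))
# Scale each column to unit length (divide by the square root of its sum of squares)
X <- apply(X, 2, function(x) x / sqrt(sum(x^2)))
# Eigenvalues and eigenvectors of the scaled cross-product matrix
eigen(t(X) %*% X)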

The resulting eigenvalues now match both the literature and other software.
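
The question also asks about the proportion of variation explained. Here is a minimal sketch, assuming that this means each eigenvalue's share of the eigenvalue total. The names `ev` and `prop` are illustrative and not part of the original answer.

```r
data(longley)

# Design matrix with an intercept column, each column scaled to unit length
X <- as.matrix(cbind(1, longley[, -7]))
X <- apply(X, 2, function(x) x / sqrt(sum(x^2)))

# Eigenvalues of the scaled cross-product matrix (same matrix as t(X) %*% X)
ev <- eigen(crossprod(X), symmetric = TRUE)$values

# Share of the eigenvalue total carried by each eigenvalue
prop <- ev / sum(ev)
round(cbind(eigenvalue = ev, proportion = prop, cumulative = cumsum(prop)), 6)
```

Because every column has unit length, the eigenvalues sum to the number of columns (7 here), so each proportion is simply the eigenvalue divided by 7.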

OTHER TIPS

This "answer" is really just a long comment.

Here's longley[,-7].

> longley[,-7]
     GNP.deflator     GNP Unemployed Armed.Forces Population Year
1947         83.0 234.289      235.6        159.0    107.608 1947
1948         88.5 259.426      232.5        145.6    108.632 1948
1949         88.2 258.054      368.2        161.6    109.773 1949
1950         89.5 284.599      335.1        165.0    110.929 1950
1951         96.2 328.975      209.9        309.9    112.075 1951
1952         98.1 346.999      193.2        359.4    113.270 1952
1953         99.0 365.385      187.0        354.7    115.094 1953
1954        100.0 363.112      357.8        335.0    116.219 1954
1955        101.2 397.469      290.4        304.8    117.388 1955
1956        104.6 419.180      282.2        285.7    118.734 1956
1957        108.4 442.769      293.6        279.8    120.445 1957
1958        110.8 444.546      468.1        263.7    121.950 1958
1959        112.6 482.704      381.3        255.2    123.366 1959
1960        114.2 502.601      393.1        251.4    125.368 1960
1961        115.7 518.173      480.6        257.2    127.852 1961
1962        116.9 554.894      400.7        282.7    130.081 1962

The printout shows seven columns, but the first is only the row index, which the last column (Year) repeats, so there are six data columns. I suspect that in SPSS you processed 7 columns, while in R you processed 6.

This is just a guess--I don't have SPSS, so I can't even try to reproduce your result.
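
The column counts can be checked directly in R. This is a small sketch using the built-in longley data. The variable names `n_all`, `n_used`, `dropped`, and `index_repeats_year` are illustrative.

```r
data(longley)

# Number of variables in the full data set and after dropping column 7
n_all  <- ncol(longley)
n_used <- ncol(longley[, -7])

# Name of the column removed by longley[, -7]
dropped <- names(longley)[7]

# The leftmost entry in the printout is the row name, which repeats Year
index_repeats_year <- all(rownames(longley) == longley$Year)

print(c(all_columns = n_all, used_columns = n_used))
print(dropped)
print(index_repeats_year)
```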

The calculation that you've done in R just computes the eigenvalues of t(x) %*% x (the transpose of x times x), and those values are correct. Here's the same calculation in Python, using numpy:

In [5]: x
Out[5]: 
array([[   83.   ,   234.289,   235.6  ,   159.   ,   107.608,  1947.   ],
       [   88.5  ,   259.426,   232.5  ,   145.6  ,   108.632,  1948.   ],
       [   88.2  ,   258.054,   368.2  ,   161.6  ,   109.773,  1949.   ],
       [   89.5  ,   284.599,   335.1  ,   165.   ,   110.929,  1950.   ],
       [   96.2  ,   328.975,   209.9  ,   309.9  ,   112.075,  1951.   ],
       [   98.1  ,   346.999,   193.2  ,   359.4  ,   113.27 ,  1952.   ],
       [   99.   ,   365.385,   187.   ,   354.7  ,   115.094,  1953.   ],
       [  100.   ,   363.112,   357.8  ,   335.   ,   116.219,  1954.   ],
       [  101.2  ,   397.469,   290.4  ,   304.8  ,   117.388,  1955.   ],
       [  104.6  ,   419.18 ,   282.2  ,   285.7  ,   118.734,  1956.   ],
       [  108.4  ,   442.769,   293.6  ,   279.8  ,   120.445,  1957.   ],
       [  110.8  ,   444.546,   468.1  ,   263.7  ,   121.95 ,  1958.   ],
       [  112.6  ,   482.704,   381.3  ,   255.2  ,   123.366,  1959.   ],
       [  114.2  ,   502.601,   393.1  ,   251.4  ,   125.368,  1960.   ],
       [  115.7  ,   518.173,   480.6  ,   257.2  ,   127.852,  1961.   ],
       [  116.9  ,   554.894,   400.7  ,   282.7  ,   130.081,  1962.   ]])

In [6]: eigvals(x.T.dot(x))
Out[6]: 
array([  6.66529929e+07,   2.09072969e+05,   1.05355048e+05,
         1.80397602e+04,   2.45572970e+01,   2.01511742e+00])
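
The same cross-check can be done without leaving R. As a hedged sketch, the eigenvalues of t(x) %*% x should equal the squared singular values of x, so `svd()` gives an independent confirmation. The names `ev`, `sv2`, and `rel_diff` are illustrative.

```r
data(longley)
x <- as.matrix(longley[, -7])

# Eigenvalues of t(x) %*% x, as in the question
ev <- eigen(crossprod(x), symmetric = TRUE)$values

# Squared singular values of x, computed without forming t(x) %*% x
sv2 <- svd(x)$d^2

# Largest relative difference between the two sets of values
rel_diff <- max(abs(ev - sv2) / sv2)
print(cbind(eigen = ev, svd_squared = sv2))
print(rel_diff)
```

If the two columns agree up to rounding, the unscaled R result is numerically sound, and the mismatch with SPSS comes from scaling and the intercept rather than from `eigen()`.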
Licensed under: CC-BY-SA with attribution