Why is rmvnorm() function returning "In sqrt(ev$values) : NaNs produced", what is this error and how can it be corrected or avoided?

StackOverflow https://stackoverflow.com/questions/22463835

  •  16-06-2023
  •  | 
  •  

Question

I am working with financial/economic data in case you are wondering about the large size of some of the coefficients below... My general question has to do with the simulation of parameter coefficients output from a linear random effects model in R. I am attempting to generate a random sample of beta coefficients using the model coefficients and the variance-covariance (VCOV) matrix from the same model in R. My question is: Why am I receiving the error below about the square root of the expected values using the rmvnorm() function from the mvtnorm{} package? How can I deal with this warning/issue?

#Example call: lmer model with random effects by YEAR
#mlm<-lmer(DV~V1+V2+V3+V2*V3+V4+V5+V6+V7+V8+V9+V10+V11+(1|YEAR), data=dat)
#Note: 5 years (5 random effects total)

#LMER call yields the following information:
coef<-as.matrix(c(-28037800,0.8368619,2816347,8681918,-414002.6,371010.7,-26580.84,80.17909,271.417,-239.1172,3.463785,-828326))

sigma<-as.matrix(rbind(c(1834279134971.21,-415.95,-114036304870.57,-162630699769.14,-23984428143.44,-94539802675.96,
                       -4666823087.67,-93751.98,1735816.34,-1592542.75,3618.67,14526547722.87),
                 c(-415.95,0.00,41.69,94.17,-8.94,-22.11,-0.55,0.00,0.00,0.00,0.00,-7.97),
                 c(-114036304870.57,41.69,12186704885.94,12656728536.44,-227877587.40,-2267464778.61,
                       -4318868.82,8909.65,-355608.46,338303.72,-321.78,-1393244913.64),
                 c(-162630699769.14,94.17,12656728536.44,33599776473.37,542843422.84,4678344700.91,-27441015.29,
                       12106.86,-225140.89,246828.39,-593.79,-2445378925.66),
                 c(-23984428143.44,-8.94,-227877587.40,542843422.84,32114305557.09,-624207176.98,-23072090.09,
                       2051.16,51800.37,-49815.41,-163.76,2452174.23),
                 c(-94539802675.96,-22.11,-2267464778.61,4678344700.91,-624207176.98,603769409172.72,90275299.55,
                       9267.90,208538.76,-209180.69,-304.18,-7519167.05),
                 c(-4666823087.67,-0.55,-4318868.82,-27441015.29,-23072090.09,90275299.55,82486186.42,-100.73,
                       15112.56,-15119.40,-1.34,-2476672.62),
                 c(-93751.98,0.00,8909.65,12106.86,2051.16,9267.90,-100.73,2.54,8.73,-10.15,-0.01,-1507.62),
                 c(1735816.34,0.00,-355608.46,-225140.89,51800.37,208538.76,15112.56,8.73,527.85,-535.53,-0.01,21968.29),
                 c(-1592542.75,0.00,338303.72,246828.39,-49815.41,-209180.69,-15119.40,-10.15,-535.53,545.26,0.01,-23262.72),
                 c(3618.67,0.00,-321.78,-593.79,-163.76,-304.18,-1.34,-0.01,-0.01,0.01,0.01,42.90),
                 c(14526547722.87,-7.97,-1393244913.64,-2445378925.66,2452174.23,-7519167.05,-2476672.62,-1507.62,21968.29,
                        -23262.72,42.90,229188496.83)))
#Error begins here:
betas<-rmvnorm(n=1000, mean=coef, sigma=sigma)
#rmvnorm breaks, Error returned:

Warning message: In sqrt(ev$values) : NaNs produced

When I Google the following search string: "rmvnorm, "Warning message: In sqrt(ev$values) : NaNs produced," I saw that: http://www.nickfieller.staff.shef.ac.uk/sheff-only/mvatasksols6-9.pdf On Page 4 that this error indicates "negative eigen values." Although, I have no idea conceptually or practically what a negative eigen value is or why that they would be produced in this instance.

The second search result: [http://www.r-tutor.com/r-introduction/basic-data-types/complex2 Indicates that this error arises because of an attempt to take the square root of -1, which is "not a complex value" (you cannot take the square root of -1).

The question remains, what is going on here with the random generation of the betas, and how can this be corrected?

sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-apple-darwin10.8.0 (64-bit)

Using the following packages/versions mvtnorm_0.9-9994, lme4_1.1-5, Rcpp_0.10.3, Matrix_1.1-2-2, lattice_0.20-23

Was it helpful?

Solution

You have a huge range of scales in your eigenvalues:

range(eigen(sigma)$values)
## [1] -1.005407e-05  1.863477e+12

I prefer to use mvrnorm from the MASS package, just because it comes installed automatically with R. It also appears to be more robust:

set.seed(1001)
m <- MASS::mvrnorm(n=1000, mu=coef, Sigma=sigma)  ## works fine

edit: OP points out that using method="svd" with rmvnorm also works.

If you print the code for MASS::mvrnorm, or debug(MASS:mvrnorm) and step through it, you see that it uses

if (!all(ev >= -tol * abs(ev[1L]))) stop("'Sigma' is not positive definite")

(where ev is the vector of eigenvalues, in decreasing order, so ev[1] is the largest eigenvalue) to decide on the positive definiteness of the variance-covariance matrix. In this case ev[1L] is about 2e12, tol is 1e-6, so this would allow negative eigenvalues up to a magnitude of about 2e6. In this case the minimum eigenvalue is -1e-5, well within tolerance.

Farther down MASS::mvrnorm uses pmax(ev,0) -- that is, if it has decided that the eigenvalues are not below tolerance (i.e. it didn't fail the test above), it just truncates the negative values to zero, which should be fine for practical purposes.

If you insisted on using rmvnorm you could use Matrix::nearPD, which tries to force the matrix to be positive definite -- it returns a list which contains (among other things) the eigenvalues and the "positive-definite-ified" matrix:

m <- Matrix::nearPD(sigma)
range(m$eigenvalues)
## [1] 1.863477e+04 1.863477e+12

The eigenvalues computed from the matrix are not quite identical -- nearPD and eigen use slightly different algorithms -- but they're very close.

range(eigen(m$mat)$values)
## [1] 1.861280e+04 1.863477e+12

More generally,

  • Part of the reason for the huge range of eigenvalues might be predictor variables that are scaled very differently. It might be a good idea to scale your input data if possible to make the variances more similar to each other (i.e., it will make all of your numerical computations more stable) -- you can always rescale the values once you've generated them
  • It's also the case that when matrices are very close to singular (i.e. some eigenvalues are very close to zero), small numerical differences can change the sign of the eigenvalues. In particular, if you copy and paste the values, you might lose some precision and cause this problem. Using dput(vcov(fit)) or save(vcov(fit)) to save the variance-covariance matrix at full precision is safer.
  • if you have no idea what "positive definite" means you might want to read up about it. The Wikipedia articles on covariance matrices and positive definite matrices might be a little too technical for you to start with; this question on StackExchange is closer, but still a little technical. The next entry on my Google journey was this one, which looks about right.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top