missing value where TRUE/FALSE needed in R [duplicate]

https://stackoverflow.com/questions/23209430

07-07-2023
|

Question

When I run the following code without commenting gr.ascent(MMSE, 0.5, verbose=TRUE) I receive this error Error in b1 * x : 'b1' is missing but when I comment that line I receive the following error when testing MMSE with these arguments MMSE(2,1,farmland$farm,farmland$area). Do you know where my problem is lying?

Error in if (abs(t[i]) <= k) { : missing value where TRUE/FALSE needed

Here's my code:

farmland <- read.csv("FarmLandArea.csv")
str(farmland)
fit=lm(farm~land,data=farmland)
mean.squared.residuals <- sum((lm(farm~land,data=farmland)$residuals)^2)/(length(farmland$farm)-2)

#gradient descent

#things I should possibly use: solve(t(X)%*%X, t(X)%*%y)
gr.ascent<- function(df, x0, alpha=0.2, eps=0.001, max.it = 50, verbose = FALSE){
  X1 <- x0
  cond <- TRUE
  iteration <- 0
  if(verbose) cat("X0 =",X1,"\n")
  while(cond){
    iteration <- iteration + 1
    X0 <- X1
    X1 <- X0 + alpha * df(X0)
    cond <- sum((X1 - X0)^2) > eps & iteration < max.it
    if(verbose) cat(paste(sep="","X",iteration," ="), X1, "\n")
  }
  return(X1)
}


k=19000

#rho <- function(t, k=19000){
#  for (i in seq(1,length(t))){
#    if (abs(t[i]) <= k)
#      return(t[i]^2)
#    else 
#      return(2*k*abs(t[i])-k^2)

#  }

#}

#nicer implementation of rho. ifelse works on vector
rho<-function(t,k) ifelse(abs(t)<=k,t^2,(2*k*abs(t))-k^2)
rho.prime <- function(t, k=19000){
  out <- rep(NA, length(t))
  for (i in seq(1,length(t))){
    if (abs(t[i]) <= k)
    { print(2*t[i])
      out[i] <- 2*t[i] 
    }
    else 
    {
      print(2*k*sign(t[i]))
      out[i] <- 2*k*sign(t[i])
    }
  }
  return(out)
}
MMSE <- function(b0, b1, y=farmland$farm, x=farmland$land){
   # Calls rho.prime() here with argument y-b0-b1*x


   #Why should we call rho.prime? in the html page you have used rho!?
  n = length(y)
  total = 0
  for (i in seq(1,n)) {
    #total = total + rho(t,k)*(y[i]-b0-b1*x[i])
    total = total + rho.prime(y-b0-b1*x,k)*(y[i]-b0-b1*x[i])
  }
  return(total/n)
}

gr.ascent(MMSE(1,2), 0.5, verbose=TRUE)

In which FarmLand csv data is like the following:

state,land,farm
Alabama,50744,14062
Alaska,567400,1375
Arizona,113635,40781
Arkansas,52068,21406
California,155959,39688
Colorado,103718,48750
Connecticut,4845,625
Delaware,1954,766
Florida,53927,14453
Georgia,57906,16094
Hawaii,6423,1734
Idaho,82747,17812
Illinois,55584,41719
Indiana,35867,23125
Iowa,55869,48125
Kansas,81815,72188
Kentucky,39728,21875
Louisiana,43562,12578
Maine,30862,2109
Maryland,9774,3203
Massachusetts,7840,812
Michigan,58110,15625
Minnesota,79610,42031
Mississippi,46907,17422
...

Here's the result of dput(farmland):

> dput(farmland)
structure(list(state = structure(1:50, .Label = c("Alabama", 
"Alaska", "Arizona", "Arkansas", "California", "Colorado", "Connecticut", 
"Delaware", "Florida", "Georgia", "Hawaii", "Idaho", "Illinois", 
"Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", 
"Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi", 
"Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", 
"New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", 
"Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", 
"South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", 
"Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", 
"Wyoming"), class = "factor"), land = c(50744L, 567400L, 113635L, 
52068L, 155959L, 103718L, 4845L, 1954L, 53927L, 57906L, 6423L, 
82747L, 55584L, 35867L, 55869L, 81815L, 39728L, 43562L, 30862L, 
9774L, 7840L, 58110L, 79610L, 46907L, 68886L, 145552L, 76872L, 
109826L, 8968L, 7417L, 121356L, 47214L, 48711L, 68976L, 40948L, 
68667L, 95997L, 44817L, 1045L, 30109L, 75885L, 41217L, 261797L, 
82144L, 9250L, 39594L, 66544L, 24230L, 54310L, 97105L), farm = c(14062L, 
1375L, 40781L, 21406L, 39688L, 48750L, 625L, 766L, 14453L, 16094L, 
1734L, 17812L, 41719L, 23125L, 48125L, 72188L, 21875L, 12578L, 
2109L, 3203L, 812L, 15625L, 42031L, 17422L, 45469L, 95000L, 71250L, 
9219L, 734L, 1141L, 67500L, 10938L, 13438L, 61875L, 21406L, 55000L, 
25625L, 12109L, 109L, 7656L, 68281L, 17031L, 203750L, 17344L, 
1906L, 12578L, 23125L, 5703L, 23750L, 47188L)), .Names = c("state", 
"land", "farm"), class = "data.frame", row.names = c(NA, -50L
))

Solution

OK, by the numbers:

In your call to gr.ascent(...) you pass a function, MMSE as the first argument. Inside gr.ascent(...) you refer to this function as df(...).
The function MMSE(...) has 2 arguments, b0 and b1 for which there are no defaults - so these must be specified or there will be an error, but
When you call the function df(...) inside gr.ascent(...), in the line: X1 <- X0 + alpha * df(X0) you pass only 1 argument, which is b0.
So the second argument, b1 is missing, hence the error.

When you call MMSE(...) directly, as in:

MMSE(2,1,farmland$farm,farmland$area)

you pass farmland$area as the 4th argument. But there is no column area in the farmland data frame! So this gets passed as NA, which, when used in

total = total + rho.prime(y-b0-b1*x,k)*(y[i]-b0-b1*x[i])

coerces the t argument to rho.prime(...) to NA, hence the second error.

I can't suggest a solution because I have no clue what you are trying to accomplish here.

EDIT (Response to OP's comment).

Notwithstanding @thelatemail's comment, which I agree with wholeheartedly, your new error is rather obscure.

In your earlier version, you were passing a function, MSEE(...) to gr.ascent(...), and using it incorrectly. This time, you are passing a value to gr.ascent(...), that value being the return value when you call MSEE(1,2). So what happens when you try to treat this value as a function, as in:

X1 <- X0 + alpha * df(X0)

Well, normally this would throw an error telling you that df is not a function. In this case, it's just your bad luck that df is a function. It is the probability density function for the F distribution, which has a required argument df1, among others (type ?df to see the documentation). That's why you are getting the error.

To "fix" this you need to go back to passing the function, as in:

gr.ascent(MSEE,...)

and then use it correctly inside gr.ascent(...), as in:

X1 <- X0 + alpha * df(X0, <some other argument>).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow