Question

I am learning R and currently using it for nonlinear regression (which I am also learning). I have two sets of data (the duration of an operation on different machines), and I am able to find a good nonlinear regression for each of these sets. Now I would like to find the single regression that minimises the sum of both residual sums of squares.

Here is what I have:

A <- c(1:5)
B <- c(100, 51, 32, 24, 19)
C <- c(150, 80, 58, 39, 29)

df <- data.frame(A, B, C)

f <- B ~ k1/A + k2
g <- C ~ k1/A + k2

n <- nls(f, data = df, start = list(k1=10, k2=10))
p <- nls(g, data = df, start = list(k1=10, k2=10))
n
#Nonlinear regression model
#  model: B ~ k1/A + k2
#   data: df
#     k1      k2 
#101.595  -1.195 
# residual sum-of-squares: 2.619

#Number of iterations to convergence: 1 
#Achieved convergence tolerance: 2.568e-07

p
#Nonlinear regression model
#  model: C ~ k1/A + k2
#   data: df
#     k1      k2 
#148.044   3.593 
# residual sum-of-squares: 54.19

#Number of iterations to convergence: 1 
#Achieved convergence tolerance: 1.803e-07

The constants k1 and k2 are (of course) different for the two sets (B and C). I am wondering how I could find a single k1 and a single k2 that produce the 'best' solution for both data sets together.

I hope my explanation is understandable. What I am trying to do is sometimes (at least here) called global nonlinear regression.

EDIT: I would also like to know how I can tell R to avoid negative values for a specific parameter. In this case, I would like k2 to be positive.


Solution

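If you want identical parameters for both data sets, you should just pool your data. The residual sum of squares of the pooled fit is then the sum of the two individual residual sums of squares, which is the quantity you want to minimise. The lower argument (available with algorithm = "port") keeps the parameters from going negative: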

df2 <- data.frame(Y=c(df$B,df$C), X=rep(df$A, 2))
p <- nls(Y ~ k1/X + k2, 
         data = df2, 
         start = list(k1=10, k2=10), 
         lower = c(0, 0), 
         algorithm = "port")
summary(p)

#  Formula: Y ~ k1/X + k2
#  
#  Parameters:
#     Estimate Std. Error t value Pr(>|t|)    
#  k1  124.819     18.078   6.904 0.000124 ***
#  k2    1.199      9.781   0.123 0.905439    
#  ---
#  Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#  
#  Residual standard error: 16.59 on 8 degrees of freedom
#  
#  Algorithm "port", convergence message: both X-convergence and relative convergence (5)
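
As a rough check of the pooling argument, the sketch below writes the objective from the question by hand (RSS for B plus RSS for C with shared k1 and k2) and compares it with the pooled fit. The helper name rss_sum and the use of optim() with method "L-BFGS-B" are my own illustrative choices, not something the question or the fit above requires.

```r
A <- 1:5
B <- c(100, 51, 32, 24, 19)
C <- c(150, 80, 58, 39, 29)

# Objective from the question: RSS for B plus RSS for C, same k1 and k2 for both
# (rss_sum is a hypothetical helper name)
rss_sum <- function(k) {
  pred <- k[1] / A + k[2]
  sum((B - pred)^2) + sum((C - pred)^2)
}

# Pooled fit, same call as above
pooled <- data.frame(Y = c(B, C), X = rep(A, 2))
fit <- nls(Y ~ k1 / X + k2,
           data = pooled,
           start = list(k1 = 10, k2 = 10),
           lower = c(0, 0),
           algorithm = "port")

# The residual sum of squares of the pooled fit is the same quantity as rss_sum
all.equal(deviance(fit), rss_sum(coef(fit)))

# Direct minimisation of rss_sum under the same bounds, for comparison
opt <- optim(c(10, 10), rss_sum, method = "L-BFGS-B", lower = c(0, 0))
rbind(nls = coef(fit), optim = opt$par)
```

The two rows of the final matrix should agree up to the optimiser's tolerance, which is why pooling is enough when both parameters are meant to be shared.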

Edit:

If you want one parameter to be shared by both data sets and the other to vary between them, you could use a mixed effects model. However, I don't know how to specify constraints for that (I believe it is not a simple task, but it could possibly be achieved by reparameterization).

library(nlme)

library(reshape2)
df3 <- melt(df, id.vars="A")

r <- nlme(value ~ k1/A + k2, 
          data = df3, 
          start = c(k1=10, k2=10), 
          fixed = k1 + k2 ~ 1,
          random = k2 ~ 1|variable)

summary(r)
#  Nonlinear mixed-effects model fit by maximum likelihood
#  Model: value ~ k1/A + k2 
#  Data: df3 
#         AIC      BIC    logLik
#    83.11052 84.32086 -37.55526
#  
#  Random effects:
#    Formula: k2 ~ 1 | variable
#                k2 Residual
#  StdDev: 12.49915 7.991013
#  
#  Fixed effects: k1 + k2 ~ 1 
#         Value Std.Error DF   t-value p-value
#  k1 124.81916  9.737738  7 12.818086  0.0000
#  k2   1.19925 11.198211  7  0.107093  0.9177
#  Correlation: 
#         k1    
#  k2 -0.397
#  
#  Standardized Within-Group Residuals:
#         Min         Q1        Med         Q3        Max 
#  -1.7520706 -0.5273469  0.2746039  0.5235343  1.4971808 
#  
#  Number of Observations: 10
#  Number of Groups: 2 

coef(r)
#          k1        k2
#  B 124.8192 -10.81835
#  C 124.8192  13.21684
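
If the bound on k2 matters more than having a random effect, one possible alternative (a sketch on my part, not the mixed model above) is to stay with nls and give each data set its own k2 as an ordinary parameter, so that lower with algorithm = "port" can be used as before. The indicator columns isB and isC and the parameter names k2B and k2C are hypothetical names introduced for this example.

```r
A <- 1:5
B <- c(100, 51, 32, 24, 19)
C <- c(150, 80, 58, 39, 29)

# Long format built with base R; isB and isC are hypothetical 0/1 indicators
long <- data.frame(
  value = c(B, C),
  A     = rep(A, 2),
  isB   = rep(c(1, 0), each = length(A)),
  isC   = rep(c(0, 1), each = length(A))
)

# k1 is shared by both data sets; k2B and k2C are separate offsets,
# all three bounded below by 0
fit_shared <- nls(value ~ k1 / A + k2B * isB + k2C * isC,
                  data = long,
                  start = list(k1 = 10, k2B = 10, k2C = 10),
                  lower = c(0, 0, 0),
                  algorithm = "port")
coef(fit_shared)
```

Unlike the mixed model, this treats each k2 as a fixed parameter, so there is no shrinkage of the two offsets toward a common value. The nlme fit above already gives a negative k2 for B, so I would expect k2B to end up sitting at the bound here, which is worth checking before interpreting the other estimates.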
Licensed under: CC-BY-SA with attribution