Yes, you can sum the log-likelihoods for the two groups (if they were calculated separately). Just like you would sum the log-likelihoods for a vector of observations, where each observation has different generative parameters.
I prefer to think in terms of one large vector (ie, of the shape parameter) which contains values that vary according to the structure of the covariates (ie, group membership). In a linear model context, this vector could equal the linear predictor (once appropriately transformed by link function): the dot product of the design matrix and the vector of regression coefficients.
Here is a (non-functionalized) example:
## setup true values
nobs = 50 ## number of observations
a1 = 1 ## shape for first group
b1 = 2 ## scale parameter for both groups
beta = c(a1, a1 * 1.5) ## vector of linear coefficients (group shapes)
## model matrix for full, null models
mm_full = cbind(grp1 = rep(c(1,0), each = nobs), grp2 = rep(c(0,1), each = nobs))
mm_null = cbind(grp1 = rep(1, nobs*2))
## shape parameter vector for the full, null models
shapes_full = mm_full %*% beta ## different shape parameters by group (full model)
shapes_null = mm_null %*% beta[1] ## same shape parameter for all obs
scales = rep(b1, length(shapes_full)) ## scale parameters the same for both groups
## simulate response from full model
response = rweibull(length(shapes_full), shapes_full, scales)
## the log likelihood for the full, null models:
LL_full = sum(dweibull(response, shapes_full, scales, log = T))
LL_null = sum(dweibull(response, shapes_null, scales, log = T))
## likelihood ratio test
LR_test = function(LL_null, LL_full, df) {
LR = -2 * (LL_null - LL_full) ## test statistic
pchisq(LR, df = df, ncp = 0, lower = F) ## probability of test statistic under central chi-sq distribution
}
LR_test(LL_null, LL_full, 1) ## 1 degrees freedom (1 parameter added)
To write a log-likelihood function to find the MLE of a Weibull model where the shape parameter(s) are some linear function of covariates, you could use the same approach:
## (negative) log-likelihood function
LL_weibull = function(par, data, mm, inv_link_fun = function(.) .){
P = ncol(mm) ## number of regression coefficients
N = nrow(mm) ## number of observations
shapes = inv_link_fun(mm %*% par[1:P]) ## shape vector (possibly transformed)
scales = rep(par[P+1], N) ## scale vector
-sum(dweibull(data, shape = shapes, scale = scales, log = T)) ## negative log likelihood
}
Then your power simulation might look like this:
## function to simulate data, perform LRT
weibull_sim = function(true_shapes, true_scales, mm_full, mm_null){
## simulate response
response = rweibull(length(true_shapes), true_shapes, true_scales)
## find MLE
mle_full = optim(par = rep(1, ncol(mm_full)+1), fn = LL_weibull, data = response, mm = mm_full)
mle_null = optim(par = rep(1, ncol(mm_null)+1), fn = LL_weibull, data = response, mm = mm_null)
## likelihood ratio test
df = ncol(mm_full) - ncol(mm_null)
return(LR_test(-mle_null$value, -mle_full$value, df))
}
## run simulations
nsim = 1000
pvals = sapply(1:nsim, function(.) weibull_sim(shapes_full, scales, mm_full, mm_null) )
## calculate power
alpha = 0.05
power = sum(pvals < alpha) / nsim
An identity link works fine in the above example, but for more complex models some sort of transformation might be required.
And you don't have to use linear algebra in the log-likelihood function -- obviously, you can construct the vector of shapes in any way you see fit (as long as you explicitly index the appropriate generative parameters in the vector par
).
Interval-censored data
The cumulative distribution function F(T) of the Weibull distribution (pweibull
in R) gives the probability of failure before time T. So,
if an observation is interval censored between times T[0] and T[1], the probability that the object fails between T[0] and T[1] is F(T[1]) - F(T[0]) :
the probability that the object failed before T[1] minus the probability that it failed before T[0] (the integral of the PDF between T[0] and T[1]).
Because the Weibull CDF is already implemented in R it is trivial to modify the likelihood function above:
LL_ic_weibull <- function(par, data, mm){
## 'data' has two columns, left and right times of censoring interval
P = ncol(mm) ## number of regression coefficients
shapes = mm %*% par[1:P]
scales = par[P+1]
-sum(log(pweibull(data[,2], shape = shapes, scale = scales) - pweibull(data[,1], shape = shapes, scale = scales)))
}
Or if you don't want to use a model matrix, etc., and just restrict yourself to indexing the shape parameter vector by groups, you could do something like:
LL_ic_weibull2 <- function(par, data, nobs){
## 'data' has two columns, left and right times of censoring interval
## 'nobs' is a vector that contains the num. observations for each group (grp1, grp2, ...)
P = length(nobs) ## number of regression coefficients
shapes = rep(par[1:P], nobs)
scales = par[P+1]
-sum(log(pweibull(data[,2], shape = shapes, scale = scales) - pweibull(data[,1], shape = shapes, scale = scales)))
}
Test that both functions give the same solution:
## generate intervals from simulated response (above)
left = ifelse(response - 0.2 < 0, 0, response - 0.2)
right = response + 0.2
response_ic = cbind(left, right)
## find MLE w/ first LL function (model matrix)
mle_ic_full = optim(par = c(1,1,3), fn = LL_ic_weibull, data = response_ic, mm = mm_full)
mle_ic_null = optim(par = c(1,3), fn = LL_ic_weibull, data = response_ic, mm = mm_null)
## find MLE w/ second LL function (groups only)
nobs_per_group = apply(mm_full, 2, sum) ## just contains number of observations per group
nobs_one_group = nrow(mm_null) ## one group so only one value
mle_ic_full2 = optim(par = c(1,1,3), fn = LL_ic_weibull2, data = response_ic, nobs = nobs_per_group)
mle_ic_null2 = optim(par = c(1,3), fn = LL_ic_weibull2, data = response_ic, nobs = nobs_one_group)