logistf: Firth's Bias-Reduced Logistic Regression

Description

Implements Firth's bias-reduced penalized-likelihood logistic regression.

Usage

logistf(
  formula,
  data,
  pl = TRUE,
  alpha = 0.05,
  control,
  plcontrol,
  modcontrol,
  firth = TRUE,
  init,
  weights,
  na.action,
  offset,
  plconf = NULL,
  flic = FALSE,
  model = TRUE,
  ...
)

Value

The returned object is of class logistf and has the following components:

coefficients: the estimated coefficients of the fitted model.
alpha: the significance level (1 - the confidence level) as specified in the input.
var: the variance-covariance matrix of the parameters.
df: the number of degrees of freedom in the model.
loglik: a vector of the (penalized) log-likelihood of the restricted and the full models.
iter: a vector of the number of iterations needed in the fitting process for the null and full models.
n: the number of observations.
y: the response vector, i.e. 1 for successes (events) and 0 for failures.
formula: the formula object.
call: the call object.
terms: the model terms (column names of design matrix).
linear.predictors: a vector with the linear predictor of each observation.
predict: a vector with the predicted probability of each observation.
hat.diag: a vector with the diagonal elements of the Hat Matrix.
conv: the convergence status at last iteration: a vector of length 3 with elements: last change in log likelihood, max(abs(score vector)), max change in beta at last iteration.
method: depending on the fitting method, 'Penalized ML' or 'Standard ML'.
method.ci: the method used to calculate the confidence intervals, i.e. 'profile likelihood' or 'Wald', depending on the arguments pl and plconf.
ci.lower: the lower confidence limits of the parameter.
ci.upper: the upper confidence limits of the parameter.
prob: the p-values of the specific parameters.
pl.iter: only if pl==TRUE: the number of iterations needed for each confidence limit.
betahist: only if pl==TRUE: the complete history of beta estimates for each confidence limit.
pl.conv: only if pl==TRUE: the convergence status (deviation of log likelihood from target value, last maximum change in beta) for each confidence limit.
control: a copy of the control parameters.
modcontrol: a copy of the modcontrol parameters.
flic: logical; TRUE if the intercept was altered so that the predicted probabilities become unbiased while all other coefficients are kept constant, as requested by the flic argument of logistf.
model: if requested (the default), the model frame used.
na.action: information returned by model.frame on the special handling of NAs
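
As a minimal sketch of how these components might be used, the snippet below fits the model from the Examples section and collects estimates, confidence limits and p-values into one table. The object name results is an illustrative choice and not part of the package; the component names are the ones listed above.

```r
library(logistf)

data(sex2)
fit <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2)

# Collect the per-coefficient components into a single table
# ('results' is a hypothetical name used only for this illustration)
results <- data.frame(
  estimate = fit$coefficients,
  lower    = fit$ci.lower,
  upper    = fit$ci.upper,
  p        = fit$prob
)
print(results)

fit$method     # fitting method used
fit$method.ci  # confidence interval method for each coefficient
fit$loglik     # (penalized) log-likelihood of the restricted and full models
```

Accessor functions such as coef() and confint() return the same information and are usually preferable to reaching into the object directly.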

Arguments

formula: A formula object, with the response on the left of the operator, and the model terms on the right. The response must be a vector with 0 and 1 or FALSE and TRUE for the outcome, where the higher value (1 or TRUE) is modeled. It is possible to include contrasts, interactions, nested effects, cubic or polynomial splines and all S features as well, e.g. Y ~ X1*X2 + ns(X3, df=4).
data: A data.frame where the variables named in the formula can be found, i.e. the variables containing the binary response and the covariates.
pl: Specifies if confidence intervals and tests should be based on the profile penalized log likelihood (pl=TRUE, the default) or on the Wald method (pl=FALSE).
alpha: The significance level (1 - alpha is the confidence level). Default is 0.05.
control: Controls the iteration parameters. Default is control = logistf.control().
plcontrol: Controls the Newton-Raphson iteration for the estimation of the profile likelihood confidence intervals. Default is plcontrol = logistpl.control().
modcontrol: Controls additional parameters for fitting. Default is modcontrol = logistf.mod.control().
firth: Use of Firth's penalized maximum likelihood (firth=TRUE, default) or the standard maximum likelihood method (firth=FALSE) for the logistic regression. Note that by specifying pl=TRUE and firth=FALSE (and probably a lower number of iterations) one obtains profile likelihood confidence intervals for maximum likelihood logistic regression parameters.
init: Specifies the initial values of the coefficients for the fitting algorithm
weights: specifies case weights. Each line of the input data set is multiplied by the corresponding element of weights
na.action: a function which indicates what should happen when the data contain NAs
offset: a priori known component to be included in the linear predictor
plconf: specifies the variables (as vector of their indices) for which profile likelihood confidence intervals should be computed. Default is to compute for all variables.
flic: If TRUE, the intercept is altered so that the predicted probabilities become unbiased while all other coefficients are kept constant (see Puhr et al., 2017).
model: If TRUE (the default), the model frame used is returned as a component of the fit.
...: Further arguments to be passed to logistf
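
The following sketch shows how some of these arguments could be combined, using the sex2 data from the Examples section. The object names (fit_wald, fit_ctrl, fit_sub, fit_flic) are illustrative, and the maxit values are arbitrary choices rather than recommendations; it is assumed here that both control functions accept a maxit argument.

```r
library(logistf)
data(sex2)

# Wald instead of profile penalized likelihood confidence intervals
fit_wald <- logistf(case ~ age + oc + vic + vicl + vis + dia,
                    data = sex2, pl = FALSE)

# Larger iteration limits for the fit and for the profile likelihood search
# (maxit values are arbitrary, chosen only for illustration)
fit_ctrl <- logistf(case ~ age + oc + vic + vicl + vis + dia,
                    data = sex2,
                    control = logistf.control(maxit = 100),
                    plcontrol = logistpl.control(maxit = 200))

# 90% intervals, with profile likelihood limits only for
# the coefficients at positions 2 and 3
fit_sub <- logistf(case ~ age + oc + vic + vicl + vis + dia,
                   data = sex2, alpha = 0.10, plconf = c(2, 3))
fit_sub$method.ci

# Intercept correction (FLIC) for unbiased predicted probabilities
fit_flic <- logistf(case ~ age + oc + vic + vicl + vis + dia,
                    data = sex2, flic = TRUE)
fit_flic$flic
```

Inspecting method.ci after using plconf is a quick way to check which coefficients received profile likelihood limits and which fell back to Wald limits.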

Author

Georg Heinze and Meinhard Ploner

Details

logistf is the main function of the package. It fits a logistic regression model applying Firth's correction to the likelihood. The following generic methods are available for logistf's output object: print, summary, coef, vcov, confint, anova, extractAIC, add1, drop1, profile, terms, nobs, predict. Furthermore, forward and backward functions perform convenient variable selection. Note that anova, extractAIC, add1, drop1, forward and backward are based on penalized likelihood ratios.
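
To illustrate why the correction matters, the sketch below compares standard maximum likelihood with Firth's method on a small made-up data set with quasi-complete separation. The data frame toy and its variables are hypothetical and constructed only for this illustration.

```r
library(logistf)

# Hypothetical toy data: every observation with x = 1 is an event,
# so the maximum likelihood estimate for x does not exist (it diverges)
toy <- data.frame(
  x = rep(c(0, 1), each = 10),
  y = c(rep(0, 5), rep(1, 5), rep(1, 10))
)

# Standard ML via glm(): typically warns and returns a very large coefficient
fit_ml <- suppressWarnings(glm(y ~ x, family = binomial, data = toy))

# Firth's penalized likelihood: the estimate for x stays finite
fit_firth <- logistf(y ~ x, data = toy)

coef(fit_ml)
coef(fit_firth)
confint(fit_firth)
```

The glm() coefficient for x depends on where the iteration happens to stop, whereas the logistf() estimate and its profile penalized likelihood interval are well defined.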

References

Firth D (1993). Bias reduction of maximum likelihood estimates. Biometrika 80: 27-38.

Heinze G, Schemper M (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine 21: 2409-2419.

Heinze G, Ploner M (2003). Fixing the nonconvergence bug in logistic regression with SPLUS and SAS. Computer Methods and Programs in Biomedicine 71: 181-187.

Heinze G, Ploner M (2004). Technical Report 2/2004: A SAS-macro, S-PLUS library and R package to perform logistic regression without convergence problems. Section of Clinical Biometrics, Department of Medical Computer Sciences, Medical University of Vienna, Vienna, Austria. http://www.meduniwien.ac.at/user/georg.heinze/techreps/tr2_2004.pdf

Heinze G (2006). A comparative investigation of methods for logistic regression with separated or nearly separated data. Statistics in Medicine 25: 4216-4226.

Puhr R, Heinze G, Nold M, Lusa L, Geroldinger A (2017). Firth's logistic regression with rare events: accurate effect estimates and predictions? Statistics in Medicine 36: 2302-2317.

Venzon DJ, Moolgavkar AH (1988). A method for computing profile-likelihood based confidence intervals. Applied Statistics 37: 87-94.

Examples

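library(logistf)

# Fit Firth's logistic regression to the sex2 data shipped with the package
data(sex2)
fit <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sex2)
summary(fit)
nobs(fit)
drop1(fit)                            # penalized likelihood ratio tests
plot(profile(fit, variable = "dia"))  # profile penalized likelihood for dia
extractAIC(fit)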

# Refit without dia and compare the two models
fit1 <- update(fit, case ~ age + oc + vic + vicl + vis)
extractAIC(fit1)
anova(fit, fit1)

# Aggregated version of the data, with COUNT used as case weights
data(sexagg)
fit2 <- logistf(case ~ age + oc + vic + vicl + vis + dia, data = sexagg, weights = COUNT)
summary(fit2)

# Simulated SNP example: 100 controls followed by 100 cases, 20 SNPs each
set.seed(72341)
snpdata <- rbind(
  matrix(rbinom(2000, 2, runif(2000) * 0.3), 100, 20),
  matrix(rbinom(2000, 2, runif(2000) * 0.5), 100, 20))
colnames(snpdata) <- paste("SNP", 1:20, "_", sep = "")
snpdata <- as.data.frame(snpdata)
snpdata$case <- c(rep(0, 100), rep(1, 100))

# Start from the intercept-only model, then add SNPs by forward selection
fitsnp <- logistf(data = snpdata, formula = case ~ 1, pl = FALSE)
add1(fitsnp, scope = paste("SNP", 1:20, "_", sep = ""), data = snpdata)
fitf <- forward(fitsnp, scope = paste("SNP", 1:20, "_", sep = ""), data = snpdata)
fitf
