fusionbinary: Fusion learning algorithm for binary responses

Description

fusionbinary performs group penalization at a specified penalty value, learning jointly from multiple generalized linear models with binary responses. fusionbinary.fit searches for the best candidate model over a sequence of penalty values, based on the pseudo Bayesian information criterion.

Usage

fusionbinary(x, y, lambda, N, p, m, beta=0.1, thresh=0.1, 
             maxiter=100, methods="scad", link="logit", Complete=TRUE)
fusionbinary.fit(x, y, lambda, N, p, m, beta=0.1, thresh=0.1, 
                 maxiter=100, methods="scad", link="logit", Complete=TRUE, 
                 depen ="IND", a=1)

Arguments

x

List. A list of predictor matrices, one per platform.

y

List. A list of binary response vectors, one per platform, in the same order as in x.

lambda

Numeric or vector. For fusionbinary, lambda is a single numeric penalty value; for fusionbinary.fit, lambda is a vector of penalty values.

N

Numeric or vector. If a single numeric value is provided, every data set is assumed to have that sample size. If a vector is provided, its elements are the sample sizes of the platforms.

p

Numeric. The number of predictors.

m

Numeric. The number of platforms.

beta

Numeric. Initial value for the estimated parameters, with dimensions nvars x nplatforms. The default value is 0.1.

thresh

Numeric. The stopping criterion. The default value is 0.1.

maxiter

Numeric. Maximum number of iterations. The default value is 100.

methods

Character ("lass" or "scad"). Penalty function: "lass" for LASSO, "scad" for SCAD. The default value is "scad".

link

Character ("logit" or "probit"). Link function: logistic or probit. The default value is "logit".

Complete

Logical. If Complete == TRUE, the predictors \(M_1\),...,\(M_p\) are measured in all platforms. If Complete == FALSE, some platforms do not measure all of the predictors \(\{M_1,M_2,...,M_p\}\); the estimated coefficients for the missing predictors are returned as NA.

depen

Character. Used only by fusionbinary.fit. "IND" means the observations across different platforms are independent; "CORR" means the observations are correlated, in which case the sample sizes should be equal across platforms.

a

Numeric. Used only by fusionbinary.fit. The free multiplicative constant used in \(\gamma_n\). The default value is 1.
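The input layout described by the arguments above can be sketched with simulated data. This is only an illustration in base R: the sample sizes, the number of predictors, and the coefficient vector beta_true are hypothetical choices, and the final call is left as a comment because it requires the FusionLearn package.

```r
set.seed(1)                      # reproducible toy data

m <- 2                           # number of platforms
p <- 5                           # number of predictors
N <- c(100, 150)                 # sample size of each platform (hypothetical)

# x: one predictor matrix per platform, each with N[k] rows and p columns.
x <- lapply(N, function(n) matrix(rnorm(n * p), nrow = n, ncol = p))

# Hypothetical true coefficients: only the first two predictors matter.
beta_true <- c(1.5, -1, 0, 0, 0)

# y: one binary response vector per platform, in the same order as x,
# generated from a logistic model (consistent with link = "logit").
y <- lapply(x, function(xk) {
  rbinom(nrow(xk), size = 1, prob = plogis(drop(xk %*% beta_true)))
})

# With FusionLearn installed, the call would then take the form:
# fit <- fusionbinary(x, y, lambda = 0.3, N = N, p = p, m = m)
```

Passing N as a vector covers the unequal sample sizes here; a single value would be enough if every platform had the same number of observations.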

Value

fusionbinary returns a list with the following components:

beta

A matrix (nvars x nplatforms) containing estimated coefficients of each linear model. If some data sets do not have the complete set of predictors, the corresponding coefficients are output as NA.

method

The penalty function used: LASSO or SCAD.

link

The link function used in the estimation.

threshold

Numeric. The difference in the estimates between successive updates at convergence.

iteration

Numeric. The number of iterations at convergence.
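As a small illustration of working with the beta component, the sketch below uses a hypothetical coefficient matrix (toy values, not output of fusionbinary) to show how nonzero estimates and NA entries for unmeasured predictors might be handled in base R.

```r
# Hypothetical coefficient matrix: 4 predictors (rows) x 2 platforms (columns).
# The NA mimics a predictor that was not measured in platform 2.
beta_hat <- matrix(c(0.8, 0.0, -0.5, 0.0,
                     0.6, 0.0,   NA, 0.0),
                   nrow = 4, ncol = 2,
                   dimnames = list(paste0("M", 1:4),
                                   c("platform1", "platform2")))

# TRUE where an estimate exists and is nonzero; NA entries count as FALSE.
nonzero <- !is.na(beta_hat) & beta_hat != 0

# Predictors with a nonzero estimate in at least one platform.
selected <- rownames(beta_hat)[rowSums(nonzero) > 0]

# Row and column positions of coefficients reported as NA.
missing_in <- which(is.na(beta_hat), arr.ind = TRUE)
```

Testing is.na() before comparing with zero keeps the NA entries from propagating into the selection.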

fusionbinary.fit provides the results in a table:

lambda

The sequence of penalty values.

BIC

The pseudolikelihood Bayesian information criterion evaluated at each penalty value in the sequence.

-2Loglkh

Minus twice the pseudo loglikelihood of the chosen model.

Est_df

The estimated degrees of freedom quantifying the model complexity.

fusionbinary.fit also returns a model selection plot showing the results above.

Details

fusionbinary is the generalized fusion learning function for learning from multiple models with binary responses. More details regarding the algorithm can be found in FusionLearn.
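For readers unfamiliar with the two penalties offered by the methods argument, the following base R sketch writes out the LASSO and SCAD penalty functions in their standard form for a single non-negative quantity t. It is not taken from the package source: the function names are hypothetical, a_scad = 3.7 is the conventional SCAD shape constant (unrelated to the a argument of fusionbinary.fit), and how the package applies the penalty at the group level is not shown here.

```r
# LASSO penalty: grows linearly in |t|.
lasso_penalty <- function(t, lambda) lambda * abs(t)

# SCAD penalty in its standard piecewise form:
# linear up to lambda, quadratic up to a_scad * lambda, constant beyond.
scad_penalty <- function(t, lambda, a_scad = 3.7) {
  t <- abs(t)
  ifelse(t <= lambda,
         lambda * t,
         ifelse(t <= a_scad * lambda,
                (2 * a_scad * lambda * t - t^2 - lambda^2) / (2 * (a_scad - 1)),
                lambda^2 * (a_scad + 1) / 2))
}

# Compare the two penalties on a small grid with lambda = 1.
t_grid <- c(0, 0.5, 1, 2, 5, 10)
penalties <- cbind(t = t_grid,
                   lasso = lasso_penalty(t_grid, 1),
                   scad = scad_penalty(t_grid, 1))
print(penalties)
```

The two penalties agree for small t, while SCAD levels off for large t, so large coefficients are penalized less heavily than under LASSO.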

References

Gao, X. and Carroll, R. J. (2017). Data integration with high dimensionality. Biometrika, 104(2), 251-272.

Examples

# NOT RUN {
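## Analysis of the gene data
library(FusionLearn)                              ## package providing fusionbinary
y <- list(mockgene1[,2], mockgene2[,2])           ## responses "status"
x <- list(mockgene1[,3:502], mockgene2[,3:502])   ## 500 predictors


## Implementing the fusion learning algorithm with penalty value 0.3
result <- fusionbinary(x, y, 0.3, N=c(98,286), 500, 2)
## Nonzero coefficients in platform 1; +2 because the predictors start at column 3
id <- which(result$beta[,1] != 0) + 2
genename <- colnames(mockgene1)[id]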

# }
