calf_subset: calf_subset

Description

Runs Coarse Approximation Linear Function (CALF) on a random subset of the data provided. When targetVector = "binary", the same proportion is sampled from the cases and from the controls.
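The subsampling step described above can be sketched in base R as follows. This is a minimal illustration, not the package's code. The helper name `stratified_subset`, the 0/1 coding of the target in column 1, and rounding each class size to the nearest integer are all assumptions.

```r
# Hypothetical helper (not part of CALF): keep `proportion` of the rows,
# sampling cases and controls separately when the target is binary.
stratified_subset <- function(data, proportion = 0.8, binary = TRUE) {
  rows <- seq_len(nrow(data))
  if (binary) {
    # Split row indices by the class in column 1, then sample within each class
    keep <- unlist(lapply(split(rows, data[, 1]), function(r) {
      r[sample.int(length(r), size = round(length(r) * proportion))]
    }), use.names = FALSE)
  } else {
    # Nonbinary target: a simple random proportion of the full sample
    keep <- sample.int(length(rows), size = round(length(rows) * proportion))
  }
  data[sort(keep), , drop = FALSE]
}

# Usage: 20 controls (0) and 30 cases (1); an 80% subset keeps 16 and 24
set.seed(1)
toy <- data.frame(status = rep(c(0, 1), times = c(20, 30)),
                  m1 = rnorm(50), m2 = rnorm(50))
sub <- stratified_subset(toy, proportion = 0.8)
table(sub$status)
```

Sampling within each class keeps the case/control ratio of the subset equal to that of the full data, up to rounding.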

Usage

calf_subset(
  data,
  nMarkers,
  proportion = 0.8,
  targetVector,
  times = 1,
  optimize = "pval",
  verbose = FALSE
)

Arguments

data

Matrix or data frame. If targetVector = "binary", the first column must contain the dummy-coded case/control variable. If targetVector = "nonbinary", the first column must instead contain a real-valued vector for the selection (target) variable. All remaining columns contain the candidate markers.

nMarkers

Maximum number of markers to include when building the sum.

proportion

Numeric. A value between 0 and 1 giving the proportion of cases and of controls to use in the analysis (if targetVector = "binary"). If targetVector = "nonbinary", it is simply the proportion of the full sample to use. Used to evaluate the robustness of the solution. Defaults to 0.8.

targetVector

Indicate "binary" for a target vector with two levels (e.g., case/control), or "nonbinary" for a real-valued target vector.

times

Numeric. The number of randomized replications to run. Defaults to 1.

optimize

Criterion to optimize if targetVector = "binary". Indicate "pval" to minimize the p-value of the t-test distinguishing cases from controls, or "auc" to maximize the AUC. Defaults to "pval".

verbose

Logical. Indicate TRUE to print activity at each iteration to the console. Defaults to FALSE.
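The interplay of nMarkers and optimize can be illustrated with a simplified greedy sketch: markers are added one at a time with weight +1 or -1 to a running sum, keeping whichever move most improves the chosen criterion. This is an assumption-laden illustration, not the CALF source. The helper names `score_pval`, `score_auc`, and `greedy_sum` are hypothetical, the target is assumed to be coded 0 (control) / 1 (case), and the t-test is `stats::t.test` with its default Welch variant, which may differ from what the package uses.

```r
# Hypothetical, simplified sketch (not the CALF source).
# Assumes column 1 is a 0/1 target (1 = case) and markers have column names.
score_pval <- function(score, y) t.test(score[y == 1], score[y == 0])$p.value

score_auc <- function(score, y) {
  # AUC via the Mann-Whitney identity; ties receive average ranks
  r <- rank(score)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

greedy_sum <- function(data, nMarkers, optimize = c("pval", "auc")) {
  optimize <- match.arg(optimize)
  y <- data[, 1]
  X <- as.matrix(data[, -1, drop = FALSE])
  # Express both criteria as "larger is better"
  gain <- if (optimize == "pval") {
    function(s) -score_pval(s, y)
  } else {
    function(s) score_auc(s, y)
  }
  score <- rep(0, nrow(X))
  chosen <- character(0); weights <- numeric(0); best <- -Inf
  for (step in seq_len(min(nMarkers, ncol(X)))) {
    move <- NULL
    for (j in setdiff(colnames(X), chosen)) {
      for (w in c(1, -1)) {
        g <- gain(score + w * X[, j])
        if (g > best) { best <- g; move <- list(j = j, w = w) }
      }
    }
    if (is.null(move)) break  # no remaining marker improves the criterion
    chosen <- c(chosen, move$j); weights <- c(weights, move$w)
    score <- score + move$w * X[, move$j]
  }
  value <- if (optimize == "pval") -best else best
  list(selection = data.frame(marker = chosen, weight = weights), value = value)
}

# Usage: m1 is shifted up in cases, m2 shifted down, m3 is noise
set.seed(2)
y <- rep(c(0, 1), each = 30)
toy <- data.frame(status = y, m1 = rnorm(60) + y, m2 = rnorm(60) - y,
                  m3 = rnorm(60))
res <- greedy_sum(toy, nMarkers = 2, optimize = "auc")
res$selection
```

Allowing a weight of -1 lets a marker that is lower in cases contribute to the sum just as well as one that is higher, which matches the -1/1 weights reported in the output.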

Value

A data frame containing the chosen markers and their assigned weights (-1 or 1).

The optimal AUC, p-value, or correlation achieved by the classification. If multiple replications are requested, a data frame containing the optimized values from all replications is returned.

aucHist: A histogram of the AUCs across replications, if applicable.
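The shape of the multi-replication output can be sketched as follows. This is a hypothetical illustration, not the CALF source: each replication draws a fresh stratified 80% subsample and records an AUC, the values are gathered in a data frame, and a histogram summarizes them in the spirit of aucHist. For brevity the score is a single fixed marker rather than a re-optimized sum, and the names `auc`, `one_replication`, and `results` are assumptions.

```r
# Hypothetical sketch (not the CALF source) of `times` replications.
set.seed(3)
y <- rep(c(0, 1), each = 50)
toy <- data.frame(status = y, m1 = rnorm(100) + y)

auc <- function(score, y) {
  # Mann-Whitney identity; ties receive average ranks
  r <- rank(score)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

one_replication <- function(data, proportion = 0.8) {
  # Sample the same proportion of cases and of controls
  keep <- unlist(lapply(split(seq_len(nrow(data)), data$status), function(r) {
    r[sample.int(length(r), size = round(length(r) * proportion))]
  }))
  sub <- data[keep, ]
  # Score is the single marker m1, not a re-optimized sum
  auc(sub$m1, sub$status)
}

times <- 5
results <- data.frame(replication = seq_len(times),
                      auc = replicate(times, one_replication(toy)))
hist(results$auc, main = "AUC across replications", xlab = "AUC")
```

A narrow spread of AUCs across replications suggests the classification is robust to which subset of the sample is used, which is the purpose of the proportion argument.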

Examples


library(CALF)
calf_subset(data = CaseControl, nMarkers = 6, targetVector = "binary", times = 5)
