Simulates a dataset that can be used to screen features for ultrahigh-dimensional discriminant analysis. The simulation is based on the balanced scenarios in Example 3.1 of Cui et al. (2015). The simulated dataset has p numerical X-predictors and a categorical Y-response.
GendataLDA(
n,
p,
R = 3,
error = c("gaussian", "t", "cauchy"),
style = c("balanced", "unbalanced")
)
A list containing the simulated data.
Number of subjects in the dataset to be simulated. This also equals the number of rows in the simulated dataset, because each row is assumed to represent a different independent and identically distributed subject.
Number of predictor variables (covariates) in the simulated dataset. These covariates will be the features screened by model-free procedures.
A positive integer giving the number of outcome categories for the multinomial (categorical) outcome Y.
The distribution of the error term: "gaussian" for normally distributed errors, "t" for t-distributed errors with 2 degrees of freedom, or "cauchy" for Cauchy-distributed errors.
The balance among the categories of the categorical response: "balanced" or "unbalanced".
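The error and balance options described above can be pictured with base R alone. The sketch below is illustrative and is not the package's internal code: the helper `draw_error` is a hypothetical name, the errors are assumed to be standard (unscaled) draws, and the unequal class probabilities are made up for the example rather than taken from Cui et al. (2015).

```r
# Illustrative sketch only; not the internals of GendataLDA.
set.seed(1)
n <- 100

# Hypothetical helper: map each `error` option to a base-R generator.
draw_error <- function(n, error = c("gaussian", "t", "cauchy")) {
  error <- match.arg(error)
  switch(error,
    gaussian = rnorm(n),       # standard normal errors
    t        = rt(n, df = 2),  # t errors with 2 degrees of freedom
    cauchy   = rcauchy(n)      # standard Cauchy errors
  )
}

eps <- draw_error(n, "t")

# Class labels 1..R drawn with equal or unequal class probabilities.
# The unequal weights are an assumption chosen only for illustration.
R <- 3
prob_balanced   <- rep(1 / R, R)
prob_unbalanced <- c(0.2, 0.3, 0.5)

y_bal   <- sample(seq_len(R), n, replace = TRUE, prob = prob_balanced)
y_unbal <- sample(seq_len(R), n, replace = TRUE, prob = prob_unbalanced)

# Compare how many subjects fall in each category.
table(y_bal)
table(y_unbal)
```

The "t" and "cauchy" options give heavy-tailed errors, so a few extreme values are to be expected in `eps` compared with the Gaussian case.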
Xuewei Cheng xwcheng@hunnu.edu.cn
Cui, H., Li, R., & Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110(510), 630-641.
n <- 100
p <- 200
R <- 3
data <- GendataLDA(n, p, R, error = "gaussian", style = "balanced")