GendataLDA

Simulates a dataset that can be used to screen features for ultrahigh-dimensional discriminant analysis.
The simulation is based on the balanced scenarios in Example 3.1 of Cui et al. (2015).
The simulated dataset has p numerical X-predictors and a categorical Y-response.
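
The kind of data described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper `sim_lda_sketch` and its `shift` argument are hypothetical, and the mean structure (class r shifts predictor r by a constant) is assumed for illustration rather than taken from this page.

```r
# Hypothetical helper, not part of MFSIS: a sketch of a balanced
# LDA-type simulation with p numeric predictors and a categorical response.
sim_lda_sketch <- function(n, p, R = 3,
                           error = c("gaussian", "t", "cauchy"),
                           shift = 3) {
  error <- match.arg(error)
  stopifnot(R >= 2, R <= p)

  # Balanced design: each of the R categories has probability 1/R.
  Y <- sample.int(R, size = n, replace = TRUE)

  # Draw the n * p error terms from the chosen distribution.
  E <- switch(error,
    gaussian = rnorm(n * p),
    t        = rt(n * p, df = 2),
    cauchy   = rcauchy(n * p)
  )
  X <- matrix(E, nrow = n, ncol = p)

  # Assumed mean structure: for a subject in class r, predictor r is shifted.
  idx <- cbind(seq_len(n), Y)
  X[idx] <- X[idx] + shift

  list(X = X, Y = factor(Y, levels = seq_len(R)))
}

set.seed(1)
dat <- sim_lda_sketch(n = 100, p = 200, R = 3, error = "gaussian")
dim(dat$X)    # dimensions of the predictor matrix
table(dat$Y)  # class counts, roughly equal in a balanced design
```

Under this assumed structure only the first R predictors carry information about Y, which is what makes such data useful for checking a screening procedure.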

An implementation of popular screening methods that are commonly applied to ultrahigh- and high-dimensional data.
Through this publicly available package, we provide a unified framework to carry out model-free screening procedures including
SIS (Fan and Lv (2008) <doi:10.1111/j.1467-9868.2008.00674.x>),
SIRS (Zhu et al. (2011) <doi:10.1198/jasa.2011.tm10563>),
DC-SIS (Li et al. (2012) <doi:10.1080/01621459.2012.695654>),
MDC-SIS (Shao and Zhang (2014) <doi:10.1080/01621459.2014.887012>),
Bcor-SIS (Pan et al. (2019) <doi:10.1080/01621459.2018.1462709>),
PC-Screen (Liu et al. (2020) <doi:10.1080/01621459.2020.1783274>),
WLS (Zhong et al. (2021) <doi:10.1080/01621459.2021.1918554>),
Kfilter (Mai and Zou (2015) <doi:10.1214/14-AOS1303>),
MVSIS (Cui et al. (2015) <doi:10.1080/01621459.2014.920256>),
PSIS (Pan et al. (2016) <doi:10.1080/01621459.2014.998760>),
CAS (Xie et al. (2020) <doi:10.1080/01621459.2019.1573734>),
CI-SIS (Cheng and Wang (2023) <doi:10.1016/j.cmpb.2022.107269>) and
CSIS (Cheng et al. (2023) <doi:10.1007/s00180-023-01399-5>).

Xuewei Cheng

MFSIS

Model-Free Sure Independent Screening Procedures

Hong Wang

Liping Zhu

Wei Zhong

Hanpu Zhou

GendataLDA function

<dl><dt>n</dt>
<dd>Number of subjects in the simulated dataset. This also equals the
number of rows, because each row is assumed to represent a different
independent and identically distributed subject.</dd>
<dt>p</dt>
<dd>Number of predictor variables (covariates) in the simulated dataset.
These covariates are the features screened by the model-free procedures.</dd>
<dt>R</dt>
<dd>A positive integer giving the number of outcome categories for the multinomial (categorical) outcome Y.</dd>
<dt>error</dt>
<dd>The distribution of the error term: "gaussian" generates normally distributed errors,
"t" generates errors from a t distribution with 2 degrees of freedom, and
"cauchy" generates errors from a Cauchy distribution.</dd>
<dt>style</dt>
<dd>The balance among categories in the categorical data.</dd></dl>
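
A call using the arguments listed above might look like the following sketch. The argument names and the `error` values come from this page; the value `"balanced"` for `style` is an assumption, and the layout of the returned object is not documented here, so the example only inspects it with `str()`. The call is skipped if the package is not installed.

```r
# Argument values for the call. "balanced" for `style` is an assumed value,
# not one confirmed by this page.
sim_args <- list(n = 100, p = 200, R = 3,
                 error = "gaussian", style = "balanced")

if (requireNamespace("MFSIS", quietly = TRUE)) {
  set.seed(123)
  dat <- do.call(MFSIS::GendataLDA, sim_args)
  # Inspect the returned object rather than assuming its structure.
  str(dat, max.level = 1)
} else {
  message("Package 'MFSIS' is not installed; skipping the call.")
}
```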

Arguments

Xuewei Cheng <a href='mailto:xwcheng@hunnu.edu.cn'>xwcheng@hunnu.edu.cn</a>

Author

GendataLDA: Generate simulation data (categorical response based on a linear discriminant analysis model)


