Simulate functional compositional data.
Fcomp_Model(n, p, m = 0, intercept = TRUE,
            interval = c(0, 1), n_T = 100, obs_spar = 0.6, discrete = FALSE,
            SNR = 1, sigma = 2, Nzero_group = 4,
            rho_X, Corr_X = c("CorrCS", "CorrAR"),
            rho_T, Corr_T = c("CorrAR", "CorrCS"),
            range_beta = c(0.5, 1), beta_c = 1, beta_C,
            theta.add = c(1, 2, 5, 6), gamma = 0.5,
            basis_beta = c("bs", "OBasis", "fourier"), df_beta = 5, degree_beta = 3,
            insert = c("FALSE", "X", "basis"), method = c("trapezoidal", "step"))
sample size.
number of the components in the functional compositional data.
number of unpenalized variables. The first ceiling(m/2) ones are generated with independent bin(1,0.5) entries, while the last (m - ceiling(m/2)) ones are generated with independent norm(0, 1) entries. Default is 0.
whether to include an intercept. Default is TRUE.
a vector of length 2 indicating the time domain. Default is c(0, 1).
an integer specifying the length of the equally spaced time sequence on the domain interval.
a percentage used to get sparse observations. Each time point is observed with probability obs_spar, which allows different subjects to be observed at different time points. obs_spar * n_T > 5 is required.
logical (default is FALSE) specifying whether the functional compositional data \(X\) is generated at different time points. If discrete = TRUE, \(X\) is generated on a dense sequence created by max(ns_dense = 200 * diff(interval), 5 * n_T), and then n_T points are randomly sampled for each subject.
signal to noise ratio.
variance used to generate the covariance matrix CovMIX = sigma^2 * kronecker(T.Sigma, X.Sigma). The "non-normalized" data \(w_{i}\) for each subject are generated from a multivariate normal distribution with covariance CovMIX. T.Sigma and X.Sigma are correlation matrices for time points and components, respectively.
an even integer specifying that the first Nzero_group compositional predictors have non-zero effects. Default is 4.
parameters used to generate correlation matrices.
character strings specifying the correlation structure between components and between time points, respectively.
"CorrCS": (default for Corr_X) compound symmetry.
"CorrAR": (default for Corr_T) autoregressive.
a sorted vector of length 2 specifying the range of the coefficient matrix B of dimension \(p \times k\). Specifically, each column of B is filled with Nzero_group/2 values from the uniform distribution over range_beta and their negative counterparts. Default is c(0.5, 1).
value of coefficients for beta0 and beta_c (coefficients for intercept and time-invariant predictors). Default is 1.
vectorized coefficient matrix. If missing, the program will generate beta_C according to range_beta and Nzero_group.
logical or integer(s).
If integer(s), a vector with value(s) in [1, p] indicating which component(s) of the compositions have a high-level mean curve.
If TRUE, the components c(1:ceiling(Nzero_group/2), Nzero_group + (1:ceiling(Nzero_group/2))) are set to have a high-level mean.
If FALSE, all mean curves are set to 0.
for the high-level mean groups, log(p * gamma) is added to the "non-normalized" data \(w_{i}\) before the data are converted to compositions.
basis_beta, df_beta and degree_beta correspond to basis_fun, k and degree in FuncompCGL, respectively.
a character string specifying the method used to perform functional interpolation.
"FALSE": (default) no interpolation.
"X": linear interpolation of the functional compositional data along the time grid.
"basis": the functional compositional data are interpolated as a step function along the time grid.
If insert = "X" or "basis", interpolation is conducted on sseq, where sseq is the sorted sequence of all the observed time points.
a character string specifying the method used to approximate the integral.
"trapezoidal": (default) sum up the areas under the trapezoids.
"step": sum up the areas under the rectangles.
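The two integration rules named for method can be illustrated with a minimal base R sketch. The helper integrate_grid is hypothetical and not part of the package, and the use of the left end value for the rectangles is an assumption; the package may use a different convention.

```r
# Approximate the integral of values f observed on the time grid tt.
# integrate_grid is a hypothetical helper, not a package function.
integrate_grid <- function(tt, f, method = c("trapezoidal", "step")) {
  method <- match.arg(method)
  dt <- diff(tt)          # widths of the sub-intervals
  n <- length(tt)
  if (method == "trapezoidal") {
    # area of each trapezoid: width times the average of the two end values
    sum(dt * (f[-1] + f[-n]) / 2)
  } else {
    # area of each rectangle: width times the left end value (assumed)
    sum(dt * f[-n])
  }
}

tt <- seq(0, 1, length.out = 101)
trap <- integrate_grid(tt, tt^2, "trapezoidal")
rect <- integrate_grid(tt, tt^2, "step")
```

Both values approximate the exact integral 1/3 of t^2 over [0, 1]; on this grid the trapezoidal value is the closer of the two.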
a list including
a list of observed data,
y: a vector of the response variable,
Comp: a data frame of the observed functional compositional data, a column of Subject_ID, and a column of TIME,
Zc: a matrix of unpenalized variables with dimension \(n \times m\),
intercept: whether an intercept is included.
a vector of coefficients of length p*df_beta + m + 1
matrix of the basis function to generate the coefficient curves
a list consisting of
Z_t.full: the functional compositional data.
Z_ITG: the integrated functional compositional data.
Y.tru: the true response vector without noise.
X: the functional "non-normalized" data W.
a list of parameters used in the simulation.
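As an illustration of how the coefficient matrix B described under range_beta could be filled, here is a minimal base R sketch. The placement of the non-zero values in the first Nzero_group rows and the ordering of positive values before their negatives are assumptions made for the example, not the package's documented layout.

```r
set.seed(1)
p <- 30                    # number of components
k <- 5                     # number of basis functions (df_beta)
Nzero_group <- 4
range_beta <- c(0.5, 1)

B <- matrix(0, nrow = p, ncol = k)
for (j in seq_len(k)) {
  # Nzero_group/2 uniform draws over range_beta
  u <- runif(Nzero_group / 2, min = range_beta[1], max = range_beta[2])
  # assumed ordering: the draws first, then their negative counterparts
  B[seq_len(Nzero_group), j] <- c(u, -u)
}
```

Because every value is paired with its negative, each column of B sums to zero.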
The setup of this simulation follows Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020), Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome, Annals of Applied Statistics, https://arxiv.org/abs/1808.02403.
Specifically, we first generate the correlation matrix X.Sigma for the components of a composition based on rho_X and Corr_X, and the correlation matrix T.Sigma for the time points based on rho_T and Corr_T. Then, the "non-normalized" data \(w_i=[w_i(t_1)^T,...,w_i(t_{n_T})^T]\) for each subject are generated from a multivariate normal distribution with covariance CovMIX = sigma^2 * kronecker(T.Sigma, X.Sigma), and the mean vector is determined by theta.add and gamma. Each \(w_i(t_v)\) is a p-vector for each time point \(v=1,...,n_T\).
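The covariance construction described so far can be sketched in base R as follows. The helpers corr_cs and corr_ar are hypothetical stand-ins written for this example; they assume the usual compound symmetry form (rho off the diagonal) and a first-order autoregressive form (rho^|i - j|), and the draw shown uses a zero mean.

```r
set.seed(1)
p <- 4; n_T <- 3; sigma <- 2
rho_X <- 0.2; rho_T <- 0.6

# compound symmetry: 1 on the diagonal, rho elsewhere (hypothetical helper)
corr_cs <- function(d, rho) {
  S <- matrix(rho, d, d)
  diag(S) <- 1
  S
}
# first-order autoregressive: entry (i, j) is rho^|i - j| (hypothetical helper)
corr_ar <- function(d, rho) rho^abs(outer(seq_len(d), seq_len(d), "-"))

X.Sigma <- corr_cs(p, rho_X)      # correlation between components
T.Sigma <- corr_ar(n_T, rho_T)    # correlation between time points
CovMIX <- sigma^2 * kronecker(T.Sigma, X.Sigma)

# one zero-mean multivariate normal draw via the Cholesky factor:
# chol() returns R with t(R) %*% R = CovMIX, so t(R) %*% z has covariance CovMIX
R <- chol(CovMIX)
w_i <- as.vector(crossprod(R, rnorm(p * n_T)))
W_i <- matrix(w_i, nrow = p, ncol = n_T)   # column v holds w_i(t_v)
```

With kronecker(T.Sigma, X.Sigma), the stacked vector is ordered by time point first and by component within each time point, which matches the layout of \(w_i\) above.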
Finally, the compositional data are obtained as
$$
x_{ij}(t_v) = \frac{\exp(w_{ij}(t_v))}{\sum_{k=1}^{p} \exp(w_{ik}(t_v))},
$$
for each subject \(i=1,...,n\), component of a composition \(j=1,...,p\) and time point \(v=1,...,n_T\).
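The final transformation can be sketched in base R for a single subject. The matrix W here is a random stand-in for one subject's "non-normalized" data, and the index vector high is a hypothetical choice of high-level components; subtracting the column maximum before exponentiating is a numerical safeguard added for the example and does not change the ratios.

```r
set.seed(1)
p <- 5; n_T <- 4; gamma <- 0.5
high <- c(1, 2)                          # hypothetical high-level components

W <- matrix(rnorm(p * n_T), nrow = p)    # stand-in for w_i, one column per time point
W[high, ] <- W[high, ] + log(p * gamma)  # shift for the high-level mean groups

# x_ij(t_v) = exp(w_ij(t_v)) / sum_k exp(w_ik(t_v)), applied column by column
X <- apply(W, 2, function(w) {
  e <- exp(w - max(w))
  e / sum(e)
})
```

Each column of X is then a composition: its entries are positive and sum to 1.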
Sun, Z., Xu, W., Cong, X., Li, G. and Chen, K. (2020) Log-contrast regression with functional compositional predictors: linking preterm infant's gut microbiome trajectories to neurobehavioral outcome. Annals of Applied Statistics. https://arxiv.org/abs/1808.02403
# NOT RUN {
Data <- Fcomp_Model(n = 50, p = 30, m = 0, intercept = TRUE, Nzero_group = 4,
                    n_T = 20, SNR = 3, rho_X = 0, rho_T = 0.6,
                    df_beta = 5, obs_spar = 1, theta.add = FALSE)
# }