cglasso (version 2.0.2)

datacggm: Create a Dataset from a Conditional Gaussian Graphical Model with Censored and/or Missing Values

Description

The ‘datacggm’ function creates a dataset from a conditional Gaussian graphical model with censored and/or missing values.

Usage

datacggm(Y, lo = -Inf, up = +Inf, X = NULL, control = list(maxit = 1.0E+4,
         thr = 1.0E-4))

Arguments

Y

a \((n\times p)\)-dimensional matrix; each row is an observation from a conditional Gaussian graphical model with censoring vectors lo and up. Missing-at-random values are recorded as ‘NA’.

lo

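the lower censoring vector; lo[j] specifies the lower censoring value for the random variable \(Y_j\). A single value can be supplied, in which case it is used for every response variable (as in the examples below).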

up

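the upper censoring vector; up[j] specifies the upper censoring value for the random variable \(Y_j\). A single value can be supplied, in which case it is used for every response variable (as in the examples below).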

X

an optional \((n\times q)\)-dimensional matrix of predictors. If NULL (the default), a dataset from a Gaussian graphical model is returned; otherwise, a dataset from a conditional Gaussian graphical model is returned.

control

a named list used to pass arguments to the EM algorithm (see Details). The components are:

  • maxit: the maximum number of iterations. Default value is 1.0E+4.

  • thr: the convergence threshold. Default value is 1.0E-4.

Value

‘datacggm’ returns an R object of S3 class “datacggm”, that is, a nested named list containing the following components:

Y

the \((n\times p)\)-dimensional matrix Y.

X

the \((n\times q)\)-dimensional matrix X.

Info

a named list containing the following components:

  • lo: the lower censoring vector;

  • up: the upper censoring vector;

  • R: the status indicator matrix encoding the censored/missing values (mainly for internal purposes);

  • order: an integer vector used for the ordering of the matrices Y and X (for internal purposes only);

  • Pattern: a matrix encoding the information about the patterns of censored/missing values (for internal purposes only);

  • ym: the estimated marginal means of the random variables \(Y_j\);

  • yv: the estimated marginal variances of the random variables \(Y_j\);

  • n: the sample size;

  • p: the number of response variables;

  • q: the number of predictors.

Details

The function ‘datacggm’ returns an R object of class ‘datacggm’, that is, a named list containing the elements needed to fit a conditional graphical LASSO (cglasso) model to datasets with censored and/or missing values.

A set of method functions has been developed to describe data with censored/missing values. For example, the method function ‘print.datacggm’ prints left- and right-censored values using the following rule: a right-censored value is labeled by appending the symbol ‘+’ to the value, whereas the symbol ‘-’ is used to label left-censored values (see the examples below). Summary statistics can be obtained using the method function ‘summary.datacggm’. The matrices Y and X can be extracted from a datacggm object using the function ‘getMatrix’.
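The labeling rule can be illustrated with a short base-R sketch. The helper ‘label_censored’ below is hypothetical: it is not part of cglasso and does not reproduce the code of ‘print.datacggm’. It assumes a single column y with scalar censoring values lo and up, treats any recorded value at or beyond a censoring value as censored, and (by analogy with ‘+’, an assumption) appends ‘-’ to left-censored values.

```r
# Hypothetical helper, NOT the cglasso printing code: format one column y,
# appending "+" to right-censored and "-" to left-censored values.
label_censored <- function(y, lo = -Inf, up = +Inf, digits = 3) {
  out <- formatC(y, digits = digits, format = "f")
  right <- !is.na(y) & y >= up   # recorded at (or beyond) the upper bound
  left  <- !is.na(y) & y <= lo   # recorded at (or beyond) the lower bound
  out[right] <- paste0(out[right], "+")
  out[left]  <- paste0(out[left], "-")
  out[is.na(y)] <- "NA"          # missing values are shown as NA
  out
}

label_censored(c(1, 0.25, -1, NA), lo = -1, up = 1)
# "1.000+" "0.250" "-1.000-" "NA"
```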

For each column of the matrix ‘Y’, the mean and variance are estimated using a standard EM algorithm based on the assumption of a Gaussian distribution. The components ‘maxit’ and ‘thr’ of ‘control’ set the maximum number of iterations and the convergence threshold, respectively. Marginal means and variances can be extracted using the accessor functions ‘ColMeans’ and ‘ColVars’, respectively. Furthermore, the plotting functions ‘hist.datacggm’ and ‘qqcnorm’ can be used to inspect the marginal distribution of each column of the matrix ‘Y’.
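As an illustration of this kind of computation, the following sketch implements the textbook EM algorithm for the mean and variance of a single censored Gaussian variable, using the moments of the truncated normal distribution in the E-step. The function ‘em_censored_normal’ is hypothetical and is not the code used internally by ‘datacggm’; it assumes scalar lo and up, drops NA entries (treating them as missing at random), and stops when both parameter changes fall below thr or after maxit iterations.

```r
# Hypothetical sketch, NOT the datacggm internals: EM estimates of the mean
# and variance of one column y, assumed Gaussian, left-censored at lo and
# right-censored at up. NA entries are dropped (missing at random).
em_censored_normal <- function(y, lo = -Inf, up = +Inf,
                               maxit = 1.0E+4, thr = 1.0E-4) {
  y <- y[!is.na(y)]
  left  <- y <= lo               # left-censored: recorded at lo
  right <- y >= up               # right-censored: recorded at up
  mu <- mean(y)                  # starting values from the recorded data
  s2 <- mean((y - mu)^2)
  it <- 0L
  while (it < maxit) {
    it <- it + 1L
    s <- sqrt(s2)
    # E-step: conditional first and second moments of each entry
    ey  <- y
    ey2 <- y^2
    if (any(right)) {
      a   <- (up - mu) / s
      lam <- dnorm(a) / pnorm(a, lower.tail = FALSE)  # inverse Mills ratio
      m   <- mu + s * lam                              # E[Y | Y > up]
      ey[right]  <- m
      ey2[right] <- s2 * (1 + a * lam - lam^2) + m^2   # Var + mean^2
    }
    if (any(left)) {
      b   <- (lo - mu) / s
      lam <- dnorm(b) / pnorm(b)
      m   <- mu - s * lam                              # E[Y | Y < lo]
      ey[left]  <- m
      ey2[left] <- s2 * (1 - b * lam - lam^2) + m^2
    }
    # M-step: maximum likelihood updates from the expected moments
    mu_new <- mean(ey)
    s2_new <- mean(ey2) - mu_new^2
    converged <- max(abs(mu_new - mu), abs(s2_new - s2)) < thr
    mu <- mu_new
    s2 <- s2_new
    if (converged) break
  }
  c(mean = mu, var = s2, iter = it)
}

set.seed(123)
y <- pmin(rnorm(1000), 1)        # N(0, 1) data right-censored at up = 1
em_censored_normal(y, up = 1)
c(mean(y), var(y))               # naive estimates, biased downward
```

Comparing the EM estimates with the naive mean and variance of the recorded values shows the bias introduced when censored entries are taken at face value.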

The status indicator matrix, denoted by R, can be extracted using the function ‘event’. The entries of this matrix specify the status of an observation using the following code:

  • ‘R[i, j] = 0’ means that \(y_{ij}\) lies inside the open interval (lo[j], up[j]);

  • ‘R[i, j] = -1’ means that \(y_{ij}\) is a left-censored value;

  • ‘R[i, j] = +1’ means that \(y_{ij}\) is a right-censored value;

  • ‘R[i, j] = +9’ means that \(y_{ij}\) is a missing value.
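These coding rules can be reproduced directly from Y, lo and up. The helper ‘status_matrix’ below is a hypothetical base-R sketch (use ‘event’ to obtain the matrix actually stored in a datacggm object). It assumes that a recorded value equal to or beyond lo[j] or up[j] is censored, consistent with the open interval (lo[j], up[j]) above, and recycles scalar lo and up across the columns of Y.

```r
# Hypothetical helper, NOT part of cglasso: status indicator matrix coded as
# 0 = observed, -1 = left-censored, +1 = right-censored, 9 = missing.
status_matrix <- function(Y, lo = -Inf, up = +Inf) {
  Y <- as.matrix(Y)
  n <- nrow(Y)
  p <- ncol(Y)
  lo <- rep_len(lo, p)           # recycle scalar bounds over the columns
  up <- rep_len(up, p)
  LO <- matrix(lo, n, p, byrow = TRUE)  # LO[i, j] = lo[j]
  UP <- matrix(up, n, p, byrow = TRUE)  # UP[i, j] = up[j]
  R <- matrix(0L, n, p, dimnames = dimnames(Y))
  miss <- is.na(Y)
  R[!miss & Y <= LO] <- -1L      # left-censored value
  R[!miss & Y >= UP] <- +1L      # right-censored value
  R[miss] <- 9L                  # missing value
  R
}

Y <- matrix(c(-1, 0.3, 1, NA, 0.5, -1), nrow = 3, ncol = 2)
status_matrix(Y, lo = -1, up = 1)
# column 1: -1, 0, 1; column 2: 9, 0, -1
```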

See below for the other functions related to an object of class ‘datacggm’.

See Also

Related to R objects of class “datacggm” are the accessor functions rowNames, colNames, getMatrix, ColMeans, ColVars, upper, lower and event, the plotting function qqcnorm, and the method functions is.datacggm, dim.datacggm, summary.datacggm and hist.datacggm. The function rcggm can be used to simulate a dataset from a conditional Gaussian graphical model, whereas cglasso is the model-fitting function for the \(\ell_1\)-penalized censored Gaussian graphical model.

Examples

# NOT RUN {
set.seed(123)

# a dataset from a right-censored Gaussian graphical model
n <- 100L
p <- 3L
Y <- matrix(rnorm(n * p), n, p)
up <- 1
Y[Y >= up] <- up
Z <- datacggm(Y = Y, up = up)
Z

# a dataset from a conditional censored Gaussian graphical model
n <- 100L
p <- 3L
q <- 2
Y <- matrix(rnorm(n * p), n, p)
up <- 1
lo <- -1
Y[Y >= up] <- up
Y[Y <= lo] <- lo
X <- matrix(rnorm(n * q), n, q)
Z <- datacggm(Y = Y, lo = lo, up = up, X = X)
Z

# a dataset from a conditional censored Gaussian graphical model
# with missing-at-random values
n <- 100L
p <- 3L
q <- 2
Y <- matrix(rnorm(n * p), n, p)
NA.id <- matrix(rbinom(n * p, 1L, 0.01), n, p)
Y[NA.id == 1L] <- NA
up <- 1
lo <- -1
Y[Y >= up] <- up
Y[Y <= lo] <- lo
X <- matrix(rnorm(n * q), n, q)
Z <- datacggm(Y = Y, lo = lo, up = up, X = X)
Z
# }
