imputeCA: Impute contingency table

Description

Impute the missing entries of a contingency table using Correspondence Analysis (CA). It can be used as a preliminary step before performing CA on an incomplete dataset.

Usage

imputeCA(X, ncp = 2, threshold = 1e-08, maxiter = 1000)

Arguments

X

a data.frame that is a contingency table containing missing values

ncp

integer corresponding to the number of dimensions used to predict the missing entries

threshold

the threshold for assessing convergence

maxiter

integer, maximum number of iterations for the regularized iterative CA algorithm

Value

The imputed contingency table; the observed values are kept for the non-missing entries and the missing values are replaced by the predicted ones.

Details

Impute the missing entries of a contingency table using a regularized iterative CA algorithm. The algorithm first initializes the missing values with random initial values. Second, it performs CA on the completed dataset. Third, it imputes the missing values with the (regularized) reconstruction formula of order ncp, i.e. the fitted matrix computed from the first ncp (regularized) scores and loadings. The estimation of the parameters via CA and the imputation of the missing values from the (regularized) fitted matrix are iterated until convergence (assessed with threshold) or until maxiter iterations have been performed. In the regularized algorithm, the singular values of the CA are shrunk. The number of components ncp used in the algorithm should be small: a small number of components also acts as a stronger regularization and may therefore be advisable to obtain more stable predictions.
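The iterative scheme described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the missMDA implementation: the helper name impute_ca_sketch is hypothetical, the shrinkage rule (subtracting the mean of the discarded eigenvalues from each retained eigenvalue, in the spirit of regularized iterative PCA) and the convergence test (mean squared change of the imputed cells below threshold) are assumptions, and CA is computed directly from the SVD of the standardized residuals rather than through FactoMineR.

```r
# Hypothetical helper (not part of missMDA): a minimal sketch of
# regularized iterative CA imputation in base R.
impute_ca_sketch <- function(X, ncp = 2, threshold = 1e-8, maxiter = 1000) {
  X <- as.matrix(X)
  miss <- is.na(X)
  if (!any(miss)) return(X)
  stopifnot(ncp >= 1)

  # Step 1: fill the missing cells with random values drawn
  # uniformly from the range of the observed counts
  obs <- X[!miss]
  Xhat <- X
  Xhat[miss] <- runif(sum(miss), min(obs), max(obs))

  for (iter in seq_len(maxiter)) {
    # Step 2: CA of the completed table, i.e. the SVD of the
    # standardized residuals S = D_r^{-1/2} (P - r c') D_c^{-1/2}
    n <- sum(Xhat)
    P <- Xhat / n
    rc <- outer(rowSums(P), colSums(P))
    s <- svd((P - rc) / sqrt(rc))
    k <- min(ncp, length(s$d))
    d <- s$d[seq_len(k)]

    # Assumed shrinkage rule: subtract the mean of the discarded
    # eigenvalues (squared singular values) from each retained one
    sigma2 <- if (k < length(s$d)) mean(s$d[-seq_len(k)]^2) else 0
    d_shrunk <- ifelse(d > 0, pmax(d^2 - sigma2, 0) / d, 0)

    # Step 3: regularized reconstruction formula of order ncp:
    # P_fit = r c' + D_r^{1/2} U_k diag(d_shrunk) V_k' D_c^{1/2}
    S_fit <- s$u[, seq_len(k), drop = FALSE] %*%
      (d_shrunk * t(s$v[, seq_len(k), drop = FALSE]))
    fitted <- n * (rc + S_fit * sqrt(rc))

    # Step 4: overwrite only the missing cells; observed counts are kept
    old <- Xhat[miss]
    Xhat[miss] <- fitted[miss]

    # Assumed convergence test on the imputed cells
    if (mean((Xhat[miss] - old)^2) < threshold) break
  }
  Xhat
}

# Usage: a 4 x 4 table that follows an exact one-dimensional CA model,
# with one cell hidden
set.seed(1)
r <- c(0.1, 0.2, 0.3, 0.4)
cm <- c(0.4, 0.3, 0.2, 0.1)
a <- c(1, 0.5, -0.5, -0.5); a <- a - sum(r * a)
b <- c(0.5, -0.5, 0.5, -0.5); b <- b - sum(cm * b)
truth <- 1000 * outer(r, cm) * (1 + outer(a, b))
X <- truth
X[3, 2] <- NA
res <- impute_ca_sketch(X, ncp = 1)
c(imputed = res[3, 2], true = truth[3, 2])
```

Because only the cells flagged by is.na() are overwritten, the observed counts pass through unchanged, matching the description under Value. In the usage lines the table follows a one-dimensional CA model exactly, so with ncp = 1 the imputed cell can be compared directly with the hidden true count.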

The output of the algorithm can be used as input to the CA function of the FactoMineR package in order to perform CA on an incomplete dataset.

Examples


library(FactoMineR)   # provides CA()
library(missMDA)      # provides imputeCA()
data(children)

## Impute the contingency table and perform a CA
res.impute <- imputeCA(children, ncp=2)
res.ca <- CA(res.impute)
