semTools (version 0.5-7)

fmi: Fraction of Missing Information.

Description

This function estimates the Fraction of Missing Information (FMI) for summary statistics of each variable, using either an incomplete data set or a list of imputed data sets.

Usage

fmi(data, method = "saturated", group = NULL, ords = NULL,
  varnames = NULL, exclude = NULL, return.fit = FALSE)

Value

fmi() returns a list with at least 2 of the following:

Covariances

A list of symmetric matrices: (1) the estimated/pooled covariance matrix, or a list of group-specific matrices (if applicable) and (2) a matrix of FMI, or a list of group-specific matrices (if applicable). Only available if method = "saturated". When method="cor", this element is replaced by Correlations.

Variances

The estimated/pooled variance for each numeric variable. Only available if method = "null" (otherwise, it is on the diagonal of Covariances).

Means

The estimated/pooled mean for each numeric variable.

Thresholds

The estimated/pooled threshold(s) for each ordered-categorical variable.

Arguments

data

Either a single data.frame with incomplete observations, or a list of imputed data sets.

method

character. If "saturated" or "sat" (default), the model used to estimate FMI is a freely estimated covariance matrix and mean vector for numeric variables, and/or polychoric correlations and thresholds for ordered categorical variables, for each group (if applicable). If "null", only means and variances are estimated for numeric variables, and/or thresholds for ordered categorical variables (i.e., covariances and/or polychoric/polyserial correlations are constrained to zero). See Details for more information.

group

character. The optional name of a grouping variable, to request FMI in each group.

ords

Optional character vector naming ordered-categorical variables, if they are not already stored as class ordered in data.

varnames

Optional character vector of variable names, to calculate FMI for a subset of variables in data. By default, all numeric and ordered variables will be included, unless data= is a single incomplete data.frame, in which case only numeric variables can be used with FIML estimation. Other variable types will be removed.

exclude

Optional character vector naming variables to exclude from the analysis.

return.fit

logical. If TRUE, the fitted lavaan::lavaan or lavaan.mi::lavaan.mi model is returned, so FMI can be found from summary(..., fmi=TRUE).
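
As a small illustration of the ords argument described above, a variable can instead be stored as class ordered before the data are passed to fmi(). The sketch below uses only base R; the data frame dat and its columns item1 and score are made-up names for illustration and are not part of semTools.

```r
# Toy data (hypothetical): one 3-category item coded as numbers, one numeric score
dat <- data.frame(item1 = c(1, 2, 3, 2, NA),
                  score = c(2.5, 3.1, NA, 4.0, 3.3))

# Store the item as an ordered factor; the missing value stays NA
dat$item1 <- ordered(dat$item1, levels = 1:3)

is.ordered(dat$item1)                  # check the class of the converted column
sapply(dat, function(v) class(v)[1])   # first class of each column
```

With item1 stored this way, naming it in ords should not be necessary.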

Author

Mauricio Garnier Villarreal (Vrije Universiteit Amsterdam; m.garniervillarreal@vu.nl)

Terrence Jorgensen (University of Amsterdam; TJorgensen314@gmail.com)

Details

The function estimates a saturated model with lavaan::lavaan() for a single incomplete data set using FIML, or with lavaan.mi::lavaan.mi() for a list of imputed data sets. If method = "saturated", FMI will be estimated for all summary statistics, which could take a lot of time with big data sets. If method = "null", FMI will only be estimated for univariate statistics (e.g., means, variances, thresholds). The saturated model gives more reliable estimates, so it could also help to request a subset of variables from a large data set.
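
To make the quantity itself concrete, the following base-R sketch computes a simple FMI for one pooled estimate from multiply imputed results, following the variance decomposition in Rubin (1987). It is an illustration only and is not claimed to be how fmi() computes its output: the helper rubin_fmi() and the toy inputs are hypothetical, and the ratio shown omits the small-sample degrees-of-freedom adjustment.

```r
# Hypothetical helper (not part of semTools): FMI for one pooled estimate,
# given the estimate and its squared standard error in each imputed data set.
rubin_fmi <- function(est, se2) {
  m <- length(est)                    # number of imputations
  W <- mean(se2)                      # within-imputation variance
  B <- var(est)                       # between-imputation variance
  total <- W + (1 + 1/m) * B          # total variance of the pooled estimate
  list(pooled    = mean(est),
       total.var = total,
       fmi       = (1 + 1/m) * B / total)  # share of variance due to missingness
}

# Toy inputs (made up): a mean estimated in m = 5 imputed data sets
est <- c(10.2, 10.5, 9.9, 10.4, 10.0)
se2 <- c(0.40, 0.42, 0.38, 0.41, 0.39)

res <- rubin_fmi(est, se2)
res$fmi
```

The more the estimates differ across imputations relative to their sampling variance, the closer this ratio moves toward 1.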

References

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.

Savalei, V. & Rhemtulla, M. (2012). On obtaining estimates of the fraction of missing information from full information maximum likelihood. Structural Equation Modeling, 19(3), 477-494. doi:10.1080/10705511.2012.687669

Wagner, J. (2010). The fraction of missing information as a tool for monitoring the quality of survey data. Public Opinion Quarterly, 74(2), 223-243. doi:10.1093/poq/nfq007

Examples

library(semTools) # also attaches lavaan, which provides HolzingerSwineford1939

HSMiss <- HolzingerSwineford1939[ , c(paste("x", 1:9, sep = ""),
                                      "ageyr","agemo","school")]
set.seed(12345)
HSMiss$x5 <- ifelse(HSMiss$x5 <= quantile(HSMiss$x5, .3), NA, HSMiss$x5)
age <- HSMiss$ageyr + HSMiss$agemo/12
HSMiss$x9 <- ifelse(age <= quantile(age, .3), NA, HSMiss$x9)

## calculate FMI (using FIML, provide partially observed data set)
(out1 <- fmi(HSMiss, exclude = "school"))
(out2 <- fmi(HSMiss, exclude = "school", method = "null"))
(out3 <- fmi(HSMiss, varnames = c("x5","x6","x7","x8","x9")))
(out4 <- fmi(HSMiss, method = "cor", group = "school")) # correlations by group

## significance tests in lavaan(.mi) object
out5 <- fmi(HSMiss, method = "cor", return.fit = TRUE)
summary(out5) # factor loading == SD, covariance = correlation

if(requireNamespace("lavaan.mi")){
  ## ordered-categorical data
  data(binHS5imps, package = "lavaan.mi")

  ## calculate FMI, using list of imputed data sets
  fmi(binHS5imps, group = "school")
}