quark: Quark

Description

The quark function lets researchers calculate and include component scores that account for the variance in the original dataset as well as all of the interaction and polynomial effects among its variables.

Usage

quark(data, id, order = 1, silent = FALSE)

Arguments

data

The data frame is a required component for quark. In order for quark to process a data frame, it must not contain any factors or text-based variables. All variables must be in numeric format. Identifiers and dates can be left in the data; however, they will need to be identified under the id argument.
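Before calling quark, it can help to list the columns that are not numeric, since each of them must either be converted or named under the id argument. The following is a minimal base R sketch; the helper non_numeric_columns and the toy data frame are hypothetical and are not part of semTools.

```r
# Hypothetical helper (not part of semTools): names of columns that are
# factors, text, dates, or anything else that is not numeric.
non_numeric_columns <- function(data) {
  is_num <- vapply(data, is.numeric, logical(1))
  names(data)[!is_num]
}

# Toy data frame invented for illustration.
example_df <- data.frame(
  id    = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  score = c(1.5, 2.0, NA),
  stringsAsFactors = FALSE
)

bad_cols <- non_numeric_columns(example_df)
print(bad_cols)
```

Any column reported here would need to be recoded as numeric or passed to quark through id.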

id

Identifiers and dates within the dataset need to be acknowledged, as quark cannot process them. When the identifiers and dates are acknowledged as a vector of column numbers or variable names, quark removes them from the data temporarily to complete its main processes. Failing to acknowledge identifiers and dates can cause problems with imputation, with the product and polynomial effects, and with the principal component analysis, among other issues.
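The temporary removal of identifier columns can be pictured with the base R sketch below. It only illustrates the idea of setting aside columns given by number or by name; the function split_ids is hypothetical and is not how quark is necessarily implemented.

```r
# Hypothetical illustration: set aside identifier columns given either as
# column numbers or as variable names.
split_ids <- function(data, id) {
  id_cols <- if (is.character(id)) match(id, names(data)) else id
  stopifnot(!anyNA(id_cols))  # every requested identifier must exist
  list(
    id_vars   = data[, id_cols, drop = FALSE],   # kept aside
    remaining = data[, -id_cols, drop = FALSE]   # passed on for processing
  )
}

# Toy data frame invented for illustration.
toy <- data.frame(person = 1:3, wave = c(1, 1, 2), x1 = c(2.5, NA, 3.1))

by_number <- split_ids(toy, id = c(1, 2))
by_name   <- split_ids(toy, id = c("person", "wave"))
```

Both calls set aside the same two columns and leave x1 for processing.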

order

Order is an optional argument that can be used when the imputation procedures in mice fail. Under some circumstances, mice cannot calculate missing values because of extreme missingness. If an error states that the failure is due to not having any columns selected, add the argument order=2 to the quark call to reorder the imputation procedure. Otherwise, order defaults to 1. For example, to rerun quark after an imputation failure: quark.list <- quark(data=yourdataframe,id=vectorofIDs,order=2).
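The rerun described above can also be automated. The sketch below is one possible wrapper, not something semTools provides: quark_with_fallback and its quark_fn parameter are hypothetical names, and the wrapper retries on any error rather than only on the "no columns selected" failure.

```r
# Hypothetical wrapper: try the default order first, then retry with order = 2.
# quark_fn defaults to semTools::quark and is only looked up when it is used.
quark_with_fallback <- function(data, id, quark_fn = semTools::quark) {
  tryCatch(
    quark_fn(data = data, id = id, order = 1),
    error = function(e) {
      message("quark failed with order = 1: ", conditionMessage(e))
      quark_fn(data = data, id = id, order = 2)
    }
  )
}
```

With semTools installed, quark.list <- quark_with_fallback(yourdataframe, vectorofIDs) would behave like the manual rerun.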

silent

If FALSE, the details of the quark process are printed.

Value

The quark function returns a list with 7 components.

ID Columns

Is a vector of the identifier columns entered when running quark.

ID Variables

Is a subset of the dataset that contains the identifiers as acknowledged when running quark.

Used Data

Is a matrix / dataframe of the data provided by user as the basis for quark to process.

Imputed Data

Is a matrix / dataframe of the data after the multiple method imputation process.

Big Matrix

Is the expanded product and polynomial matrix.

Principal Components

Is the entire dataframe of principal components for the dataset. This dataset will have the same number of rows as the big matrix, but will have one fewer column (as is the case with principal component analyses).

Percent Variance Explained

Is a vector of the percent variance explained with each column of principal components.

Details

The quark function calculates these component scores by first filling in the data by means of multiple imputation methods and then expanding the dataset by aggregating the non-overlapping interaction effects between variables, calculating the mean of the interactions and polynomial effects. The multiple imputation methods include one of iterative sampling and group mean substitution and multiple imputation using a polytomous regression algorithm (mice). During the expansion process, the dataset is expanded to three times its original width. The first third of the dataset contains all of the original data post imputation, the second third contains the means of the polynomial effects (squares and cubes), and the final third contains the means of the non-overlapping interaction effects. A full principal component analysis is conducted and the individual components are retained. The subsequent combinequark function gives researchers control over how many components to extract and retain. The function returns the dataset as submitted (with missing values) and the component scores as requested for a more accurate multiple imputation in subsequent steps.
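The expansion and component steps can be illustrated with a small base R sketch. It is not the semTools implementation: the toy data stand in for already imputed data, and the way the interaction means are formed (each variable's products with every other variable, averaged row-wise) is an assumption made for illustration.

```r
set.seed(1)

# Toy complete data standing in for the imputed data (3 numeric variables).
imputed <- as.data.frame(matrix(rnorm(60), nrow = 20,
                                dimnames = list(NULL, c("x1", "x2", "x3"))))

# Second third: for each variable, the mean of its square and its cube.
poly_block <- as.data.frame(lapply(imputed, function(x) (x^2 + x^3) / 2))
names(poly_block) <- paste0(names(imputed), "_poly")

# Final third (assumed form): for each variable, the row-wise mean of its
# products with all other variables.
inter_block <- sapply(seq_along(imputed), function(j) {
  others <- as.matrix(imputed[, -j, drop = FALSE])
  rowMeans(imputed[[j]] * others)
})
colnames(inter_block) <- paste0(names(imputed), "_int")

# Expanded matrix: three times as wide as the imputed data.
big_matrix <- cbind(imputed, poly_block, inter_block)

# Principal component analysis and percent variance explained per component.
pca <- prcomp(big_matrix, center = TRUE, scale. = TRUE)
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
```

In this sketch, pca$x holds the component scores and pct_var plays the role of the percent variance explained.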

References

Howard, W. J., Little, T. D., & Rhemtulla, M. (in press). Using principal component analysis (PCA) to obtain auxiliary variables for missing data estimation in large data sets. Multivariate Behavioral Research.

Examples

# NOT RUN {
set.seed(123321)
library(lavaan)    # provides the HolzingerSwineford1939 data
library(semTools)  # provides quark() and combinequark()

dat <- HolzingerSwineford1939[,7:15]
misspat <- matrix(runif(nrow(dat) * 9) < 0.3, nrow(dat))
dat[misspat] <- NA
dat <- cbind(HolzingerSwineford1939[,1:3], dat)

quark.list <- quark(data = dat, id = c(1, 2))

final.data <- combinequark(quark = quark.list, percent = 80)
# }