pairwiseCount: Count number of pairwise cases for a data set with missing (NA) data and impute values.

Description

When doing cor(x, use= "pairwise"), it is nice to know the number of cases for each pairwise correlation. This is particularly useful when doing SAPA type analyses. More importantly, when there are some missing pairs, it is useful to supply imputed values so that further analyses may be done. This is useful if using the Massively Missing Completely at Random (MMCAR) designs used by the SAPA project. The specific pairs missing may be identified by pairwiseZero. Summaries of the counts are given by pairwiseDescribe.

Usage

pairwiseCount(x, y = NULL,diagonal=TRUE)
pairwiseDescribe(x,y,diagonal=FALSE,...) 
pairwiseZero(x,y=NULL, min=0, short=TRUE)
pairwiseImpute(keys,R,fix=FALSE)
pairwiseReport(x,y=NULL,cut=0,diagonal=FALSE,...) 
pairwiseSample(x,y=NULL,diagonal=FALSE,size=100,...)
pairwiseCountBig(x,size=NULL)
pairwisePlot(x,y=NULL,upper=TRUE,diagonal=TRUE,labels=TRUE,show.legend=TRUE,n.legend=10,
colors=FALSE,gr=NULL,minlength=6,xlas=1,ylas=2,
main="Relative Frequencies",count=TRUE,...)
count.pairwise(x, y = NULL,diagonal=TRUE) #deprecated

Value

result: = matrix of counts of pairwise observations (if pairwiseCount)
av.r: The average correlation value of the observed correlations within/between scales
count: The numer of observed correlations within/between each scale
percent: The percentage of complete data by scale
imputed: The original correlation matrix with NA values replaced by the mean correlation for items within/between the appropriate scale.

Arguments

x: An input matrix, typically a data matrix ready to be correlated.
y: An optional second input matrix
diagonal: if TRUE, then report the diagonal, else fill the diagonals with NA
...: Other parameters to pass to describe
min: Count the number of item pairs with <= min entries
short: Show the table of the item pairs that have entries <= min
keys: A keys.list specifying which items belong to which scale.
R: A correlation matrix to be described or imputed
cut: Report the item pairs and numbers with cell sizes less than cut
fix: If TRUE, then replace all NA correlations with the mean correlation for that within or between scale
upper: Should the upper off diagonal matrix be drawn, or left blank?
labels: if NULL, use column and row names, otherwise use labels
show.legend: A legend (key) to the colors is shown on the right hand side
n.legend: How many categories should be labelled in the legend?
colors: Defaults to FALSE and will use a grey scale. colors=TRUE use colors \ from the colorRampPalette from red through white to blue
minlength: If not NULL, then the maximum number of characters to use in row/column labels
xlas: Orientation of the x axis labels (1 = horizontal, 0, parallel to axis, 2 perpendicular to axis)
ylas: Orientation of the y axis labels (1 = horizontal, 0, parallel to axis, 2 perpendicular to axis)
main: A title. Defaults to "Relative Frequencies"
gr: A color gradient: e.g., gr <- colorRampPalette(c("#B52127", "white", "#2171B5")) will produce slightly more pleasing (to some) colors. See next to last example of corPlot.
count: Should we count the number of pairwise observations using pairwiseCount, or just plot the counts for a matrix?
size: Sample size of the number of variables to sample in pairwiseSample

Author

Maintainer: William Revelle revelle@northwestern.edu

Details

When using Massively Missing Completely at Random (MMCAR) designs used by the SAPA project, it is important to count the number of pairwise observations (pairwiseCount). If there are pairs with 1 or fewer observations, these will produce NA values for correlations making subsequent factor analyses fa or reliability analsyes omega or scoreOverlap impossible.

pairwiseCountBig may be used to count the cells of large data sets. It is analogous to bigCor and returns the cell sizes for each pair of correlations.

In order to identify item pairs with counts less than a certain value pairwiseReport reports the names of those pairs with fewer than 'cut' observations. By default, it just reports the number of offending items. With short=FALSE, the print will give the items with n.obs < cut. Even more detail is available in the returned objects.

The specific pairs that have values <= n min in any particular table of the paiwise counts may be given by pairwiseZero.

To remedy the problem of missing correlations, we impute the missing correlations using pairwiseImpute. The technique takes advantage of the scale based structure of SAPA items. Items within a scale (e.g. Letter Number Series similar to the ability items) are imputed to correlate with items from another scale (e.g., Matrix Reasoning) at the average of these two between scale inter-item mean correlations. The average correlations within and between scales are reported by pairwiseImpute and if the fix paremeter is specified, the imputed correlation matrix is returned.

Alternative methods of imputing these correlations are not yet implemented.

The time to count cell size varies linearly by the number of subjects and of the number of items squared. This becomes prohibitive for larger (big n items) data sets. pairwiseSample will take samples of size=size of these bigger data sets and then returns basic descriptive statistics of these counts, including mean, median, and the .05, .25, .5, .75 and .95 quantiles.

Examples

Run this code


x <- matrix(rnorm(900),ncol=6)
y <- matrix(rnorm(450),ncol=3)
x[x < 0] <- NA
y[y > 1] <- NA

pairwiseCount(x)
pairwiseCount(y)
pairwiseCount(x,y)
pairwiseCount(x,diagonal=FALSE)
pairwiseDescribe(x,quant=c(.1,.25,.5,.75,.9))

#examine the structure of the ability data set
keys <- list(ICAR16=colnames(psychTools::ability),reasoning =  
  cs(reason.4,reason.16,reason.17,reason.19),
  letters=cs(letter.7, letter.33,letter.34,letter.58, letter.7), 
  matrix=cs(matrix.45,matrix.46,matrix.47,matrix.55), 
  rotate=cs(rotate.3,rotate.4,rotate.6,rotate.8))
 pairwiseImpute(keys,psychTools::ability)

Run the code above in your browser using DataLab