The psych package has been developed at Northwestern University to include functions most useful for personality and psychological research. Some of the functions (e.g., read.clipboard, describe, pairs.panels, error.bars) are useful for basic data entry and descriptive analyses. Use help(package="psych") for a list of all functions.
Psychometric applications include routines for factor analysis by principal axes (factor.pa), minimum residual (factor.minres), and weighted least squares (factor.wls), all part of the fa function, as well as functions to do Schmid Leiman transformations (schmid) to transform a hierarchical factor structure into a bifactor solution. Factor or components transformations to a target matrix include the standard Promax transformation (Promax), a transformation to a cluster target, or to any simple target matrix (target.rot), as well as the ability to call many of the GPArotation functions. Functions for determining the number of factors in a data matrix include Very Simple Structure (VSS) and Minimum Average Partial correlation (MAP). An alternative approach to factor analysis is Item Cluster Analysis (ICLUST). Reliability coefficients alpha (score.items, score.multiple.choice), beta (ICLUST), and McDonald's omega (omega and omega.graph), as well as Guttman's six estimates of internal consistency reliability (guttman) and the six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss, are also available.
The score.items and score.multiple.choice functions may be used to form single or multiple scales from sets of dichotomous, multilevel, or multiple choice items by specifying scoring keys.
Additional functions make for more convenient descriptions of item characteristics. Functions under development include 1 and 2 parameter Item Response measures.
A number of procedures have been developed as part of the Synthetic Aperture Personality Assessment (SAPA) project. These routines facilitate forming and analyzing composite scales equivalent to using the raw data but doing so by adding within and between cluster/scale item correlations. These functions include extracting clusters from factor loading matrices (factor2cluster), synthetically forming clusters from correlation matrices (cluster.cor), and finding multiple (mat.regress) and partial (partial.r) correlations from correlation matrices.
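A minimal sketch of this correlation-matrix workflow, using the bfi data set included in the package (the choice of ten items and two factors is illustrative):

```r
library(psych)
# Work from a correlation matrix, as in SAPA designs, rather than raw data.
r.mat <- cor(bfi[1:10], use = "pairwise")  # correlations among 10 bfi items
f2 <- fa(r.mat, nfactors = 2)              # factor the correlation matrix
keys <- factor2cluster(f2$loadings)        # turn loadings into a cluster keys matrix
cluster.cor(keys, r.mat)                   # scale intercorrelations from the keys
```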
Functions to generate simulated data with particular structures include sim.circ (for circumplex structures), sim.item (for general structures), and sim.congeneric (for a specific demonstration of congeneric measurement). The functions sim.congeneric and sim.hierarchical can be used to create data sets with particular structural properties. A more general form for all of these is sim.structural, for generating general structural models. These are discussed in more detail in the vignette (psych_for_sem).
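As a brief sketch of this simulation approach, assuming the default arguments of sim.hierarchical (a nine-variable structure with three group factors under a general factor):

```r
library(psych)
set.seed(42)                 # reproducible simulation
R <- sim.hierarchical()      # population correlation matrix with a general factor
round(R, 2)                  # inspect the implied correlations
fa(R, nfactors = 3)          # the three group factors should be recovered
```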
Functions to apply various standard statistical tests include p.rep and its variants for testing the probability of replication, r.con for the confidence intervals of a correlation, and r.test to test single, paired, or sets of correlations.
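For example (the sample sizes and correlation values here are illustrative):

```r
library(psych)
r.con(.5, n = 100)                 # confidence interval for r = .5, N = 100
r.test(n = 100, r12 = .5)          # is a single r of .5 significantly non-zero?
# Test of two dependent correlations that share a variable:
r.test(n = 100, r12 = .5, r13 = .3, r23 = .4)
```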
In order to study diurnal or circadian variations in mood, it is helpful to use circular statistics. Functions to find the circular mean (circadian.mean), circular (phasic) correlations (circadian.cor), and the correlation between linear variables and circular variables (circadian.linear.cor) supplement a function to find the best fitting phase angle (cosinor) for measures taken with a fixed period (e.g., 24 hours).
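A minimal sketch of why circular statistics matter here, using made-up waking times (hours on a 24-hour clock):

```r
library(psych)
wake <- c(23.5, 0.5, 1, 22.75, 23)   # illustrative times clustered around midnight
mean(wake)                            # the linear mean is misleading (about 14)
circadian.mean(wake)                  # the circular mean is near midnight
```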
The most recent development version of the package is always available for download as a source file from the repository.
The psych package was originally a combination of multiple source files for convenient data entry from the clipboard (read.clipboard), simple descriptive statistics (describe), and splom plots combined with correlations (pairs.panels, adapted from the help files of pairs). It is now a single package.
The VSS routines allow for testing the number of factors (VSS), showing plots (VSS.plot) of goodness of fit, and basic routines for estimating the number of factors/components to extract by using Velicer's MAP procedure, examining the scree plot (VSS.scree), or comparing with the scree of an equivalent matrix of random numbers (VSS.parallel).
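For instance, on the bfi items included with the package (examining up to 8 factors is an arbitrary choice):

```r
library(psych)
vss.out <- VSS(bfi[1:25], n = 8)   # VSS and MAP criteria for 1 to 8 factors
VSS.plot(vss.out)                  # goodness of fit by number of factors
VSS.scree(bfi[1:25])               # scree plot of the same items
```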
In addition, there are routines for hierarchical factor analysis using Schmid Leiman transformations (omega, omega.graph) as well as Item Cluster analysis (ICLUST, ICLUST.graph).
The more important functions in the package are for the analysis of multivariate data, with an emphasis upon those functions useful in scale construction of item composites.
When given a set of items from a personality inventory, one goal is to combine these into higher level item composites. This leads to several questions:
1) What are the basic properties of the data? describe reports basic summary statistics (mean, sd, median, mad, range, minimum, maximum, skew, kurtosis, standard error) for vectors, columns of matrices, or data.frames. describe.by provides descriptive statistics, organized by one or more grouping variables. pairs.panels shows scatter plot matrices (SPLOMs) as well as histograms and the Pearson correlation for scales or items. error.bars will plot variable means with associated confidence intervals. error.crosses will plot confidence intervals for both the x and y coordinates. corr.test will find the significance values for a matrix of correlations.
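Question 1 can be sketched in a few lines, using the sat.act data set included with the package:

```r
library(psych)
data(sat.act)                          # self-reported ACT and SAT scores
describe(sat.act)                      # means, sds, skew, kurtosis, se, ...
describe.by(sat.act, sat.act$gender)   # the same statistics by gender
pairs.panels(sat.act[3:6])             # SPLOM with histograms and correlations
corr.test(sat.act[3:6])                # correlations with significance values
```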
2) What is the most appropriate number of item composites to form? After finding either standard Pearson correlations, or finding tetrachoric or polychoric correlations using a wrapper (poly.mat) for John Fox's hetcor function, the dimensionality of the correlation matrix may be examined. The number of factors/components problem is a standard question of factor analysis, cluster analysis, or principal components analysis. Unfortunately, there is no agreed upon answer. The Very Simple Structure (VSS) set of procedures has been proposed as one answer to the question of the optimal number of factors. Other procedures (VSS.scree, VSS.parallel, fa.parallel, and MAP) also address this question.
3) What are the best composites to form? Although this may be answered using principal components (principal), principal axis (factor.pa), or minimum residual (factor.minres) factor analysis (all part of the fa function), with the results shown graphically (fa.graph), it is sometimes more useful to address this question using cluster analytic techniques. (Some would argue that better yet is to use maximum likelihood factor analysis using factanal from the stats package.) Previous versions of ICLUST (e.g., Revelle, 1979) have been shown to be particularly successful at forming maximally consistent and independent item composites. Graphical output from ICLUST.graph uses the Graphviz dot language and allows one to write files suitable for Graphviz. If Rgraphviz is available, these graphs can be drawn within R.
Graphical organizations of cluster and factor analysis output can be done using cluster.plot, which plots items by cluster/factor loadings and assigns items to the dimension with the highest loading.
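A short sketch comparing a factor and a cluster solution on the same items (the bfi data and the choice of five factors are illustrative):

```r
library(psych)
r.mat <- cor(bfi[1:25], use = "pairwise")  # correlations among the 25 bfi items
f5 <- fa(r.mat, nfactors = 5)   # minimum residual factor analysis (the default)
ic <- ICLUST(r.mat)             # hierarchical item cluster analysis
ic                              # print the clusters found and their betas
```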
4) How well does a particular item composite reflect a single construct? This is a question of reliability and general factor saturation. Multiple solutions for this problem result in (Cronbach's) alpha (alpha, score.items), (Revelle's) beta (ICLUST), and (McDonald's) omega (both omega hierarchical and omega total). Additional reliability estimates may be found in the guttman function.
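As a sketch of these reliability estimates on the bfi agreeableness items (the first item is reverse keyed; a -1/1 keys vector marks the reversal):

```r
library(psych)
agree <- bfi[1:5]                       # the five agreeableness items, A1..A5
alpha(agree, keys = c(-1, 1, 1, 1, 1))  # Cronbach's alpha with A1 reversed
omega(bfi[1:25], nfactors = 5)          # omega_h and omega_total (needs GPArotation)
```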
5) For some applications, data matrices are synthetically combined from sampling different items for different people. So called Synthetic Aperture Personality Assessment (SAPA) techniques allow the formation of large correlation or covariance matrices even though no one person has taken all of the items. To analyze such data sets, it is easy to form item composites based upon the covariance matrix of the items, rather than the original data set. These matrices may then be analyzed using a number of functions (e.g., cluster.cor, factor.pa, ICLUST, principal, mat.regress, and factor2cluster).
6) More typically, one has a raw data set to analyze. alpha will report several reliability estimates as well as item-whole correlations for items forming a single scale; score.items will score data sets on multiple scales, reporting the scale scores, item-scale and scale-scale correlations, as well as coefficient alpha, alpha-1, and G6+. Using a `keys' matrix (created by make.keys or by hand), scales can have overlapping or independent items. score.multiple.choice scores multiple choice items or converts multiple choice items to dichotomous (0/1) format for other functions.
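A short sketch of the keys-based scoring workflow; the scale names and item assignments below are illustrative, with negative entries reverse-keying an item:

```r
library(psych)
# Two illustrative scales from the first ten bfi items.
keys <- make.keys(nvars = 10,
                  keys.list = list(agree = c(-1, 2, 3, 4, 5),
                                   consc = c(6, 7, 8, -9, -10)),
                  item.labels = colnames(bfi)[1:10])
scores <- score.items(keys, bfi[1:10])
scores$alpha           # coefficient alpha for each scale
head(scores$scores)    # scale scores for each person
```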
An additional set of functions generate simulated data to meet certain structural properties. sim.anova produces data simulating a 3 way analysis of variance (ANOVA) or linear model with or without repeated measures. sim.item creates simple structure data, sim.circ will produce circumplex structured data, and sim.dichot produces circumplex or simple structured data for dichotomous items. These item structures are useful for understanding the effects of skew and differential item endorsement on factor and cluster analytic solutions. sim.structural will produce correlation matrices and data matrices to match general structural models. (See the vignette.)
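For example, a minimal simulation sketch (the numbers of variables and subjects are arbitrary):

```r
library(psych)
set.seed(17)
simple <- sim.item(nvar = 12, nsub = 500)  # simple structure item data
circum <- sim.circ(nvar = 12, nsub = 500)  # circumplex structure item data
fa(simple, nfactors = 2)                   # recovers the simple structure
```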
When examining personality items, some people like to discuss them as representing items in a two dimensional space with a circumplex structure. Tests of circumplex fit (circ.tests) have been developed. When representing items in a circumplex, it is convenient to view them in polar coordinates.
Additional functions test the difference between two independent or dependent correlations (r.test), find the phi or Yule coefficients from a two by two table, or find the confidence interval of a correlation coefficient.
Ten data sets are included: bfi represents 25 personality items thought to represent five factors of personality, iqitems has 14 multiple choice iq items, sat.act has data on self reported test scores by age and gender, galton is Galton's data set of the heights of parents and their children, peas recreates the original Galton data set of the genetics of sweet peas, heights and cubits provide even more Galton data, vegetables provides the Guilford preference matrix of vegetables, and cities provides airline miles between 11 US cities (demo data for multidimensional scaling).
psych                  A package for personality, psychometric, and psychological research

Useful data entry and descriptive statistics

Data reduction through cluster and factor analysis
ICLUST                 Apply the ICLUST algorithm
ICLUST.graph           Graph the output from ICLUST using the dot language
ICLUST.rgraph          Graph the output from ICLUST using rgraphviz
poly.mat               Find the polychoric correlations for items (uses J. Fox's hetcor)
omega                  Calculate the omega estimate of factor saturation (requires the GPArotation package)
omega.graph            Draw a hierarchical or Schmid Leiman orthogonalized solution (uses Rgraphviz)
schmid                 Apply the Schmid Leiman transformation to a correlation matrix
score.items            Combine items into multiple scales and find alpha
score.multiple.choice  Combine items into multiple scales and find alpha and basic scale statistics
smc                    Find the Squared Multiple Correlation (used for initial communality estimates)
VSS                    Apply the Very Simple Structure criterion to determine the appropriate number of factors
VSS.parallel           Do a parallel analysis to determine the number of factors for a random matrix
VSS.plot               Plot VSS output
VSS.scree              Show the scree plot of the factor/principal components
MAP                    Apply the Velicer Minimum Average Partial criterion for number of factors
Functions for reliability analysis (some are listed above as well).
Procedures particularly useful for Synthetic Aperture Personality Assessment
Functions for generating simulated data sets
Graphical functions (require Rgraphviz)
Circular statistics (for circadian data analysis)
Miscellaneous functions
Functions that are under development and not recommended for casual use
Data sets included in the psych package
A debugging function that may also be used as a demonstration of psych.
Revelle, W. (in preparation) An Introduction to Psychometric Theory with applications in R. Springer.
#See the separate man pages
test.psych()