Overview of the psych package.
The psych package has been developed at Northwestern University to include functions most useful for personality and psychological research. Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, error.bars, and error.dots) are useful for basic data entry and descriptive analyses. Use help(package="psych") or objects("package:psych") for a list of all functions. Two vignettes are included as part of the package. The intro vignette tells how to install psych, and the overview vignette provides examples of using psych in many applications. In addition, there is a growing set of tutorials available on the https://personality-project.org/r/ webpages.
A companion package, psychTools, includes larger data set examples and four more vignettes.
Psychometric applications include routines (fa) for maximum likelihood (fm="mle"), minimum residual (fm="minres"), minimum rank (fm="minrank"), principal axes (fm="pa"), and weighted least squares (fm="wls") factor analysis. There are also functions for Schmid Leiman transformations (schmid), which transform a hierarchical factor structure into a bifactor solution. Principal Components Analysis (pca) is also available.

Rotations may be done using factor or component transformations to a target matrix. These include the standard Promax transformation (Promax), a transformation to a cluster target, and a transformation to any simple target matrix (target.rot). Many of the GPArotation functions (e.g., oblimin, quartimin, varimax, geomin, ...) can also be called. Functions for determining the number of factors in a data matrix include Very Simple Structure (VSS) and Minimum Average Partial correlation (MAP).
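The MAP criterion can be illustrated with a short base-R sketch. It assumes Velicer's usual formulation on principal components:

- remove the first m components;
- rescale the residuals to partial correlations;
- choose the m with the smallest average squared partial correlation.

The helper velicer_map and the simulated data are hypothetical illustrations. They are not the code behind psych's MAP or VSS.

```r
# Sketch of Velicer's Minimum Average Partial (MAP) rule in base R.
# Illustration only: not the code used by psych's MAP or VSS.
velicer_map <- function(R) {
  p <- ncol(R)
  e <- eigen(R, symmetric = TRUE)
  # Principal component loadings: eigenvectors scaled by sqrt(eigenvalues)
  L <- e$vectors %*% diag(sqrt(pmax(e$values, 0)))
  off_diag <- function(M) M[upper.tri(M)]
  avg_sq <- numeric(p - 1)           # element m + 1 holds the value after removing m components
  avg_sq[1] <- mean(off_diag(R)^2)   # m = 0: the observed correlations
  # Stop at p - 2: with p - 1 components removed the residual matrix has rank 1
  for (m in seq_len(p - 2)) {
    A <- L[, 1:m, drop = FALSE]
    C <- R - A %*% t(A)              # residual (partial) covariances
    d <- 1 / sqrt(diag(C))
    P <- C * outer(d, d)             # rescale residuals to partial correlations
    avg_sq[m + 1] <- mean(off_diag(P)^2)
  }
  list(n_components = which.min(avg_sq) - 1, avg_sq = avg_sq)
}

# Simulated data: two uncorrelated factors, four items each, loadings of 0.8
set.seed(1)
n <- 500
f <- matrix(rnorm(n * 2), n, 2)
u <- sqrt(1 - 0.8^2)
x <- cbind(0.8 * f[, 1] + u * matrix(rnorm(n * 4), n, 4),
           0.8 * f[, 2] + u * matrix(rnorm(n * 4), n, 4))
velicer_map(cor(x))$n_components   # should be 2 for this structure
```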
An alternative approach to factor analysis is Item Cluster Analysis (ICLUST). This function is particularly appropriate for exploratory scale construction.
There are a number of functions for finding various reliability coefficients (see Revelle and Condon, 2019). These include:

- the traditional alpha, found for multiple scales and with more useful output by scoreItems and score.multiple.choice;
- beta (ICLUST);
- both of McDonald's omega coefficients (omega, omegaSem, and omega.diagram).

Guttman's six estimates of internal consistency reliability (guttman) are also available, as are the six intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss.
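As a minimal sketch of what coefficient alpha computes, the following base-R function implements the raw, covariance-based formula: alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score). The name cronbach_alpha is hypothetical, not a psych function, and the sketch assumes complete data. On complete data it should agree with the raw alpha reported by psych's alpha, which also reports many additional statistics.

```r
# Base-R sketch of raw coefficient alpha (hypothetical helper; assumes complete data).
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  C <- cov(items)
  # sum(C) is the variance of the total score; sum(diag(C)) is the sum of item variances
  (k / (k - 1)) * (1 - sum(diag(C)) / sum(C))
}

items <- cbind(c(1, 2, 3, 4),
               c(2, 1, 4, 3))
cronbach_alpha(items)   # 0.75
```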
Multilevel analyses may be done by statsBy and multilevel.reliability.
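statsBy separates within-group and between-group correlations. A simplified base-R sketch of that idea correlates two things:

- the deviations of each observation from its own group's mean (within);
- the group means themselves (between).

It assumes unweighted group means and omits significance tests. within_between_cor is a hypothetical helper, not psych's implementation.

```r
# Base-R sketch of within- and between-group correlations (hypothetical helper).
within_between_cor <- function(x, group) {
  x <- as.matrix(x)
  # Between: correlate the (unweighted) group means, one row per group
  group_means <- apply(x, 2, function(col) tapply(col, group, mean))
  # Within: correlate each observation's deviation from its own group mean
  centered <- x - apply(x, 2, function(col) ave(col, group))
  list(within = cor(centered), between = cor(group_means))
}

# Toy data: group means rise together, but within each group
# the two variables move in opposite directions.
x <- rbind(c(1, -1), c(-1, 1), c(11, 9), c(9, 11), c(21, 19), c(19, 21))
group <- c(1, 1, 2, 2, 3, 3)
wb <- within_between_cor(x, group)
wb$within[1, 2]    # -1
wb$between[1, 2]   # 1
```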
The scoreItems and score.multiple.choice functions may be used to form single or multiple scales from sets of dichotomous, multilevel, or multiple choice items by specifying scoring keys. scoreOverlap corrects interscale correlations for overlapping items, so that it is possible to examine hierarchical or nested structures.
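Scoring with keys can be sketched in base R under two assumptions:

- the keys matrix has one column per scale, with entries of +1 (keyed), -1 (reverse keyed), or 0 (not on the scale);
- reverse keying maps a response r to (max + min - r).

score_with_keys is a hypothetical helper. scoreItems additionally reports alpha and item-scale correlations, and it handles missing data in its own way.

```r
# Base-R sketch of scoring scales from a keys matrix (hypothetical helper).
score_with_keys <- function(items, keys, min_resp, max_resp) {
  items <- as.matrix(items)
  reversed <- (max_resp + min_resp) - items   # reverse-keyed version of every item
  sapply(colnames(keys), function(scale) {
    k <- keys[, scale]
    use <- k != 0
    # Raw responses for +1 items, reversed responses for -1 items
    keyed <- ifelse(matrix(k[use] > 0, nrow(items), sum(use), byrow = TRUE),
                    items[, use, drop = FALSE], reversed[, use, drop = FALSE])
    rowMeans(keyed, na.rm = TRUE)             # average of the keyed items
  })
}

items <- rbind(c(5, 1, 4),
               c(1, 5, 2))
keys <- cbind(A = c(1, -1, 0),   # item 1 keyed, item 2 reverse keyed
              B = c(0, 0, 1))    # item 3 only
score_with_keys(items, keys, min_resp = 1, max_resp = 5)   # A: 5 and 1; B: 4 and 2
```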
Scales that best predict particular criteria (after cross validation) can be formed with bestScales, using either unit weights or correlation weights. This procedure is also called the BISCUIT algorithm (Best Items Scales that are Cross validated, Unit weighted, Informative, and Transparent). It is a simple alternative to more complicated supervised machine learning algorithms.
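The core of the BISCUIT idea can be sketched in base R. The sketch assumes a single criterion, a random half split for derivation and validation, and unit (+1/-1) weights. biscuit_sketch is a hypothetical helper. bestScales itself adds bootstrapping, correlation weights, and much more output.

```r
# Simplified base-R sketch of the BISCUIT idea (hypothetical helper, not bestScales).
biscuit_sketch <- function(items, criterion, n_items = 3) {
  items <- as.matrix(items)
  n <- nrow(items)
  train <- sample(n, floor(n / 2))                 # derivation half; the rest validates
  r <- cor(items[train, ], criterion[train])[, 1]  # item validities in the derivation half
  best <- order(abs(r), decreasing = TRUE)[seq_len(n_items)]
  w <- sign(r[best])                               # unit weights: +1 or -1
  score <- drop(items[, best, drop = FALSE] %*% w)
  list(items = best,
       derivation_r = cor(score[train], criterion[train]),
       validation_r = cor(score[-train], criterion[-train]))
}

set.seed(42)
n <- 1000
items <- matrix(rnorm(n * 10), n, 10)
criterion <- items[, 1] + items[, 2] - items[, 3] + rnorm(n)
res <- biscuit_sketch(items, criterion)
res$items          # should be items 1, 2, and 3 (in some order)
res$validation_r   # cross-validated correlation of the unit-weighted scale
```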
Additional functions that make for more convenient descriptions of item characteristics include 1 and 2 parameter Item Response measures. The tetrachoric, polychoric, and irt.fa functions are used to find 2 parameter descriptions of item functioning. scoreIrt, scoreIrt.1pl, and scoreIrt.2pl do basic IRT based scoring.
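The link between factor analysis of tetrachoric correlations and 2 parameter IRT can be illustrated with the commonly used normal ogive conversion. Let lambda be an item's loading and tau its threshold (tau = qnorm(1 - p) for an item endorsed with proportion p). Then:

- discrimination = lambda / sqrt(1 - lambda^2);
- difficulty = tau / sqrt(1 - lambda^2).

The helper below is a hypothetical sketch of that conversion, not the code in irt.fa.

```r
# Sketch of the normal ogive conversion from factor parameters to IRT parameters
# (hypothetical helper, not irt.fa).
fa_to_irt <- function(lambda, p_endorse) {
  tau <- qnorm(1 - p_endorse)     # threshold from the proportion endorsing the item
  denom <- sqrt(1 - lambda^2)     # unique standard deviation of the latent response
  data.frame(lambda = lambda, tau = tau,
             discrimination = lambda / denom,
             difficulty = tau / denom)
}

fa_to_irt(lambda = c(0.6, 0.8), p_endorse = c(0.5, 0.16))
```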
A number of procedures have been developed as part of the Synthetic Aperture Personality Assessment (SAPA, https://www.sapa-project.org/) project. These routines facilitate forming and analyzing composite scales. The results are equivalent to using the raw data, but they are obtained by adding within and between cluster/scale item correlations. These functions include:

- extracting clusters from factor loading matrices (factor2cluster);
- synthetically forming clusters from correlation matrices (cluster.cor);
- finding multiple (lmCor) and partial (partial.r) correlations from correlation matrices.
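Partial and multiple correlations can be found from a correlation matrix alone, which is what makes the SAPA approach workable. The base-R sketch below uses the inverse of the correlation matrix. It assumes each pair is partialled on all remaining variables; partial.r lets you choose which variables to partial. partial_all and smc_sketch are hypothetical names, and the latter avoids masking psych's own smc.

```r
# Base-R sketch: partial and squared multiple correlations from a correlation matrix.
partial_all <- function(R) {
  Rinv <- solve(R)
  d <- 1 / sqrt(diag(Rinv))
  P <- -Rinv * outer(d, d)   # each pair's partial r, holding all other variables constant
  diag(P) <- 1
  P
}
smc_sketch <- function(R) 1 - 1 / diag(solve(R))   # R^2 of each variable on all others

R <- matrix(c(1.00, 0.50, 0.50,
              0.50, 1.00, 0.25,
              0.50, 0.25, 1.00), 3, 3)
partial_all(R)[1, 2]   # r12.3 = (0.5 - 0.5 * 0.25) / sqrt((1 - 0.25) * (1 - 0.0625)), about 0.447
smc_sketch(R)[1]       # 0.4
```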
If forming empirical scales, or testing out multiple regressions, it is important to cross validate the results. crossValidation will do this on a different data set.
lmCor and mediate meet the desire to do regressions and mediation analysis from either raw data or correlation matrices. If raw data are provided, these functions can also do moderation analyses.
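Regression from a correlation matrix reduces to solving R_xx beta = r_xy for the standardized weights, with R^2 = sum(beta * r_xy). The base-R sketch below shows only this core step; regress_from_cor is a hypothetical helper. lmCor also reports standard errors, set correlations, and more.

```r
# Base-R sketch of standardized regression from a correlation matrix (hypothetical helper).
regress_from_cor <- function(R, x, y) {
  Rxx <- R[x, x, drop = FALSE]
  rxy <- R[x, y, drop = FALSE]
  beta <- solve(Rxx, rxy)                       # standardized regression weights
  list(beta = beta, R2 = colSums(beta * rxy))   # R^2 = sum of beta_j * r_jy
}

vars <- c("y", "x1", "x2")
R <- matrix(c(1.00, 0.50, 0.50,
              0.50, 1.00, 0.25,
              0.50, 0.25, 1.00), 3, 3, dimnames = list(vars, vars))
regress_from_cor(R, x = c("x1", "x2"), y = "y")   # both betas 0.4, R^2 = 0.4
```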
Functions to generate simulated data with particular structures include sim.circ (for circumplex structures), sim.item (for general structures), and sim.congeneric (for a specific demonstration of congeneric measurement). The functions sim.congeneric and sim.hierarchical can be used to create data sets with particular structural properties. A more general form for all of these is sim.structural, for generating general structural models. These are discussed in more detail in the vignette (psych_for_sem).
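The congeneric model that sim.congeneric demonstrates can be sketched directly. Each item is x_j = lambda_j * f + sqrt(1 - lambda_j^2) * e_j, with standardized f and e_j. The population correlation between items j and k is therefore lambda_j * lambda_k. sim_congeneric_sketch is a hypothetical helper whose arguments are not those of sim.congeneric.

```r
# Base-R sketch of simulating congeneric items (hypothetical helper).
sim_congeneric_sketch <- function(lambda, n) {
  f <- rnorm(n)                                  # common factor scores
  e <- matrix(rnorm(n * length(lambda)), n)      # unique parts, one column per item
  x <- outer(f, lambda) + e %*% diag(sqrt(1 - lambda^2), nrow = length(lambda))
  colnames(x) <- paste0("V", seq_along(lambda))
  x
}

set.seed(17)
lambda <- c(0.8, 0.7, 0.6, 0.5)
x <- sim_congeneric_sketch(lambda, n = 10000)
round(cor(x), 2)                  # off-diagonals should be close to ...
round(outer(lambda, lambda), 2)   # ... the products lambda_j * lambda_k
```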
Functions to apply various standard statistical tests include:

- p.rep and its variants, for testing the probability of replication;
- r.con, for the confidence intervals of a correlation;
- r.test, to test single, paired, or sets of correlations.
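A confidence interval for a correlation is usually found through the Fisher r-to-z transformation (fisherz and fisherz2r in the index below). The steps are to transform r, add and subtract a normal quantile times 1/sqrt(n - 3), and transform back. The base-R sketch assumes this normal approximation. r_conf_int is a hypothetical name, and r.con may differ in options and details.

```r
# Base-R sketch of a Fisher z confidence interval for a correlation (hypothetical helper).
r_conf_int <- function(r, n, level = 0.95) {
  z <- atanh(r)                       # Fisher z = 0.5 * log((1 + r) / (1 - r))
  se <- 1 / sqrt(n - 3)               # approximate standard error of z
  crit <- qnorm(1 - (1 - level) / 2)
  tanh(z + c(-1, 1) * crit * se)      # back-transform the limits to the r metric
}

r_conf_int(0.5, n = 103)   # asymmetric around 0.5: the lower limit is farther away
```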
In order to study diurnal or circadian variations in mood, it is helpful to use circular statistics. Functions are available to find:

- the circular mean (circadian.mean);
- circular (phasic) correlations (circadian.cor);
- the correlation between linear variables and circular variables (circadian.linear.cor).

These supplement a function to find the best fitting phase angle (cosinor) for measures taken with a fixed period (e.g., 24 hours).
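Clock times show why circular statistics matter: the arithmetic mean of 23:00 and 01:00 is noon, while the circular mean is midnight. A base-R sketch of a circular mean, assuming times in hours and a 24 hour period, works in three steps:

- map each time onto the unit circle;
- average the sine and cosine components;
- convert the resulting angle back to hours.

circular_mean_hours is a hypothetical helper, not circadian.mean.

```r
# Base-R sketch of the circular mean of clock times (hypothetical helper).
circular_mean_hours <- function(hours, period = 24) {
  theta <- 2 * pi * hours / period                      # clock time as an angle
  angle <- atan2(mean(sin(theta)), mean(cos(theta)))    # direction of the mean vector
  (angle * period / (2 * pi)) %% period                 # back to hours in [0, period)
}

circular_mean_hours(c(22, 1))   # 23.5, whereas mean(c(22, 1)) is 11.5
```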
A dynamic model of personality and motivation (the Cues-Tendency-Actions model) is included as cta.
A number of useful helper functions allow for data input (read.file) and data manipulation (cs and dfOrder).
The most recent development version of the package is always available for download as a source file from the repository at the PMC lab:
install.packages("psych", repos = "https://personality-project.org/r/", type = "source")
This will provide the most recent version for PCs and Macs.
William Revelle
Vignettes such as intro.pdf and overview.pdf are useful introductions to the package. They may be found as vignettes in R, or downloaded from https://personality-project.org/r/psych/intro.pdf, https://personality-project.org/r/psych/overview.pdf, and https://personality-project.org/r/psych/psych_for_sem.pdf. In addition, there are a number of "HowTo"s available at https://personality-project.org/r/.
The more important functions in the package are for the analysis of multivariate data, with an emphasis upon functions useful in the scale construction of item composites. However, there are also a number of very useful functions for basic data entry and descriptive analyses, including read.file, read.clipboard, describe, pairs.panels, error.bars, and error.dots.
When given a set of items from a personality inventory, one goal is to combine these into higher level item composites. This leads to several questions:
1) What are the basic properties of the data? describe reports basic summary statistics for vectors, columns of matrices, or data.frames: mean, sd, median, mad, range, minimum, maximum, skew, kurtosis, and standard error. describeBy provides descriptive statistics organized by one or more grouping variables. statsBy provides even more detail for data structured by groups. This includes within and between correlation matrices, ICCs for group differences, and basic descriptive statistics organized by group.
pairs.panels shows scatter plot matrices (SPLOMs) as well as histograms and the Pearson correlation for scales or items. error.bars will plot variable means with associated confidence intervals. errorCircles will plot confidence intervals for both the x and y coordinates. corr.test will find the significance values for a matrix of correlations. error.dots creates a dot chart with confidence intervals.
2) What is the most appropriate number of item composites to form? After finding either standard Pearson correlations or tetrachoric or polychoric correlations, the dimensionality of the correlation matrix may be examined. The number of factors/components problem is a standard question of factor analysis, cluster analysis, and principal components analysis. Unfortunately, there is no agreed upon answer. The Very Simple Structure (VSS) set of procedures has been proposed as one answer to the question of the optimal number of factors. Other procedures (VSS.scree, VSS.parallel, fa.parallel, and MAP) also address this question. nfactors combines several of these approaches into one convenient function.
3) What are the best composites to form? This may be answered using principal components (principal, aka pca), principal axis (factor.pa), or minimum residual (factor.minres) factor analysis (all part of the fa function), with the results shown graphically (fa.diagram). However, it is sometimes more useful to address this question using cluster analytic techniques. Previous versions of ICLUST (e.g., Revelle, 1979) have been shown to be particularly successful at forming maximally consistent and independent item composites. Graphical output from ICLUST.graph uses the Graphviz dot language and allows one to write files suitable for Graphviz. If Rgraphviz is available, these graphs can be drawn in R.
Graphical organizations of cluster and factor analysis output can be done using cluster.plot, which plots items by cluster/factor loadings and assigns each item to the dimension with its highest loading.
4) How well does a particular item composite reflect a single construct? This is a question of reliability and general factor saturation. Multiple solutions for this problem result in (Cronbach's) alpha (alpha, scoreItems), (Revelle's) beta (ICLUST), and (McDonald's) omega (both omega hierarchical and omega total). Additional reliability estimates may be found in the guttman function.
This can also be examined by applying Item Response Theory techniques (irt.fa). These use factor analysis of the tetrachoric or polychoric correlation matrices and convert the results into the standard two parameter parameterization of item difficulty and item discrimination. Information functions for the items suggest where they are most effective.
5) For some applications, data matrices are synthetically combined from sampling different items for different people. So-called Synthetic Aperture Personality Assessment (SAPA) techniques allow the formation of large correlation or covariance matrices even though no one person has taken all of the items. To analyze such data sets, it is easy to form item composites based upon the covariance matrix of the items, rather than the original data set. These matrices may then be analyzed using a number of functions (e.g., cluster.cor, fa, ICLUST, pca, mat.regress, and factor2cluster).
6) More typically, one has a raw data set to analyze. alpha will report several reliability estimates as well as item-whole correlations for items forming a single scale. score.items will score data sets on multiple scales. It reports the scale scores, item-scale and scale-scale correlations, coefficient alpha, alpha-1, and G6+. Using a 'keys' matrix (created by make.keys or by hand), scales can have overlapping or independent items. score.multiple.choice scores multiple choice items or converts them to dichotomous (0/1) format for other functions.
If the scales have overlapping items, then scoreOverlap will give similar statistics, but corrected for the item overlap.
7) The reliability function combines the output from several different ways to estimate reliability, including omega and splitHalf.
8) In addition to classical test theory (CTT) based scores of either totals or averages, 1 and 2 parameter IRT based scores may be found with scoreIrt.1pl, scoreIrt.2pl, or more generally scoreIrt. Although highly correlated with CTT estimates, these scores take advantage of different item difficulties and are particularly appropriate for the problem of missing data.
9) If the data have a multilevel structure (e.g., items nested within time nested within subjects), the multilevel.reliability (aka mlr) function will estimate generalizability coefficients for data over subjects, subjects over time, etc. mlPlot will provide plots for each subject of items over time. mlArrange takes the conventional wide output format and converts it to the long format necessary for some multilevel functions. Other functions useful for multilevel data include statsBy and faBy.
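The parallel analysis mentioned in question 2 (fa.parallel) can be sketched in base R. The sketch assumes Horn's original version on principal component eigenvalues: retain the leading components whose observed eigenvalues exceed the mean eigenvalues from random normal data of the same size. parallel_analysis is a hypothetical helper. fa.parallel also compares factor eigenvalues, can resample the observed data, and plots the results.

```r
# Base-R sketch of Horn's parallel analysis on component eigenvalues (hypothetical helper).
parallel_analysis <- function(x, n_iter = 20) {
  x <- as.matrix(x)
  n <- nrow(x); p <- ncol(x)
  observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values
  # Mean eigenvalues of correlation matrices of random normal data of the same size
  random <- replicate(n_iter,
                      eigen(cor(matrix(rnorm(n * p), n, p)),
                            symmetric = TRUE, only.values = TRUE)$values)
  reference <- rowMeans(random)
  # Count the leading components whose eigenvalues beat the random reference
  exceeds <- observed > reference
  if (all(exceeds)) p else which(!exceeds)[1] - 1
}

set.seed(3)
n <- 500
f <- matrix(rnorm(n * 2), n, 2)
u <- sqrt(1 - 0.8^2)
x <- cbind(0.8 * f[, 1] + u * matrix(rnorm(n * 4), n, 4),
           0.8 * f[, 2] + u * matrix(rnorm(n * 4), n, 4))
parallel_analysis(x)   # should suggest 2 components for this structure
```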
An additional set of functions generates simulated data to meet certain structural properties:

- sim.anova produces data simulating a 3 way analysis of variance (ANOVA) or linear model, with or without repeated measures.
- sim.item creates simple structure data.
- sim.circ produces circumplex structured data.
- sim.dichot produces circumplex or simple structured data for dichotomous items.

These item structures are useful for understanding the effects of skew and differential item endorsement on factor and cluster analytic solutions. sim.structural will produce correlation matrices and data matrices to match general structural models (see the vignette).
When examining personality items, some people like to discuss them as representing items in a two dimensional space with a circumplex structure. Tests of circumplex fit (circ.tests) have been developed. When representing items in a circumplex, it is convenient to view them in polar coordinates (polar).
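Converting two dimensional loadings to polar coordinates is a small calculation. The angle of each item's loading vector locates it on the circumplex. The squared length of that vector is the item's communality when the two factors are orthogonal. The sketch below reports angles in degrees from the first factor. to_polar is a hypothetical helper, and polar's output format may differ.

```r
# Base-R sketch of converting two-dimensional loadings to polar coordinates (hypothetical helper).
to_polar <- function(loadings) {
  angle <- atan2(loadings[, 2], loadings[, 1]) * 180 / pi   # degrees from factor 1
  data.frame(angle = angle %% 360,
             length = sqrt(loadings[, 1]^2 + loadings[, 2]^2))
}

L <- rbind(c(0.5, 0.0),
           c(0.0, 0.5),
           c(-0.5, 0.0),
           c(0.4, -0.4))
to_polar(L)   # angles 0, 90, 180, 315; lengths 0.5, 0.5, 0.5, about 0.566
```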
Additional functions test the difference between two independent or dependent correlations (r.test), find the phi or Yule coefficients from a two by two table (phi, Yule), or find the confidence interval of a correlation coefficient (r.con).
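For a 2 x 2 table with cells n11, n12, n21, n22, the two coefficients are:

- phi = (n11 n22 - n12 n21) / sqrt(product of the four marginal totals);
- Yule's Q = (n11 n22 - n12 n21) / (n11 n22 + n12 n21).

Both are sketched below in base R. phi_yule is a hypothetical helper, not psych's phi or Yule.

```r
# Base-R sketch of phi and Yule's Q for a 2 x 2 table of counts (hypothetical helper).
phi_yule <- function(tab) {
  n11 <- tab[1, 1]; n12 <- tab[1, 2]; n21 <- tab[2, 1]; n22 <- tab[2, 2]
  cross <- n11 * n22 - n12 * n21   # cross-product difference
  c(phi = cross / sqrt(sum(tab[1, ]) * sum(tab[2, ]) * sum(tab[, 1]) * sum(tab[, 2])),
    Q = cross / (n11 * n22 + n12 * n21))
}

tab <- matrix(c(40, 20, 10, 30), nrow = 2)   # rows: (40, 10) and (20, 30)
phi_yule(tab)   # phi about 0.408, Q about 0.714
```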
Many data sets are included:

- bfi represents 25 personality items thought to represent five factors of personality.
- ability has 14 multiple choice IQ items.
- sat.act has data on self reported test scores by age and gender.
- galton is Galton's data set of the heights of parents and their children.
- peas recreates the original Galton data set of the genetics of sweet peas.
- heights and cubits provide even more Galton data.
- vegetables provides the Guilford preference matrix of vegetables.
- cities provides airline miles between 11 US cities (demo data for multidimensional scaling).
Partial Index (to see the entire index, see the link at the bottom of every help page)
psych | A package for personality, psychometric, and psychological research. |
Useful data entry and descriptive statistics
read.file | search for, find, and read from file |
read.clipboard | shortcut for reading from the clipboard |
read.clipboard.csv | shortcut for reading comma delimited files from clipboard |
read.clipboard.lower | shortcut for reading lower triangular matrices from the clipboard |
read.clipboard.upper | shortcut for reading upper triangular matrices from the clipboard |
describe | Basic descriptive statistics useful for psychometrics |
describe.by | Find summary statistics by groups |
statsBy | Find summary statistics by a grouping variable, including within and between correlation matrices. |
mlArrange | Change multilevel data from wide to long format |
headtail | combines the head and tail functions for showing data sets |
pairs.panels | SPLOM and correlations for a data matrix |
corr.test | Correlations, sample sizes, and p values for a data matrix |
cor.plot | graphically show the size of correlations in a correlation matrix |
multi.hist | Histograms and densities of multiple variables arranged in matrix form |
skew | Calculate skew for a vector, each column of a matrix, or data.frame |
kurtosi | Calculate kurtosis for a vector, each column of a matrix or dataframe |
geometric.mean | Find the geometric mean of a vector or columns of a data.frame |
harmonic.mean | Find the harmonic mean of a vector or columns of a data.frame |
error.bars | Plot means and error bars |
error.bars.by | Plot means and error bars for separate groups |
error.crosses | Two way error bars |
interp.median | Find the interpolated median, quartiles, or general quantiles. |
rescale | Rescale data to specified mean and standard deviation |
table2df | Convert a two dimensional table of counts to a matrix or data frame |
Data reduction through cluster and factor analysis
fa | Combined function for principal axis, minimum residual, weighted least squares, and maximum likelihood factor analysis |
factor.pa | Do a principal Axis factor analysis (deprecated) |
factor.minres | Do a minimum residual factor analysis (deprecated) |
factor.wls | Do a weighted least squares factor analysis (deprecated) |
fa.graph | Show the results of a factor analysis or principal components analysis graphically |
fa.diagram | Show the results of a factor analysis without using Rgraphviz |
fa.sort | Sort a factor or principal components output |
fa.extension | Apply the Dwyer extension for factor loadings |
principal | Do an eigen value decomposition to find the principal components of a matrix |
fa.parallel | Scree test and Parallel analysis |
fa.parallel.poly | Scree test and Parallel analysis for polychoric matrices |
factor.scores | Estimate factor scores given a data matrix and factor loadings |
guttman | 8 different measures of reliability (6 from Guttman, 1945) |
irt.fa | Apply factor analysis to dichotomous items to get IRT parameters |
iclust | Apply the ICLUST algorithm |
ICLUST.diagram | The base R graphics output function called by iclust |
ICLUST.graph | Graph the output from ICLUST using the dot language |
ICLUST.rgraph | Graph the output from ICLUST using rgraphviz |
kaiser | Apply kaiser normalization before rotating |
reliability | A wrapper function to find alpha, omega, split half, etc. |
polychoric | Find the polychoric correlations for items and find item thresholds |
poly.mat | Find the polychoric correlations for items (uses J. Fox's hetcor) |
omega | Calculate the omega estimate of factor saturation (requires the GPArotation package) |
omega.graph | Draw a hierarchical or Schmid Leiman orthogonalized solution (uses Rgraphviz) |
partial.r | Partial variables from a correlation matrix |
predict | Predict factor/component scores for new data |
schmid | Apply the Schmid Leiman transformation to a correlation matrix |
scoreItems | Combine items into multiple scales and find alpha |
score.multiple.choice | Combine items into multiple scales and find alpha and basic scale statistics |
scoreOverlap | Find item and scale statistics (similar to score.items) but correct for item overlap |
lmCor | Find Cohen's set correlation between two sets of variables |
smc | Find the Squared Multiple Correlation (used for initial communality estimates) |
tetrachoric | Find tetrachoric correlations and item thresholds |
polyserial | Find polyserial and biserial correlations for item validity studies |
mixed.cor | Form a correlation matrix from continuous, polytomous, and dichotomous items |
VSS | Apply the Very Simple Structure criterion to determine the appropriate number of factors. |
VSS.parallel | Do a parallel analysis to determine the number of factors for a random matrix |
VSS.plot | Plot VSS output |
VSS.scree | Show the scree plot of the factor/principal components |
MAP | Apply the Velicer Minimum Average Partial criterion for number of factors |
Functions for reliability analysis (some are listed above as well).
alpha | Find coefficient alpha and Guttman Lambda 6 for a scale (see also score.items) |
guttman | 8 different measures of reliability (6 from Guttman, 1945) |
omega | Calculate the omega estimates of reliability (requires the GPArotation package) |
omegaSem | Calculate the omega estimates of reliability using a Confirmatory model (requires the sem package) |
ICC | Intraclass correlation coefficients |
score.items | Combine items into multiple scales and find alpha |
glb.algebraic | The greatest lower bound found by an algebraic solution (requires Rcsdp). Written by Andreas Moeltner |
Procedures particularly useful for Synthetic Aperture Personality Assessment
alpha | Find coefficient alpha and Guttman Lambda 6 for a scale (see also score.items) |
bestScales | A bootstrap aggregation function for choosing most predictive unit weighted items |
make.keys | Create the keys file for score.items or cluster.cor |
correct.cor | Correct a correlation matrix for unreliability |
count.pairwise | Count the number of complete cases when doing pair wise correlations |
cluster.cor | find correlations of composite variables from larger matrix |
cluster.loadings | find correlations of items with composite variables from a larger matrix |
eigen.loadings | Find the loadings when doing an eigen value decomposition |
fa | Do a minimal residual or principal axis factor analysis and estimate factor scores |
fa.extension | Extend a factor analysis to a set of new variables |
factor.pa | Do a Principal Axis factor analysis and estimate factor scores |
factor2cluster | extract cluster definitions from factor loadings |
factor.congruence | Factor congruence coefficient |
factor.fit | How well does a factor model fit a correlation matrix |
factor.model | Reproduce a correlation matrix based upon the factor model |
factor.residuals | Fit = data - model |
factor.rotate | "hand rotate" factors |
guttman | 8 different measures of reliability |
lmCor | standardized multiple regression from raw or correlation matrix input (formerly called setCor) |
mat.regress | standardized multiple regression from raw or correlation matrix input |
polyserial | polyserial and biserial correlations with massive missing data |
tetrachoric | Find tetrachoric correlations and item thresholds |
Functions for generating simulated data sets
sim | The basic simulation functions |
sim.anova | Generate 3 independent variables and 1 or more dependent variables for demonstrating ANOVA and lm designs |
sim.circ | Generate a two dimensional circumplex item structure |
sim.item | Generate a two dimensional simple structure with particular item characteristics |
sim.congeneric | Generate a one factor congeneric reliability structure |
sim.minor | Simulate nfact major and nvar/2 minor factors |
sim.structural | Generate a multifactorial structural model |
sim.irt | Generate data for a 1, 2, 3 or 4 parameter logistic model |
sim.VSS | Generate simulated data for the factor model |
phi.demo | Create artificial data matrices for teaching purposes |
sim.hierarchical | Generate simulated correlation matrices with hierarchical or any structure |
sim.spherical | Generate three dimensional spherical data (generalization of circumplex to 3 space) |
Graphical functions (require Rgraphviz) -- deprecated
structure.graph | Draw a sem or regression graph |
fa.graph | Draw the factor structure from a factor or principal components analysis |
omega.graph | Draw the factor structure from an omega analysis (either with or without the Schmid Leiman transformation) |
ICLUST.graph | Draw the tree diagram from ICLUST |
Graphical functions that do not require Rgraphviz
diagram | A general set of diagram functions. |
structure.diagram | Draw a sem or regression graph |
fa.diagram | Draw the factor structure from a factor or principal components analysis |
omega.diagram | Draw the factor structure from an omega analysis (either with or without the Schmid Leiman transformation) |
ICLUST.diagram | Draw the tree diagram from ICLUST |
plot.psych | A call to plot various types of output (e.g., from irt.fa, fa, omega, iclust) |
cor.plot | A heat map display of correlations |
scatterHist | Bivariate scatter plot and histograms |
spider | Spider and radar plots (circular displays of correlations) |
Circular statistics (for circadian data analysis)
circadian.cor | Find the correlation with e.g., mood and time of day |
circadian.linear.cor | Correlate a circular value with a linear value |
circadian.mean | Find the circular mean of each column of a data set |
cosinor | Find the best fitting phase angle for a circular data set |
Miscellaneous functions
comorbidity | Convert base rate and comorbidity to phi, Yule and tetrachoric |
df2latex | Convert a data.frame or matrix to a LaTeX table |
dummy.code | Convert categorical data to dummy codes |
fisherz | Apply the Fisher r to z transform |
fisherz2r | Apply the Fisher z to r transform |
ICC | Intraclass correlation coefficients |
cortest.mat | Test for equality of two matrices (see also cortest.normal, cortest.jennrich) |
cortest.bartlett | Test whether a matrix is an identity matrix |
paired.r | Test for the difference of two paired or two independent correlations |
r.con | Confidence intervals for correlation coefficients |
r.test | Test of significance of r, differences between rs. |
p.rep | The probability of replication given a p, r, t, or F |
phi | Find the phi coefficient of correlation from a 2 x 2 table |
phi.demo | Demonstrate the problem of phi coefficients with varying cut points |
phi2poly | Given a phi coefficient, what is the polychoric correlation |
phi2poly.matrix | Given a phi coefficient, what is the polychoric correlation (works on matrices) |
polar | Convert 2 dimensional factor loadings to polar coordinates. |
scaling.fits | Compares alternative scaling solutions and gives goodness of fits |
scrub | Basic data cleaning |
tetrachor | Finds tetrachoric correlations |
thurstone | Thurstone Case V scaling |
tr | Find the trace of a square matrix |
wkappa | weighted and unweighted versions of Cohen's kappa |
Yule | Find the Yule Q coefficient of correlation |
Yule.inv | What is the two by two table that produces a Yule Q with set marginals? |
Yule2phi | What is the phi coefficient corresponding to a Yule Q with set marginals? |
Yule2tetra | Convert one or a matrix of Yule coefficients to tetrachoric coefficients. |
Functions that are under development and not recommended for casual use
irt.item.diff.rasch | IRT estimate of item difficulty with assumption that theta = 0 |
irt.person.rasch | Item Response Theory estimates of theta (ability) using a Rasch like model |
Data sets included in the psych or psychTools package
bfi | represents 25 personality items thought to represent five factors of personality |
Thurstone | 8 different data sets with a bifactor structure |
cities | The airline distances between 11 cities (used to demonstrate MDS) |
epi.bfi | 13 personality scales |
iqitems | 14 multiple choice iq items |
msq | 75 mood items |
sat.act | Self reported ACT and SAT Verbal and Quantitative scores by age and gender |
Tucker | Correlation matrix from Tucker |
galton | Galton's data set of the heights of parents and their children |
heights | Galton's data set of the relationship between height and forearm (cubit) length |
cubits | Galton's data table of height and forearm length |
peas | Galton's data set of the diameters of 700 parent and offspring sweet peas |
vegetables | Guilford's preference matrix of vegetables (used for thurstone) |
A debugging function that may also be used as a demonstration of psych.
test.psych | Run a test of the major functions on 5 different data sets. Primarily for development purposes, although the output can be used as a demo of the various functions. |
A general guide to personality theory and research may be found at the personality-project https://personality-project.org/. See also the short guide to R at https://personality-project.org/r/. In addition, see
Revelle, W. (in preparation) An Introduction to Psychometric Theory with applications in R. Springer. Available at https://personality-project.org/r/book/
Revelle, W. and Condon, D.M. (2019) Reliability from alpha to omega: A tutorial. Psychological Assessment, 31(12), 1395-1411. https://doi.org/10.1037/pas0000754. Preprint available from PsyArXiv: https://osf.io/preprints/psyarxiv/2y3w9/
# See the separate man pages and the complete index.
# To test most of the psych package, run the following:
# test.psych()