matrixpls calculates composite variable models using the partial least squares (PLS) algorithm and related methods. In contrast to most other PLS software, which implements the raw-data version of the algorithm, matrixpls works with data covariance matrices. The algorithms are designed to be computationally efficient, modular in programming, and well documented. matrixpls integrates with simsem to enable Monte Carlo simulations with as little custom programming as possible.
matrixpls calculates models where sets of indicator variables are combined as weighted composites. These composites are then used to estimate a statistical model describing the relationships between the composites, and between the composites and the indicators. While a number of such methods exist, the partial least squares (PLS) technique is perhaps the most widely used.
The matrixpls package implements a collection of PLS techniques as well as the more recent GSCA and PLSc techniques, and older methods based on analysis with composite variables, such as regression with unit-weighted composites or factor scores. The package provides a unified framework that enables the comparison and analysis of these algorithms. In contrast to previous R packages for PLS, such as plspm and semPLS, and all currently available commercial PLS software, which work with raw data, matrixpls calculates the indicator weights and model estimates from data covariance matrices. Working with covariance data allows for reanalyzing covariance matrices that are sometimes published as appendices of articles, is computationally more efficient, and lends itself more easily to formal analysis than implementations based on raw data.
matrixpls has a modular design that is easily extended and contains more calculation options than the two other PLS packages for R. To allow validation of the algorithms by end users and to help port existing analysis files from the two other R packages to matrixpls, the package contains compatibility functions for both plspm and semPLS.
The design principles and functionality of the package are best explained by describing the main function matrixpls. The function performs two tasks: it first calculates a set of indicator weights to form composites based on the data covariance matrix, and then estimates a statistical model with the indicators and composites using the weights. The main function takes the following arguments:
matrixpls(S, model, W.model = NULL, weightFun = weightFun.pls, parameterEstim = parameterEstim.separate, weightSign = NULL, ..., validateInput = TRUE, standardize = TRUE)
The first five arguments of matrixpls are the most relevant for understanding how the package works. S is the data covariance or correlation matrix. model defines the model that is estimated in the second stage, and W.model defines how the indicators are to be aggregated as composites. If W.model is left undefined, it is constructed based on model following rules that are explained elsewhere in the documentation. weightFun and parameterEstim are functions that implement the first and second task of the function, respectively. All other arguments are passed down to these two functions, which in turn can pass arguments to other functions that they call.
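For orientation, a minimal call might look as follows. This is a hedged sketch: the data set myData and the indicator names are hypothetical, only S and model are taken from the interface described above, and all remaining arguments keep their defaults.
library(matrixpls)
# 'myData' is a hypothetical data frame containing the indicators a1-a3 and b1-b3.
S <- cor(myData)
# Model in lavaan format: two composites measured by three indicators each,
# and one structural regression between them.
model <- "
  A =~ a1 + a2 + a3
  B =~ b1 + b2 + b3
  B ~ A
"
# Default PLS weight algorithm; W.model is derived automatically from 'model'.
out <- matrixpls(S, model)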
Many of the commonly used arguments of the matrixpls function are themselves functions. For example, a PLS analysis with Mode B outer estimation for all indicator blocks and centroid inner estimation could be specified as follows:
matrixpls(S, model, outerEstim = outerEstim.modeB, innerEstim = innerEstim.centroid)
The arguments outerEstim and innerEstim are not defined by the matrixpls function but are passed down to weightFun.pls, which is used as the default weightFun. outerEstim.modeB and innerEstim.centroid are themselves functions provided by the matrixpls package, which perform the actual outer and inner estimation stages of the PLS algorithm. Essentially, all parts of the estimation algorithm can be provided as arguments to the main function. This allows adjusting the inner workings of the algorithm in a way that is currently not possible with any other PLS software.
It is also possible to define custom functions. For example, we could define a new Mode B outer estimator that only produces positive weights:
myModeB <- function(...){ abs(outerEstim.modeB(...)) }
matrixpls(S, model, outerEstim = myModeB, innerEstim = innerEstim.centroid)
The model can be specified in the lavaan format or in the native matrixpls format.
The native model format is a list of three binary matrices, inner, reflective, and formative, specifying the free parameters of a model: inner (l x l) specifies the regressions between composites, reflective (k x l) specifies the regressions of observed data on composites, and formative (l x k) specifies the regressions of composites on the observed data. Here k is the number of observed variables and l is the number of composites.
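As an illustration, a native-format specification of a simple two-composite model could be built roughly as follows. Composite and indicator names are placeholders, and the row-equals-dependent-variable orientation of inner is an assumption inferred from the conventions stated above for reflective and formative.
# Two composites (A, B) and six indicators (a1-a3, b1-b3), so l = 2 and k = 6.
composites <- c("A", "B")
indicators <- c("a1", "a2", "a3", "b1", "b2", "b3")
# inner (l x l): B is regressed on A; rows are assumed to be the dependent variables.
inner <- matrix(c(0, 0,
                  1, 0), 2, 2, byrow = TRUE,
                dimnames = list(composites, composites))
# reflective (k x l): each indicator is regressed on its own composite.
reflective <- matrix(0, 6, 2, dimnames = list(indicators, composites))
reflective[1:3, "A"] <- 1
reflective[4:6, "B"] <- 1
# formative (l x k): no regressions of composites on the observed data.
formative <- matrix(0, 2, 6, dimnames = list(composites, indicators))
nativeModel <- list(inner = inner, reflective = reflective, formative = formative)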
If the model is specified in lavaan format, the native-format model is derived from it by assigning all regressions between latent variables to inner, all factor loadings to reflective, and all regressions of latent variables on observed variables to formative. Regressions between observed variables and all free covariances are ignored. All parameters that are specified in the model are treated as free parameters.
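The following sketch shows how the parts of a lavaan-syntax model would map to the native matrices under these rules; all variable names are hypothetical.
model <- "
  # Factor loadings map to 'reflective'
  Attitude  =~ a1 + a2 + a3
  Intention =~ i1 + i2 + i3
  # A regression between latent variables maps to 'inner'
  Intention ~ Attitude
  # A regression of a latent variable on observed variables maps to 'formative'
  Attitude ~ x1 + x2
  # Free covariances and regressions between observed variables are ignored
  x1 ~~ x2
"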
The original papers on partial least squares, as well as many of the current PLS implementations, impose restrictions on the matrices inner, reflective, and formative: inner must be a lower triangular matrix, reflective must have exactly one non-zero value on each row and at least one non-zero value on each column, and formative must contain only zeros. Some PLS implementations allow formative to contain non-zero values, but impose the restriction that the sum of reflective and t(formative) must satisfy the original restrictions of reflective. The only restrictions that matrixpls imposes on inner, reflective, and formative are that these must be binary matrices and that the diagonal of inner must be zeros.
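To make the difference concrete, the following inner matrix contains a feedback relationship and is therefore not lower triangular, yet it satisfies the only requirements listed above, binary entries and a zero diagonal. This is purely an illustrative sketch with placeholder composite names.
# Binary entries and a zero diagonal, but not lower triangular:
# A and B are specified to depend on each other.
inner <- matrix(c(0, 1,
                  1, 0), 2, 2, byrow = TRUE,
                dimnames = list(c("A", "B"), c("A", "B")))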
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. Retrieved from http://www.jstatsoft.org/v48/i02
Lohmöller, J.-B. (1989). Latent variable path modeling with partial least squares. Heidelberg: Physica-Verlag.
Rönkkö, M., McIntosh, C. N., & Antonakis, J. (2015). On the adoption of partial least squares in psychological research: Caveat emptor. Personality and Individual Differences, 87, 76–84. doi:10.1016/j.paid.2015.07.019
Wold, H. (1982). Soft modeling: The basic design and some extensions. In K. G. Jöreskog & S. Wold (Eds.), Systems under indirect observation: Causality, structure, prediction (pp. 1–54). Amsterdam: North-Holland.