Learn R Programming

multiway (version 1.0-2)

parafac: Parallel Factor Analysis-1

Description

Given a 3-way array X = array(x,dim=c(I,J,K)), the 3-way Parafac model can be written as c{ X[i,j,k] = sum A[i,r]*B[j,r]*C[k,r] + E[i,j,k] } where A = matrix(a,I,R) are the Mode A (first mode) weights, B = matrix(b,J,R) are the Mode B (second mode) weights, C = matrix(c,K,R) are the Mode C (third mode) weights, and E = array(e,dim=c(I,J,K)) is the 3-way residual array. The summation is for r = seq(1,R).

Given a 4-way array X = array(x,dim=c(I,J,K,L)), the 4-way Parafac model can be written as c{ X[i,j,k,l] = sum A[i,r]*B[j,r]*C[k,r]*D[l,r] + E[i,j,k,l] } where D = matrix(d,L,R) are the Mode D (fourth mode) weights, E = array(e,dim=c(I,J,K,L)) is the 4-way residual array, and the other terms can be interprered as previously described.

Weight matrices are estimated using an alternating least squares algorithm with optional constraints.

Usage

parafac(X,nfac,nstart=10,const=NULL,
        Bfixed=NULL,Cfixed=NULL,Dfixed=NULL,
        Bstart=NULL,Cstart=NULL,Dstart=NULL,
        Bstruc=NULL,Cstruc=NULL,Dstruc=NULL,
        maxit=500,ctol=10^-4,parallel=FALSE,
        cl=NULL,output=c("best","all"))

Arguments

X
Three-way data array with dim=c(I,J,K) or four-way data array with dim=c(I,J,K,L).
nfac
Number of factors.
nstart
Number of random starts.
const
Constraints for each mode. See Examples.
Bfixed
Fixed Mode B weights. Only used to fit model with fixed Mode B weights.
Cfixed
Fixed Mode C weights. Only used to fit model with fixed Mode C weights.
Dfixed
Fixed Mode D weights. Only used to fit model with fixed Mode D weights.
Bstart
Starting Mode B weights for ALS algorithm. Default uses random weights.
Cstart
Starting Mode C weights for ALS algorithm. Default uses random weights.
Dstart
Starting Mode D weights for ALS algorithm. Default uses random weights.
Bstruc
Structure constraints for Mode B weights. Default uses unstructured weights.
Cstruc
Structure constraints for Mode C weights. Default uses unstructured weights.
Dstruc
Structure constraints for Mode D weights. Default uses unstructured weights.
maxit
Maximum number of iterations.
ctol
Convergence tolerance (R^2 change).
parallel
Logical indicating if parLapply should be used. See Examples.
cl
Cluster created by makeCluster. Only used when parallel=TRUE.
output
Output the best solution (default) or output all nstart solutions.

Value

  • If output="best", returns an object of class "parafac" with the following elements:
  • AMode A weight matrix.
  • BMode B weight matrix.
  • CMode C weight matrix.
  • DMode D weight matrix.
  • RsqR-squared value.
  • GCVGeneralized Cross-Validation.
  • edfEffective degrees of freedom.
  • iterNumber of iterations.
  • cflagConvergence flag.
  • constSame as input const.
  • Otherwise returns a list of length nstart where each element is an object of class "parafac".

Warnings

The ALS algorithm can perform poorly if the number of factors nfac is set too large.

Non-negativity constraints can be sensitive to local optima.

Non-negativity constraints can result in slower performance.

Structure constraints for override constraints in const input.

References

Bro, R., & De Jong, S. (1997). A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics, 11, 393-401.

Harshman, R. A. (1970). Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis. UCLA Working Papers in Phonetics, 16, 1-84.

Harshman, R. A., & Lundy, M. E. (1994). PARAFAC: Parallel factor analysis. Computational Statistics and Data Analysis, 18, 39-72.

Examples

Run this code
##########   3-way example   ##########

# create random data array with Parafac structure
set.seed(3)
mydim <- c(50,20,5)
nf <- 3
Amat <- matrix(rnorm(mydim[1]*nf),mydim[1],nf)
Bmat <- matrix(runif(mydim[2]*nf),mydim[2],nf)
Cmat <- matrix(runif(mydim[3]*nf),mydim[3],nf)
Xmat <- array(tcrossprod(Amat,krprod(Cmat,Bmat)),dim=mydim)
Emat <- array(rnorm(prod(mydim)),dim=mydim)
Emat <- nscale(Emat,0,sumsq(Xmat))   # SNR=1
X <- Xmat + Emat

# fit Parafac model (unconstrained)
pfac <- parafac(X,nfac=nf,nstart=1)
pfac$Rsq

# check solution
Xhat <- fitted(pfac)
sum((Xmat-Xhat)^2)/prod(mydim)

# reorder and resign factors
pfac$B[1:4,]
pfac <- reorder(pfac, c(3,1,2))
pfac$B[1:4,]
pfac <- resign(pfac, mode="B")
pfac$B[1:4,]
Xhat <- fitted(pfac)
sum((Xmat-Xhat)^2)/prod(mydim)

# rescale factors
colSums(pfac$B^2)
colSums(pfac$C^2)
pfac <- rescale(pfac, mode="C", absorb="B")
colSums(pfac$B^2)
colSums(pfac$C^2)
Xhat <- fitted(pfac)
sum((Xmat-Xhat)^2)/prod(mydim)


##########   4-way example   ##########

# create random data array with Parafac structure
set.seed(4)
mydim <- c(30,10,8,10)
nf <- 4
Amat <- matrix(rnorm(mydim[1]*nf),mydim[1],nf)
Bmat <- svd(matrix(runif(mydim[2]*nf),mydim[2],nf))$u
Cmat <- matrix(runif(mydim[3]*nf),mydim[3],nf)
Dmat <- matrix(runif(mydim[4]*nf),mydim[4],nf)
Xmat <- array(tcrossprod(Amat,krprod(Dmat,krprod(Cmat,Bmat))),dim=mydim)
Emat <- array(rnorm(prod(mydim)),dim=mydim)
Emat <- nscale(Emat,0,sumsq(Xmat))   # SNR=1
X <- Xmat + Emat

# fit Parafac model (unconstrained A, orthogonal B, non-negative C, unconstrained D)
pfac <- parafac(X,nfac=nf,nstart=1,const=c(0,1,2,0))
pfac$Rsq

# check solution
Xhat <- fitted(pfac)
sum((Xmat-Xhat)^2)/prod(mydim)
crossprod(pfac$B)
pfac$C


##########   parallel computation   ##########

# create random data array with Parafac structure
set.seed(3)
mydim <- c(50,20,5)
nf <- 3
Amat <- matrix(rnorm(mydim[1]*nf),mydim[1],nf)
Bmat <- matrix(runif(mydim[2]*nf),mydim[2],nf)
Cmat <- matrix(runif(mydim[3]*nf),mydim[3],nf)
Xmat <- array(tcrossprod(Amat,krprod(Cmat,Bmat)),dim=mydim)
Emat <- array(rnorm(prod(mydim)),dim=mydim)
Emat <- nscale(Emat,0,sumsq(Xmat))   # SNR=1
X <- Xmat + Emat

# fit Parafac model (10 random starts -- sequential computation)
set.seed(1)
system.time({pfac <- parafac(X,nfac=nf)})
pfac$Rsq

# fit Parafac model (10 random starts -- parallel computation)
set.seed(1)
cl <- makeCluster(detectCores())
ce <- clusterEvalQ(cl,library(multiway))
system.time({pfac <- parafac(X,nfac=nf,parallel=TRUE,cl=cl)})
pfac$Rsq
stopCluster(cl)

Run the code above in your browser using DataLab