
dmt (version 0.8.20)

fit.dependency.model: Fit dependency model between two data sets.

Description

Fit a generative latent variable model (see the vignette for the model specification) on two data sets. The solutions can be regularized with priors, including constraints on the marginal covariance structures, the structure of W, the latent dimensionality, etc. Probabilistic versions of PCA, factor analysis and CCA are available as special cases.

Usage

fit.dependency.model(X, Y, zDimension = 1, marginalCovariances = "full", epsilon = 1e-3, priors = list(), matched = TRUE, includeData = TRUE, calculateZ = TRUE, verbose = FALSE)
ppca(X, Y = NULL, zDimension = NULL, includeData = TRUE, calculateZ = TRUE)
pfa(X, Y = NULL, zDimension = NULL, includeData = TRUE, calculateZ = TRUE, priors = NULL)
pcca(X, Y, zDimension = NULL, includeData = TRUE, calculateZ = TRUE)

Arguments

X, Y
Data sets X and Y, arranged as 'variables x samples' (variables on rows, samples on columns). The second data set (Y) is optional.
zDimension
Dimensionality of the shared latent variable.
marginalCovariances
Structure of marginal covariances, assuming multivariate Gaussian distributions for the dataset-specific effects. Options: "identical isotropic", "isotropic", "diagonal" and "full". The difference between the isotropic and identical isotropic options is that in the isotropic model phi$X != phi$Y in general, whereas in the identical isotropic model phi$X = phi$Y.
epsilon
Convergence limit.
priors
Prior parameters for the model. A list, which can contain some of the following elements:
W
Rate parameter for exponential distribution (should be positive). Used to specify the prior for Wx and Wy in the dependency model. The exponential prior is used to produce non-negative solutions for W; small values of the rate parameter correspond to an uninformative prior distribution.

Nm.wxwy.mean
Mean of the matrix normal prior distribution for the transformation matrix T. Must be a matrix of size (variables in first data set) x (variables in second data set). If the value is 1, Nm.wxwy.mean is set to an identity matrix of appropriate size.

Nm.wxwy.sigma
Variance parameter for the matrix normal prior distribution of the transformation matrix T. Describes the allowed scale of deviation of the transformation matrix T from the mean matrix Nm.wxwy.mean.

matched
Logical indicating whether the variables (dimensions) are matched between X and Y. Applicable only when dimX = dimY. Affects the results only when a prior on the relationship Wx ~ Wy is set, i.e. when priors$Nm.wxwy.sigma < Inf.
includeData
Logical indicating whether the original data is included in the model output. Set to FALSE to save memory.
calculateZ
Logical indicating whether an expectation of the latent variable Z is included in the model output. Otherwise the expectation can be calculated with getZ or z.expectation. Using FALSE speeds up the calculation of the dependency model.
verbose
Logical indicating whether to print intermediate messages that follow the progress of the fitting procedure.

Value

An object of class DependencyModel.

Details

The fit.dependency.model function fits the dependency model X ~ N(W$X * Z, phi$X); Y ~ N(W$Y * Z, phi$Y) with the possibility to tune the model structure and parameter priors.
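To make the generative model concrete, the following base R sketch simulates a toy data pair from it. The dimensions, the random W matrices and the isotropic noise levels are illustrative assumptions, not package defaults. The data are laid out as variables x samples, as described under Arguments.

```r
set.seed(1)

n_samples <- 100  # number of samples (columns)
dim_x <- 5        # variables in X (rows)
dim_y <- 4        # variables in Y (rows)
z_dim <- 1        # dimensionality of the shared latent variable

# Shared latent variable Z: z_dim x n_samples, standard normal
Z <- matrix(rnorm(z_dim * n_samples), nrow = z_dim)

# Loading matrices W$X and W$Y (arbitrary values, for illustration only)
W <- list(X = matrix(rnorm(dim_x * z_dim), nrow = dim_x),
          Y = matrix(rnorm(dim_y * z_dim), nrow = dim_y))

# Marginal covariances phi$X and phi$Y (isotropic here, by assumption)
phi <- list(X = 0.5 * diag(dim_x), Y = 0.2 * diag(dim_y))

# Noise with covariance phi: multiply white noise by the transposed
# upper-triangular Cholesky factor of phi
noise_x <- t(chol(phi$X)) %*% matrix(rnorm(dim_x * n_samples), nrow = dim_x)
noise_y <- t(chol(phi$Y)) %*% matrix(rnorm(dim_y * n_samples), nrow = dim_y)

# Observations share Z but have dataset-specific loadings and noise
X <- W$X %*% Z + noise_x
Y <- W$Y %*% Z + noise_y

dim(X)
dim(Y)
```

Matrices generated this way have the shape expected by fit.dependency.model and can serve as a sanity check, since the true W and phi are known.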

In particular, the dataset-specific covariance structure phi can be defined; non-negative priors for W are possible; the relation between W$X and W$Y can be tuned. For a comprehensive set of examples, see the example scripts in the tests/ directory of this package.

Special cases of the model, obtained with particular prior assumptions, include probabilistic canonical correlation analysis (pcca; Bach & Jordan 2005), probabilistic principal component analysis (ppca; Tipping & Bishop 1999), probabilistic factor analysis (pfa; Rubin & Thayer 1982), and a regularized version of canonical correlation analysis (pSimCCA; Lahti et al. 2009). The standard probabilistic PCA and factor analysis are methods for a single data set (X ~ N(WZ, phi)), with isotropic and diagonal covariance (phi) for pPCA and pFA, respectively. Analogous models for two data sets are obtained by concatenating the two data sets, and performing pPCA or pFA.
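Because the data sets are stored as variables x samples, concatenating two data sets means stacking their rows while keeping the samples (columns) paired. A minimal base R sketch, with made-up dimensions:

```r
set.seed(2)
X <- matrix(rnorm(5 * 50), nrow = 5)  # 5 variables x 50 samples
Y <- matrix(rnorm(3 * 50), nrow = 3)  # 3 variables x 50 samples

# Samples (columns) must be paired, so the variables (rows) are stacked
stopifnot(ncol(X) == ncol(Y))
XY <- rbind(X, Y)

dim(XY)  # 8 variables x 50 samples
```

The stacked matrix can then be supplied as the single data set X to ppca or pfa, whose Y argument defaults to NULL according to the Usage section.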

Such special cases are obtained with the following choices in the fit.dependency.model function:

pPCA
marginalCovariances = "identical isotropic" (Tipping & Bishop 1999)

pFA
marginalCovariances = "diagonal" (Rubin & Thayer 1982)

pCCA
marginalCovariances = "full" (Bach & Jordan 2005)

pSimCCA
marginalCovariances = "full", priors = list(Nm.wxwy.mean = I, Nm.wxwy.sigma = 0). This is the default method and corresponds to the case with W$X = W$Y. (Lahti et al. 2009)

pSimCCA with T prior
marginalCovariances = "isotropic", priors = list(Nm.wxwy.mean = 1, Nm.wxwy.sigma = 1) (Lahti et al. 2009)
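The choices listed above can be collected into named argument sets, as in the sketch below. The prior names follow the spelling used in the Arguments section (Nm.wxwy.mean, Nm.wxwy.sigma), and the identity-matrix mean is written with the documented shorthand value 1. The list special_cases and the helper fit_special_case are hypothetical names introduced here for illustration; they are not part of the package.

```r
# Argument sets for the special cases of the dependency model
special_cases <- list(
  pPCA = list(marginalCovariances = "identical isotropic"),
  pFA  = list(marginalCovariances = "diagonal"),
  pCCA = list(marginalCovariances = "full"),
  # Nm.wxwy.mean = 1 is documented as shorthand for an identity matrix
  pSimCCA = list(marginalCovariances = "full",
                 priors = list(Nm.wxwy.mean = 1, Nm.wxwy.sigma = 0)),
  pSimCCA_T = list(marginalCovariances = "isotropic",
                   priors = list(Nm.wxwy.mean = 1, Nm.wxwy.sigma = 1))
)

# Hypothetical helper: fit the named special case.
# Calling it requires the dmt package to be installed.
fit_special_case <- function(name, X, Y, ...) {
  args <- c(list(X = X, Y = Y), special_cases[[name]], list(...))
  do.call(dmt::fit.dependency.model, args)
}

names(special_cases)
```

With dmt installed and matrices X and Y loaded, a call such as fit_special_case("pCCA", X, Y, zDimension = 1) would be expected to fit the corresponding model; this has not been verified against the package here.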

To avoid computational singularities, the covariance matrix phi is regularised by adding a small constant to the diagonal.
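The effect of this kind of diagonal regularisation can be seen in a small base R sketch. The constant 1e-6 is an assumption chosen for illustration; the value used inside the package is not stated here.

```r
# Rank-deficient covariance estimate: 5 variables but only 3 samples
set.seed(3)
X <- matrix(rnorm(5 * 3), nrow = 5)  # variables x samples
phi <- cov(t(X))                      # 5 x 5, rank at most 2

# Add a small constant to the diagonal (the value is an assumption)
eps <- 1e-6
phi_reg <- phi + diag(eps, nrow(phi))

# The smallest eigenvalue is lifted to roughly eps, so inversion succeeds
min(eigen(phi_reg, symmetric = TRUE)$values)
phi_inv <- solve(phi_reg)
```

Without the added constant, solve(phi) would typically fail here because phi is singular.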

References

Dependency Detection with Similarity Constraints, Lahti et al., 2009 Proc. MLSP'09 IEEE International Workshop on Machine Learning for Signal Processing, http://arxiv.org/abs/1101.5919

A Probabilistic Interpretation of Canonical Correlation Analysis, Bach Francis R. and Jordan Michael I. 2005. Technical Report 688, Department of Statistics, University of California, Berkeley. http://www.di.ens.fr/~fbach/probacca.pdf

Probabilistic Principal Component Analysis, Tipping Michael E. and Bishop Christopher M. 1999. Journal of the Royal Statistical Society, Series B, 61, Part 3, pp. 611--622. http://research.microsoft.com/en-us/um/people/cmbishop/downloads/Bishop-PPCA-JRSS.pdf

EM Algorithms for ML Factor Analysis, Rubin D. and Thayer D. 1982. Psychometrika, vol. 47, no. 1.

See Also

Output class for this function: DependencyModel. Special cases: ppca, pfa, pcca

Examples

data(modelData) # Load example data X, Y

# probabilistic CCA
model <- pcca(X, Y)

# dependency model with priors (W>=0; Wx = Wy; full marginal covariances)
model <- fit.dependency.model(X, Y, zDimension = 1,
                              priors = list(W = 1e-3, Nm.wxwy.sigma = 0),
                              marginalCovariances = "full")

# Getting the latent variable Z when it has been calculated with the model
#getZ(model)
