Logistic PCA

logisticPCA is an R package for dimensionality reduction of binary data. It is still in the very early stages of development, and its conventions may change in the future. A manuscript describing logistic PCA can be found here.

Installation

To install R, visit r-project.org/.

The released version of the package can be installed from CRAN.

install.packages("logisticPCA")

To install the development version, first install devtools from CRAN. Then run the following commands.

# install.packages("devtools")
library("devtools")
install_github("andland/logisticPCA")

Classes

The package provides three types of dimensionality reduction. For all the functions, the user must supply the desired dimension k. The data must be an n x d matrix of binary variables (i.e., every entry is 0 or 1).
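As a quick illustration of the input requirement above, the following base R sketch builds a small n x d binary matrix and checks that it contains only 0's and 1's. The helper is_binary_matrix() is hypothetical and is not part of logisticPCA.

```r
# Hypothetical helper (not part of logisticPCA): TRUE when x is a numeric
# matrix whose entries are all 0 or 1.
is_binary_matrix <- function(x) {
  is.matrix(x) && is.numeric(x) && all(x %in% c(0, 1))
}

set.seed(1)
n <- 6  # number of rows (observations)
d <- 4  # number of columns (binary variables)
x <- matrix(rbinom(n * d, size = 1, prob = 0.5), nrow = n, ncol = d)

is_binary_matrix(x)        # TRUE
is_binary_matrix(x + 0.5)  # FALSE: entries are 0.5 and 1.5
```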

Logistic PCA

logisticPCA() estimates the natural parameters of a Bernoulli distribution in a lower dimensional space by projecting the natural parameters from the saturated model. It solves for the rank-k projection matrix, or equivalently the d x k matrix U with orthonormal columns, that minimizes the Bernoulli deviance. Since the natural parameters from the saturated model are either negative or positive infinity, an additional tuning parameter m is needed to approximate them. Typical values of m are in the range of 3 to 10. You can use cv.lpca() to select m by cross validation.

In the fitted model, mu is a main effects vector of length d and U is the d x k loadings matrix.
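The projection described above can be sketched in base R. This is only an illustration of the structure, under stated assumptions: the saturated natural parameters are approximated by +m where the data are 1 and -m where they are 0, mu is set to the logit of the column means, and U is a stand-in taken from an eigendecomposition rather than the deviance-minimizing matrix that logisticPCA() solves for. All variable names are illustrative.

```r
set.seed(42)
n <- 50; d <- 5; k <- 2; m <- 4
x <- matrix(rbinom(n * d, size = 1, prob = 0.4), nrow = n, ncol = d)

# Approximate saturated natural parameters: +m where x = 1, -m where x = 0.
theta_sat <- m * (2 * x - 1)

# Main effects vector of length d (assumption: logit of the column means).
mu <- qlogis(colMeans(x))

# Stand-in d x k matrix with orthonormal columns. This is NOT the
# deviance-minimizing U; it only has the right shape and orthonormality.
theta_centered <- sweep(theta_sat, 2, mu)
U <- eigen(crossprod(theta_centered), symmetric = TRUE)$vectors[, 1:k, drop = FALSE]

# Principal component scores (n x k) and projected natural parameters (n x d).
PCs <- theta_centered %*% U
theta_hat <- sweep(tcrossprod(PCs, U), 2, mu, "+")

# Bernoulli deviance of the low-dimensional fit: -2 * log-likelihood.
dev <- -2 * sum(plogis((2 * x - 1) * theta_hat, log.p = TRUE))
```

Replacing the stand-in U with one that minimizes dev over all d x k matrices with orthonormal columns is the optimization that the paragraph above describes.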

Logistic SVD

logisticSVD() estimates the natural parameters by a matrix factorization. mu is a main effects vector of length d, B is the d x k loadings matrix, and A is the n x k principal component score matrix.
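To make the shapes in this factorization concrete, here is a minimal base R sketch that assembles an n x d matrix of natural parameters from mu, A, and B. The values are random placeholders, not estimates from logisticSVD(), and the variable names are illustrative.

```r
set.seed(7)
n <- 8; d <- 5; k <- 2

mu <- rnorm(d)                               # main effects, length d
A  <- matrix(rnorm(n * k), nrow = n, ncol = k)  # n x k score matrix
B  <- matrix(rnorm(d * k), nrow = d, ncol = k)  # d x k loadings matrix

# Natural parameters: each row is mu plus a rank-k term A %*% t(B).
theta <- sweep(tcrossprod(A, B), 2, mu, "+")

# Corresponding Bernoulli probabilities via the inverse logit.
prob <- plogis(theta)
```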

Convex Logistic PCA

convexLogisticPCA() relaxes the problem of solving for a projection matrix to solving for a matrix in the k-dimensional Fantope, which is the convex hull of rank-k projection matrices. This has the advantage that the global minimum can be obtained efficiently. The disadvantage is that the k-dimensional Fantope solution may have a rank much larger than k, which reduces interpretability. It is also necessary to specify m in this function.

mu is a main effects vector of length d, H is the d x d Fantope matrix, and U is the d x k loadings matrix, whose columns are the first k eigenvectors of H.
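The relationship between H and U can be illustrated in base R. The sketch below builds a matrix in the k-dimensional Fantope as a convex combination of two random rank-k projection matrices (so its eigenvalues lie in [0, 1] and sum to k), shows that its rank is typically larger than k, and extracts the first k eigenvectors as U. The matrix here is random, not a fitted solution, and random_projection() is a hypothetical helper.

```r
set.seed(3)
d <- 5; k <- 2

# Hypothetical helper: a random d x d projection matrix of rank k.
random_projection <- function(d, k) {
  Q <- qr.Q(qr(matrix(rnorm(d * k), nrow = d, ncol = k)))
  tcrossprod(Q)
}

# A convex combination of rank-k projection matrices lies in the Fantope.
H <- 0.5 * random_projection(d, k) + 0.5 * random_projection(d, k)

eig <- eigen(H, symmetric = TRUE)
trace_H <- sum(eig$values)         # equals k up to rounding error
rank_H  <- sum(eig$values > 1e-8)  # typically larger than k

# Loadings: the eigenvectors for the k largest eigenvalues of H.
U <- eig$vectors[, 1:k, drop = FALSE]
```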

Methods

Each of the classes has associated methods to make data analysis easier.

  • print(): Prints a summary of the fitted model.
  • fitted(): Returns the fitted low dimensional matrix of either natural parameters or probabilities.
  • predict(): Predicts the PCs on new data. Can also predict the low dimensional matrix of natural parameters or probabilities on new data.
  • plot(): Plots either the deviance trace, the first two PC loadings, or the first two PC scores using the package ggplot2.
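A possible session using these methods is sketched below. It assumes the package and its house_votes84 data set are installed, and the type argument values ("response", "PCs", "scores") are written as recalled from the package documentation, so check the help pages (for example ?fitted.lpca) before relying on them. The values k = 2 and m = 4 are arbitrary choices for illustration.

```r
# Only runs when logisticPCA is installed.
if (requireNamespace("logisticPCA", quietly = TRUE)) {
  library(logisticPCA)
  data(house_votes84)

  # Fit logistic PCA with k = 2 components and m = 4 (arbitrary choices).
  fit <- logisticPCA(house_votes84, k = 2, m = 4)
  print(fit)

  # Low dimensional matrix of fitted probabilities.
  probs <- fitted(fit, type = "response")

  # Principal component scores for (here, the same) data.
  scores <- predict(fit, house_votes84, type = "PCs")

  # Plot of the first two PC scores (a ggplot2 object).
  print(plot(fit, type = "scores"))
}
```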

In addition, there are functions for performing cross validation.

  • cv.lpca(), cv.lsvd(), cv.clpca(): Run cross validation over the rows of the matrix to assess the fit of m and/or k.
  • plot.cv(): Plots the results of the cross validation functions above.
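The idea of cross validating over the rows of the matrix can be sketched in base R. The example below assigns rows to folds, fits on the training rows, and scores the held-out rows by Bernoulli deviance. To stay self-contained it uses a main-effects-only model (the logit of the training column means) as a stand-in; it does not reproduce what cv.lpca(), cv.lsvd(), or cv.clpca() fit internally.

```r
set.seed(10)
n <- 60; d <- 4; folds <- 5
x <- matrix(rbinom(n * d, size = 1, prob = 0.3), nrow = n, ncol = d)

# Randomly assign each row to one of the folds (balanced sizes).
fold_id <- sample(rep(seq_len(folds), length.out = n))

cv_dev <- numeric(folds)
for (f in seq_len(folds)) {
  train    <- x[fold_id != f, , drop = FALSE]
  held_out <- x[fold_id == f, , drop = FALSE]

  # Stand-in model: main effects only, clipped away from 0 and 1.
  p <- pmin(pmax(colMeans(train), 1e-6), 1 - 1e-6)
  theta <- matrix(qlogis(p), nrow = nrow(held_out), ncol = d, byrow = TRUE)

  # Held-out Bernoulli deviance: -2 * log-likelihood.
  cv_dev[f] <- -2 * sum(plogis((2 * held_out - 1) * theta, log.p = TRUE))
}

total_dev <- sum(cv_dev)
```

In a real analysis the stand-in model would be replaced by a fit for each candidate m and/or k, and the candidate with the smallest held-out deviance would be preferred.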

Version: 0.2

License: MIT + file LICENSE

Last Published: March 14th, 2016

Functions in logisticPCA (0.2)

  • plot.cv.lpca: Plot CV for logistic PCA
  • log_like_Bernoulli: Bernoulli Log Likelihood
  • house_votes84: United States Congressional Voting Records 1984
  • logisticPCA: Logistic Principal Component Analysis
  • plot.lsvd: Plot logistic SVD
  • cv.lpca: CV for logistic PCA
  • predict.lsvd: Predict Logistic SVD left singular values or reconstruction on new data
  • cv.lsvd: CV for logistic SVD
  • convexLogisticPCA: Convex Logistic Principal Component Analysis
  • fitted.lpca: Fitted values using logistic PCA
  • cv.clpca: CV for convex logistic PCA
  • plot.clpca: Plot convex logistic PCA
  • predict.lpca: Predict Logistic PCA scores or reconstruction on new data
  • plot.lpca: Plot logistic PCA
  • logisticPCA-package: logisticPCA-package
  • inv.logit.mat: Inverse logit for matrices
  • project.Fantope: Project onto the Fantope
  • logisticSVD: Logistic Singular Value Decomposition
  • predict.clpca: Predict Convex Logistic PCA scores or reconstruction on new data
  • fitted.lsvd: Fitted values using logistic SVD