sfa: Sparse factor analysis for mixed binary and count data.

Description

Scales mixed binary and count data while estimating the underlying latent dimensionality.

Usage

sfa(M, missing.mat=NULL, gibbs=100, burnin=100, max.optim=50, thin=1, save.curr="UDV_curr", save.each=FALSE, thin.save=25, maxdim=NULL)

Arguments

M

Matrix to be scaled.

missing.mat

Matrix indicating missing data. Should be the same size as M, with a 1 denoting a missing observation and a 0 otherwise. Defaults to all zeroes.

gibbs

Number of posterior samples to draw.

burnin

Number of burnin samples.

max.optim

Number of iterations to fit the cutpoints using optim. This is generally faster than the Hamiltonian Monte Carlo estimates, and is useful for the first part of the burnin phase.

thin

Extent of thinning of the MCMC chain. Only every thin-th draw is saved to the output.

save.curr

Name of file in which to save object.

save.each

Whether to save with a new name at each thinned draw.

thin.save

How many thinned draws to wait between saving output.

maxdim

Number of latent dimensions to fit. Should be greater than the number of estimated dimensions.
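
As an illustration of the missing.mat argument described above, the following minimal sketch builds an indicator matrix from a toy data matrix. The data values are hypothetical, and coding unobserved cells as NA is an assumption made for this example only; this page does not state how sfa treats the flagged cells of M itself.

```r
# Toy data matrix with two unobserved cells (hypothetical values).
M <- matrix(c(1, 0, NA, 1,
              3, NA, 7, 2),
            nrow = 2, byrow = TRUE)

# 1 marks a missing observation and 0 an observed one.
# Multiplying by 1 converts the logical matrix to numeric and keeps its shape.
missing.mat <- 1 * is.na(M)

# The indicator must be the same size as M.
stopifnot(identical(dim(missing.mat), dim(M)))
```

The resulting matrix could then be supplied as sfa(M, missing.mat=missing.mat), after confirming how the package expects the flagged entries of M to be filled.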

Value

dim.sparse: Output for sparse estimates of dimensionality.
dim.mean: Non-sparse estimates of posterior mean of dimensionality.
rowdim1: Posterior samples of first dimension of spatial locations for row unit of observation.
rowdim2: Posterior samples of second dimension of spatial locations for row unit of observation.
coldim1: Posterior samples of first dimension of spatial locations for column unit of observation.
coldim2: Posterior samples of second dimension of spatial locations for column unit of observation.
lambda.lasso: Posterior samples for tuning parameter used for dimension selection.
Z: Posterior mean of fitted values, on a z-scale.
rowdims.all: Posterior mean of all row spatial locations.
coldims.all: Posterior mean of all column spatial locations.

Details

The function sfa is the main function in the SparseFactorAnalysis package. It takes a matrix in which each row holds a single data type, either binary or count. For example, each row may contain roll call votes or word counts, and the columns may correspond to legislators. The method combines the two data types, scales both, and selects the underlying latent dimensionality.
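
The row layout described above can be sketched with a small hand-made matrix. The data and the row.type check are hypothetical and are not part of the package; the check is a rough heuristic that would mislabel a count row containing only 0s and 1s.

```r
# Two binary rows (e.g. roll call votes) and two count rows (e.g. word counts).
# Columns play the role of legislators.
votes.mat <- matrix(c(1, 0, 1, 1,
                      0, 0, 1, 0),
                    nrow = 2, byrow = TRUE)
count.mat <- matrix(c(12, 0, 7, 3,
                       5, 9, 1, 0),
                    nrow = 2, byrow = TRUE)

# Stack the two blocks so that each row holds a single data type.
M <- rbind(votes.mat, count.mat)

# Heuristic check: a row containing only 0s and 1s is treated as binary.
row.type <- apply(M, 1, function(r) if (all(r %in% c(0, 1))) "binary" else "count")
print(row.type)
```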

References

In Song Kim, John Londregan, and Marc Ratkovic. 2015. "Voting, Speechmaking, and the Dimensions of Conflict in the US Senate." Working paper.

Examples


## Not run: 
# ##Sample size and dimensions.
#  set.seed(1)
#  n.sim<-50
#  k.sim<-500
#  
# ##True vector of dimension weights.
#  d.sim<-rep(0,n.sim)
#  d.sim[1:3]<-c(2, 1.5, 1)*3
# 
# ##Formulate true latent dimensions.
#  U.sim<-matrix(rnorm(n.sim^2,sd=.5), nrow=n.sim, ncol=n.sim)
#  V.sim<-matrix(rnorm(n.sim*k.sim,sd=.5), nrow=k.sim, ncol=n.sim)
#  Theta.sim<-U.sim%*%diag(d.sim)%*%t(V.sim)
# 
# ##Generate binary outcome and count data.
#  probs.sim<-pnorm((-1+Theta.sim+rep(1,n.sim)%*%t(rnorm(k.sim,sd=.5)) + 
#    rnorm(n.sim,sd=.5)%*%t(rep(1,k.sim))   ))
#  votes.mat<- 
#     apply(probs.sim[1:25,],c(1,2),FUN=function(x) rbinom(1,1,x))
#  count.mat<- 
#     apply(probs.sim[26:50, ],c(1,2),FUN=function(x) rpois(1,20*x))
#  M<-rbind(votes.mat,count.mat)
#  
# ## Run sfa
#  sparse1<-sfa(M, maxdim=10)
#  
# ##Analyze results.
#  summary(sparse1)
#  plot(sparse1,type="dim")
#  plot(sparse1,type="scatter")
# 
# ##Compare to true data generating process
# 
# plot(sparse1$Z,Theta.sim)
# abline(c(0,1))
# 
# ## End(Not run)
