profLinear(formula, data, group, clust, param, method="agglomerative", maxiter=1000, crit=1e-6, verbose=FALSE, sampler=FALSE)
formula: a two-sided formula, such as y ~ x.

data: a data frame in which formula is evaluated.

group: an optional vector indicating grouping among observations; observations within the same group are clustered together, and hence must have the same clust value.

clust: an optional vector giving an initial cluster partition (see the "none" method under the details below).

param: an optional named list with elements alpha, a0, b0, m0, and s0, corresponding to the prior parameters of the normal-gamma Dirichlet process mixture. The prior parameters should all be scalars except m0, which should be a vector of length equal to the number of columns in x (see the example call following this list).

method: a character string selecting the optimization method; one of "stochastic", "gibbs", "agglomerative", "fast", or "none", each described below.

maxiter: the maximum number of optimization iterations.

crit: the optimization stopping criterion.
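For instance, a call with explicit prior parameters might look like the following sketch. The particular values are arbitrary illustrations, not recommendations; dat and grp are the data frame and grouping vector constructed in the examples below, and the model y ~ x has a two-column design matrix (intercept and slope), so m0 has length two.

# hedged sketch: explicit prior parameters; values are illustrative only
prior <- list(alpha = 1, a0 = 1, b0 = 1, m0 = c(0, 0), s0 = 1)
fit <- profLinear(y ~ x, data = dat, group = grp, param = prior)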
"stochastic"
method is an iterative stochastic search utilizing `explode' and `merge' operations on the clusters of a partition. At the explode step, a randomly selected subset of observations are redistributed uniformly at random to an existing or new cluster. Each of the exploded observations are then merged with an existing cluster in a sequentially optimal fashion. Optimization involves computing a moving average of the relative change in the marginal posterior distribution over the possible clusters after each iteration. The optimization stopping criterion is the minumim value this quantity can take before stopping the optimization cycle. If the optimization cycle reaches the maximum allowable iterations before meeting the stopping criterion, a warning is issued."gibbs"
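Since maxiter and crit govern this optimization cycle, a stochastic search might be tuned as in the following sketch (the values shown are illustrative only; dat and grp are from the examples below):

# hedged sketch: stochastic search with a larger iteration budget
# and a stricter stopping criterion (values illustrative)
fit <- profLinear(y ~ x, data = dat, group = grp,
                  method = "stochastic", maxiter = 5000, crit = 1e-8)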
The "gibbs" method implements the Polya urn Gibbs sampler. This method draws samples from the posterior distribution over the cluster partition in a sequential Gibbs fashion. The sampled value with the greatest posterior mass is returned. See MacEachern (1994) for details.
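A Gibbs run is requested analogously; per the note at the end of this section, maxiter also applies to this method (the value below is illustrative):

# hedged sketch: Polya urn Gibbs sampler with an illustrative
# iteration cap
fit <- profLinear(y ~ x, data = dat, group = grp,
                  method = "gibbs", maxiter = 2000)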
The "agglomerative" method initially places each observation in a separate cluster. At each iteration, two of the remaining clusters are merged, where the merged pair is chosen such that the resulting increase in the posterior mass function is maximized. This is repeated until only one cluster remains. The MAP estimate is the cluster partition, among those considered, that maximizes the posterior mass function over the possible cluster partitions. See Ward (1963) for additional details.
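The greedy merging idea can be illustrated with a small toy in plain R. The sketch below is not the package's posterior computation: it merges the pair of clusters that minimizes a penalized within-cluster sum of squares, with a hypothetical penalty lambda standing in for the posterior-mass tradeoff, and reports the best partition seen along the merge path.

# toy agglomerative merging on a numeric vector z; lambda is a
# hypothetical complexity penalty, not a package parameter
agglom_toy <- function(z, lambda = 2) {
    score <- function(cl)
        sum(sapply(cl, function(g) sum((g - mean(g))^2))) +
            lambda * length(cl)
    clusters <- as.list(z)              # start with singleton clusters
    best <- clusters                    # best partition seen so far
    while (length(clusters) > 1) {
        pairs <- combn(length(clusters), 2)
        # evaluate every pairwise merge and keep the best-scoring one
        merges <- lapply(seq_len(ncol(pairs)), function(j) {
            p <- pairs[, j]
            c(clusters[-p], list(c(clusters[[p[1]]], clusters[[p[2]]])))
        })
        clusters <- merges[[which.min(sapply(merges, score))]]
        if (score(clusters) < score(best)) best <- clusters
    }
    best
}
agglom_toy(c(0.1, 0.2, 5.0, 5.2))   # recovers the two groups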
The "fast" method is a modified version of Sequential Update and Greedy Search (SUGS). This algorithm assigns observations to clusters sequentially. Initially, the first observation forms a singleton cluster. Each subsequent observation is either assigned to an existing cluster or forms a new singleton cluster by optimizing the associated posterior probabilities, conditional on previous cluster assignments. See Wang and Dunson (2010) for additional details.
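The sequential flavor of SUGS can likewise be sketched in a few lines. This toy assigns each point to the nearest cluster mean, or opens a new singleton cluster when no mean is within a hypothetical tolerance tol; the distance rule stands in for the posterior probabilities the actual method optimizes.

# toy sequential greedy assignment on a numeric vector z
sugs_toy <- function(z, tol = 1.0) {
    labels <- integer(length(z))
    means  <- z[1]                      # first point starts cluster 1
    labels[1] <- 1
    for (i in 2:length(z)) {
        d <- abs(z[i] - means)
        if (min(d) <= tol) {
            labels[i] <- which.min(d)   # join the nearest cluster
        } else {
            means <- c(means, z[i])     # open a new singleton cluster
            labels[i] <- length(means)
        }
        # update the running mean of the chosen cluster
        k <- labels[i]
        means[k] <- mean(z[labels[1:i] == k])
    }
    labels
}
sugs_toy(c(0.1, 0.2, 5.0, 5.2, 0.15))   # returns 1 1 2 2 1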
The "none" method is typically used in conjunction with the clust argument to specify an initial cluster partition. If "none" is specified without clust, a simple algorithm is used to initialize the cluster partition. Otherwise, the cluster partition is initialized using the clust argument. The posterior statistics are then computed for the initialized clusters.
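For example, a fit can be started from a user-supplied partition as in the sketch below, where init is a hypothetical two-cluster starting partition (one label per observation, consistent with the grouping; dat and grp are from the examples below):

# hedged sketch: supply an initial partition and skip optimization
init <- rep(c(1, 2), each = 50)
fit0 <- profLinear(y ~ x, data = dat, group = grp,
                   clust = init, method = "none")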
"stochastic"
and "gibbs"
optimization methods.profLinear
containing the following objects
y: the response vector used in fitting; missing observations (NA) are removed.

x: the design matrix used in fitting; missing observations (NA) are removed.

group: the grouping vector used in fitting; missing observations (NA) are removed.

clust: a vector of cluster membership indicators.

m: a list of posterior mean coefficient vectors, one per cluster (see the example below).

model: the model.frame resulting from formula.
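The components of a fitted object can be inspected directly, as in this sketch (assuming, as in the examples below, that clust holds the cluster membership indicators and m the per-cluster posterior means):

# hedged sketch: inspect a fitted object from the examples below
fit <- profLinear(y ~ x, data = dat, group = grp)
table(fit$clust)   # cluster sizes
fit$m              # posterior mean coefficients, one vector per cluster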
Missing observations (NA) are removed automatically and a warning is issued.
Ward, J. H. (1963) Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association 58:236-244.

MacEachern, S. N. (1994) Estimating Normal Means with a Conjugate Style Dirichlet Process Prior. Communications in Statistics B 23:727-741.

Wang, L. and Dunson, D. B. (2010) Fast Bayesian Inference in Dirichlet Process Mixture Models. Journal of Computational and Graphical Statistics.
See also: pci, for comparing cluster partitions.
library(profdpm)
set.seed(42)
# set up some data
# linear model 0
x0 <- rnorm(50, 0, 3)
y0 <- x0 + rnorm(50, 0, 1)
# linear model 1
x1 <- rnorm(50, 0, 3)
y1 <- 10 - x1 + rnorm(50, 0, 1)
# combine the covariate and outcome into a single data frame
dat <- data.frame(x=c(x0, x1), y=c(y0,y1))
# indicate grouping within each linear model
grp <- c(rep(seq(0,4),10), rep(seq(5,9),10))
# fit the DPMLM
fit <- profLinear(y ~ x, data=dat, group=grp)
# plot the resulting fit(s)
plot(dat$x, dat$y, xlab='x', ylab='y')
for(i in seq_along(fit$m)) {
abline( a=fit$m[[i]][1], b=fit$m[[i]][2] )
}
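# possible follow-up, assuming fit$clust holds one integer cluster
# label per observation; labels are re-coded via factor() so they
# map to valid plotting colors
plot(dat$x, dat$y, col = as.integer(factor(fit$clust)) + 1,
     pch = 19, xlab = 'x', ylab = 'y')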