These functions fit different variants of Gaussian mixture models. The variants differ in the amount of prior knowledge incorporated into the fitting procedure.
belief(X, knowns, B = NULL, k = ifelse(!is.null(B), ncol(B),
ifelse(!is.null(P), ncol(P), length(unique(class)))), P = NULL,
class = map(B), init.params = init.model.params(X, knowns,
B = B, P = P, class = class, k = k), model.structure = getModelStructure(),
stop.likelihood.change = 10^-5, stop.max.nsteps = 100, trace = FALSE,
b.min = 0.025,
all.possible.permutations=FALSE, pca.dim.reduction = NA)
soft(X, knowns, P = NULL, k = ifelse(!is.null(P), ncol(P),
ifelse(!is.null(B), ncol(B), length(unique(class)))), B = NULL,
class = NULL, init.params = init.model.params(X, knowns,
class = class, B = P, k = k),
model.structure = getModelStructure(), stop.likelihood.change = 10^-5,
stop.max.nsteps = 100, trace = FALSE, b.min = 0.025,
all.possible.permutations=FALSE, pca.dim.reduction = NA, ...)
semisupervised(X, knowns, class = NULL, k = ifelse(!is.null(class),
length(unique(class)), ifelse(!is.null(B), ncol(B), ncol(P))),
B = NULL, P = NULL, ..., init.params = NULL,
all.possible.permutations=FALSE, pca.dim.reduction = NA)
supervised(knowns, class = NULL, k = length(unique(class)), B = NULL, P = NULL,
model.structure = getModelStructure(), ...)
unsupervised(X, k, init.params=init.model.params(X, knowns=NULL, k=k),
model.structure=getModelStructure(), stop.likelihood.change=10^-5,
stop.max.nsteps=100, trace=FALSE, ...)
a data.frame with the unlabeled observations. The rows correspond to the observations and the columns to the variables/dimensions of the data.
a data.frame with the labeled observations. The rows correspond to the observations and the columns to the variables/dimensions of the data.
a beliefs matrix which specifies the distribution of beliefs for the labeled observations. The number of rows in B should equal the number of rows in the data.frame knowns. It is assumed that the observations in B and in knowns are given in the same order. Columns correspond to the model components. If the matrix B is provided, its number of columns has to be less than or equal to k. Internally, the matrix B is completed to k columns.
a matrix of plausibilities, i.e., weights of the prior probabilities for the labeled observations. If the matrix P is provided, its number of columns has to be less than or equal to k. The same conditions as for B apply.
a vector of classes/labels for the labeled observations. The number of its unique values has to be less than or equal to k.
the number of model components; by default equal to the number of columns of B (or derived from P or class, depending on the function called).
initial values for the estimates of the model parameters (means, variances and mixing proportions), by default derived with the use of the init.model.params function.
the parameters for the EM algorithms defining the stop criteria, i.e., the minimum required improvement of loglikelihood and the maximum number of steps.
if trace=TRUE, the log-likelihoods for every step of the EM algorithm are printed out.
an object returned by the getModelStructure function, which specifies constraints for the parameters of the model to be fitted.
this argument is passed to the init.model.params function.
these arguments will be passed to the init.model.params function.
If equal to TRUE, all possible permutations of the initial assignment of components are considered. Since there are k! such permutations, model fitting is repeated k! times, and only the model with the highest likelihood is returned.
Since fitting in a high-dimensional space is numerically ill-conditioned, a PCA-based dimension reduction is attempted unless pca.dim.reduction = FALSE. If equal to NA, the target dimension is chosen from the data; if it is a number, that number is the target dimension.
An object of the class mModel, with the following slots:
a vector with the fitted mixing proportions
a matrix with the means' vectors, fitted for all components
a three-dimensional matrix with the covariance matrices, fitted for all components
the unlabeled observations
the labeled observations
the beliefs matrix
the number of all observations
the number of the unlabeled observations
the number of fitted model components
the data dimension
the log-likelihood of the fitted model
the number of steps performed by the EM algorithm
the set of constraints kept during the fitting process.
In the belief() function, if the argument B is not provided, it is by default initialized from the argument P. If the argument P is not provided either, B is derived from the class argument with the use of the function get.simple.beliefs(), which assigns 1-(k-1)*b.min to the component given by class and b.min to all remaining components.
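This default construction can be sketched in a few lines of R. The function below is an illustrative re-implementation of the rule described above, not the package's own get.simple.beliefs() source, and it assumes the class labels are coded as integers 1..k:

```r
# Illustrative sketch: build a simple beliefs matrix from class labels.
# Assumes class is an integer vector with values in 1..k.
simple.beliefs <- function(class, k, b.min = 0.025) {
  n <- length(class)
  B <- matrix(b.min, nrow = n, ncol = k)              # b.min everywhere
  B[cbind(seq_len(n), class)] <- 1 - (k - 1) * b.min  # high belief for the label
  B
}

simple.beliefs(class = c(1, 2, 2), k = 3)
# each row sums to 1: 0.95 for the labeled component, 0.025 elsewhere
```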
In the soft() function, if the argument P is not provided, it is by default initialized from the argument B. If the argument B is not provided either, P is derived from the class argument as in the belief() function.
In the supervised() function, if the argument class is not provided, it is by default initialized from the argument B or P, taking the label of each observation as its most believed or plausible component (by the MAP rule).
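In base R, the MAP rule amounts to taking the row-wise maximum of the beliefs or plausibilities matrix; the matrix below is a hypothetical example, not data from the package:

```r
# Sketch of the MAP rule: each labeled observation is assigned the
# component with the highest belief/plausibility (hypothetical B).
B <- matrix(c(0.95, 0.025, 0.025,
              0.10, 0.80,  0.10), nrow = 2, byrow = TRUE)
class <- max.col(B)  # max.col() returns the column index of each row maximum
class                # 1 2
```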
The number of columns in the beliefs matrix B or in the matrix of plausibilities P may be smaller than the number of model components defined by the argument k. Such a situation corresponds to the scenario in which the user does not know any examples for some component. In other words, this component is not used as a label for any observation and can thus be omitted from the beliefs matrix. An equivalent would be to include a column for this component and fill it with beliefs/plausibilities equal to 0.
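The equivalence can be made concrete with a small hypothetical example: a beliefs matrix covering only two of k = 3 components behaves the same as one padded with a zero column for the unlabeled component.

```r
# Sketch: labels exist for only 2 of k = 3 components.
B2 <- matrix(c(0.9, 0.1,
               0.2, 0.8), nrow = 2, byrow = TRUE)

# Passing B2 with k = 3 is equivalent to passing the zero-padded matrix:
B3 <- cbind(B2, 0)  # completed to k = 3 columns
```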
Slots in the returned object are listed in section Value.
The returned object differs slightly depending on the function used. Namely, the belief() function returns an object with the slot B. The function soft() returns an object with a slot P, while the functions supervised() and semisupervised() return objects with a slot class instead. The object returned by the function supervised() does not have the slot X.
Przemyslaw Biecek, Ewa Szczurek, Martin Vingron, Jerzy Tiuryn (2012), The R Package bgmm: Mixture Modeling with Uncertain Knowledge, Journal of Statistical Software.
data(genotypes)
modelSupervised = supervised(knowns=genotypes$knowns,
class=genotypes$labels)
plot(modelSupervised)
modelSemiSupervised = semisupervised(X=genotypes$X,
knowns=genotypes$knowns, class = genotypes$labels)
plot(modelSemiSupervised)
modelBelief = belief(X=genotypes$X,
knowns=genotypes$knowns, B=genotypes$B)
plot(modelBelief)
modelSoft = soft(X=genotypes$X,
knowns=genotypes$knowns, P=genotypes$B)
plot(modelSoft)
modelUnSupervised = unsupervised(X=genotypes$X, k=3)
plot(modelUnSupervised)