clusterSamples: K-means clustering on samples based on latent factors

Description

MOFA factors are continuous, but they can be used to assign samples to discrete clusters, similar to the iCluster model (Shen, 2009). Clustering can be performed on a single factor, which is equivalent to setting a manual threshold, or on multiple factors, which aggregates several sources of variation. Importantly, this type of clustering is unweighted: it does not take into account the differing importance of the latent factors.
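
The single-factor case can be illustrated with base R alone. The sketch below is not MOFA code: it assumes a simulated one-column matrix `Z` as a stand-in for one latent factor and runs `stats::kmeans` on it directly. It then checks that the two resulting clusters are separated by a single cut point, which is what "equivalent to setting a manual threshold" means in practice.

```r
# Simulated stand-in for one MOFA factor (hypothetical data, not a real model)
set.seed(1)
Z <- matrix(c(rnorm(20, mean = -2, sd = 0.5),
              rnorm(20, mean =  2, sd = 0.5)),
            ncol = 1,
            dimnames = list(paste0("sample", 1:40), "LF1"))

# K-means with k = 2 on a single factor
km <- kmeans(Z, centers = 2, nstart = 10)

# Identify which cluster has the lower and which the higher centre
low  <- which.min(km$centers[, 1])
high <- which.max(km$centers[, 1])

# Largest value in the low cluster and smallest value in the high cluster
max_low  <- max(Z[km$cluster == low, 1])
min_high <- min(Z[km$cluster == high, 1])

# Any cut point between the two reproduces the k-means partition
threshold    <- (max_low + min_high) / 2
by_threshold <- ifelse(Z[, 1] > threshold, high, low)
```

With several factors the same call clusters on Euclidean distance over all columns of `Z`, so each factor contributes according to its numerical scale rather than its importance in the model.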

Usage

clusterSamples(object, k, factors = "all", ...)

Arguments

object

a trained MOFAmodel object.

k

number of clusters.

factors

character vector with the factor name(s), or numeric vector with the index of the factor(s) to use. Default is "all".

...

extra arguments passed to kmeans

Value

output from kmeans function
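
As a rough guide to what "output from kmeans" contains, the sketch below calls `stats::kmeans` directly on a simulated matrix `Z` (a hypothetical stand-in for the factor matrix, not data from a MOFA model). `nstart` and `iter.max` are arguments of `stats::kmeans`; they are the kind of extra arguments that would be supplied through `...`. Whether the installed version of `clusterSamples` returns this object unchanged is best confirmed with `str()` on its result.

```r
set.seed(42)
# Hypothetical factor matrix: 30 samples x 2 factors, two separated groups
Z <- rbind(matrix(rnorm(30, mean = 0), ncol = 2),
           matrix(rnorm(30, mean = 4), ncol = 2))
rownames(Z) <- paste0("sample", seq_len(nrow(Z)))
colnames(Z) <- c("LF1", "LF2")

# Extra kmeans arguments: several random starts and a higher iteration cap
fit <- kmeans(Z, centers = 2, nstart = 25, iter.max = 50)

fit$cluster   # integer cluster assignment, one entry per sample
fit$centers   # matrix of cluster centres, one row per cluster
fit$size      # number of samples in each cluster
```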

Details

In some cases, samples can have missing values in the factor space. This occurs when a factor is active in a single view and some samples have no data for that view. In such a case, there are several strategies to follow:

Use clustering approaches that deal with NAs (not implemented in MOFA)
If the factor in question is not important, you can remove it with subsetFactors
If the factor in question is important and only a small number of samples are affected, you can manually set their missing values to 0 using object@Expectations$Z[is.na(object@Expectations$Z)] <- 0

By default, the affected samples are ignored in the clustering procedure and NAs are returned for them.
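
The default behaviour described above can be sketched in base R. This is an illustration under stated assumptions, not the MOFA implementation: `Z` is a simulated factor matrix with a few missing entries, and the cluster labels are mapped back so that samples with missing factor values receive `NA`.

```r
set.seed(7)
# Hypothetical factor matrix: 20 samples x 2 factors
Z <- rbind(matrix(rnorm(20, mean = -3), ncol = 2),
           matrix(rnorm(20, mean =  3), ncol = 2))
rownames(Z) <- paste0("sample", seq_len(nrow(Z)))
colnames(Z) <- c("LF1", "LF2")

# Three samples lack a value for the second factor
Z[c(2, 5, 14), 2] <- NA

# Keep only samples observed on every factor
complete <- stats::complete.cases(Z)

# Cluster the complete samples, then map the labels back to all samples
fit <- kmeans(Z[complete, , drop = FALSE], centers = 2, nstart = 10)
clusters <- rep(NA_integer_, nrow(Z))
names(clusters) <- rownames(Z)
clusters[complete] <- fit$cluster

# Alternative: set missing values to 0 on a copy, so every sample is clustered
Z0 <- Z
Z0[is.na(Z0)] <- 0
fit0 <- kmeans(Z0, centers = 2, nstart = 10)
```

Setting missing values to 0 places the affected samples at the centre of that factor, which is only reasonable when few samples are involved.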

Examples

# NOT RUN {
# Example on the CLL data
filepath <- system.file("extdata", "CLL_model.hdf5", package = "MOFAdata")
MOFA_CLL <- loadModel(filepath)
# cluster samples into 3 groups based on all factors
clusterSamples(MOFA_CLL, k=3, factors="all")
# cluster samples into 2 groups based on factor 1
clusters <- clusterSamples(MOFA_CLL, k=2, factors=1)
# the clusters can be visualized, for example, on the factor values:
plotFactorBeeswarm(MOFA_CLL, factor=1, color_by=clusters)

# Example on the scMT data
filepath <- system.file("extdata", "scMT_model.hdf5", package = "MOFAdata")
MOFA_scMT <- loadModel(filepath)
# cluster samples into 2 groups based on factors 1 and 2
clusters <- clusterSamples(MOFA_scMT, k=2, factors=1:2)
# the clusters can be visualized, for example, on the factor values:
plotFactorScatter(MOFA_scMT, factors=1:2, color_by=clusters)
# }
