clusterData: Cluster Data Based on Different Methods

Description

Cluster Data Based on Different Methods

Usage

clusterData(
  obj = NULL,
  scaleData = TRUE,
  cluster.method = c("mfuzz", "TCseq", "kmeans", "wgcna"),
  TCseq_params_list = list(),
  object = NULL,
  min.std = 0,
  cluster.num = NULL,
  subcluster = NULL,
  seed = 5201314,
  ...
)

Value

A list containing the following clustering results:

wide.res: A wide-format data frame with clusters and normalized expression levels.
long.res: A long-format data frame for visualizations, containing cluster information, normalized values, cluster names, and memberships.
cluster.list: A list where each element contains genes belonging to a specific cluster.
type: The clustering method used ("mfuzz", "TCseq", "kmeans", or "wgcna").
geneMode: Currently set to "none" (reserved for future use).
geneType: Currently set to "none" (reserved for future use).
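
The relationship between the wide-format table, the long-format table, and the per-cluster gene list can be sketched in base R. This is only an illustration of the shapes described above: the toy values and the column names (gene, cluster, sample, norm_value) are assumptions and may not match the columns that clusterData actually returns.

```r
# Hypothetical wide-format result: one row per gene, one column per sample.
wide <- data.frame(
  gene    = c("g1", "g2"),
  cluster = c(1, 2),
  t1      = c(-1, 1),
  t2      = c(1, -1)
)
sample_cols <- c("t1", "t2")

# Long format: one row per gene/sample pair, convenient for plotting.
long <- data.frame(
  gene       = rep(wide$gene, times = length(sample_cols)),
  cluster    = rep(wide$cluster, times = length(sample_cols)),
  sample     = rep(sample_cols, each = nrow(wide)),
  norm_value = unlist(wide[sample_cols], use.names = FALSE)
)

# Per-cluster gene list: one element per cluster ID.
cluster_list <- split(wide$gene, wide$cluster)

print(long)
print(cluster_list)
```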

Arguments

obj: An input object of one of two types: a cell_data_set object for trajectory analysis, or a matrix or data.frame containing expression data.
scaleData: Logical. Whether to scale the data (e.g., z-score normalization).
cluster.method: Character. Clustering method to use. One of "mfuzz", "TCseq", "kmeans", or "wgcna".
TCseq_params_list: A list of additional parameters passed to the TCseq::timeclust function.
object: A pre-calculated object required when using "wgcna" as the clustering method.
min.std: Numeric. Minimum standard deviation for filtering expression data.
cluster.num: Integer. The number of clusters to identify.
subcluster: A numeric vector of specific cluster IDs to include in the results. If NULL, all clusters are included.
seed: An integer seed for reproducibility in clustering operations.
...: Additional arguments passed to internal functions such as pre_pseudotime_matrix.

WGCNA Clustering

If the "wgcna" method is selected, the object parameter must contain a pre-calculated WGCNA network object. This is typically obtained using functions from the WGCNA package.

Subsetting Clusters

Use the subcluster parameter to focus on specific clusters. Cluster IDs not included in the subcluster vector will be excluded from the final results.
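
The effect of subcluster can be pictured as a row filter on the cluster column. The snippet below is a base R sketch of that idea on a made-up table; it does not call clusterData, and the column names are assumptions.

```r
# Hypothetical clustering result with one cluster ID per gene.
wide <- data.frame(
  gene    = c("g1", "g2", "g3", "g4", "g5"),
  cluster = c(1, 2, 3, 2, 1)
)

# Cluster IDs to keep, playing the role of the subcluster argument.
subcluster <- c(1, 3)

# Rows whose cluster ID is not listed are dropped.
kept <- wide[wide$cluster %in% subcluster, , drop = FALSE]
print(kept)
```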

Author

JunZhang

This function performs clustering on input data using one of four methods: mfuzz, TCseq, kmeans, or wgcna. The clustering results include metadata, normalized data, and cluster memberships.

Details

Depending on the selected cluster.method, different clustering algorithms are used:

"mfuzz": Applies the Mfuzz soft clustering method, suitable for identifying overlapping clusters.
"TCseq": Uses TCseq clustering for time-series expression data with support for additional parameters.
"kmeans": Employs standard k-means clustering via base R's stats::kmeans.
"wgcna": Leverages pre-calculated WGCNA (Weighted Gene Co-expression Network Analysis) networks.

The function is designed to be flexible, allowing preprocessing (e.g., filtering by min.std), scaling the data (scaleData = TRUE), and generating results compatible with data visualization pipelines.
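
The filter, scale, and cluster sequence described above can be sketched with base R alone for the k-means case. This is a minimal illustration under stated assumptions, not the function's actual implementation: the toy matrix is made up, the strict "greater than" comparison against the threshold is an assumption, and nstart = 10 is chosen here only to make the toy result stable.

```r
# Toy expression matrix: genes in rows, time points in columns (made-up values).
expr <- rbind(
  up1   = c(1, 2, 3, 5),
  up2   = c(2, 4, 7, 8),
  down1 = c(5, 3, 2, 1),
  down2 = c(8, 7, 4, 2),
  flat  = c(5, 5, 5, 5)
)
colnames(expr) <- paste0("t", 1:4)

# Plays the role of min.std; the strict ">" below is an assumption.
min_std <- 0

# 1. Drop genes whose standard deviation does not exceed the threshold.
row_sd <- apply(expr, 1, sd)
filtered <- expr[row_sd > min_std, , drop = FALSE]

# 2. Z-score each gene (row); scale() works on columns, hence the transposes.
scaled <- t(scale(t(filtered)))

# 3. Cluster the scaled profiles with k-means under a fixed seed.
set.seed(5201314)
km <- stats::kmeans(scaled, centers = 2, nstart = 10)

# 4. Assemble a wide-format table: one row per gene plus its cluster label.
wide <- data.frame(gene = rownames(scaled), cluster = km$cluster, scaled,
                   row.names = NULL)
print(wide)
```

Filtering before scaling matters in this sketch: a gene with zero variance would otherwise produce NaN values when divided by its standard deviation.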

Examples

data("exps")

# kmeans
ck <- clusterData(obj = exps,
                  cluster.method = "kmeans",
                  cluster.num = 8)
