jackstraw_MiniBatchKmeans

Test the cluster membership for K-means clustering

Test for association between the observed data and their estimated latent variables. The jackstraw package provides a resampling strategy and testing scheme to estimate the statistical significance of association between the observed data and their latent variables. Depending on the data type and the analysis aim, the latent variables may be estimated by principal component analysis (PCA), factor analysis (FA), K-means clustering, or related unsupervised learning algorithms. The jackstraw methods learn the over-fitting characteristics inherent in this circular analysis, where the observed data are used to estimate the latent variables and then used again to test against those estimated latent variables.

When latent variables are estimated by PCA, the jackstraw enables statistical testing for association between observed variables and latent variables, as estimated by low-dimensional principal components (PCs). This identifies variables that are significantly associated with PCs.

Similarly, unsupervised clustering methods, such as K-means clustering, partitioning around medoids (PAM), and others, find coherent groups in high-dimensional data. The jackstraw estimates the statistical significance of cluster membership by testing association between data and cluster centers. Cluster membership can be improved by using the resulting jackstraw p-values and posterior inclusion probabilities (PIPs), with an application to unsupervised evaluation of cell identities in single-cell RNA-seq (scRNA-seq).

Neo Chung

jackstraw

Statistical Inference for Unsupervised Learning

Wei Hao

Alejandro Ochoa

jackstraw_MiniBatchKmeans function

<dl><dt>dat</dt>
<dd>a data matrix with <code>m</code> rows as variables and <code>n</code> columns as observations.</dd>
<dt>MiniBatchKmeans.output</dt>
<dd>the output of applying <code>ClusterR::MiniBatchKmeans()</code> to <code>dat</code>. Supplying it gives more control over the algorithm and, in turn, over the initial centroids used.</dd>
<dt>s</dt>
<dd>the number of "synthetic" null variables. Out of <code>m</code> variables, <code>s</code> variables are independently permuted.</dd>
<dt>B</dt>
<dd>the number of resampling iterations.</dd>
<dt>center</dt>
<dd>a logical specifying whether to center the rows. By default, <code>TRUE</code>.</dd>
<dt>covariate</dt>
<dd>a model matrix of covariates with <code>n</code> observations. Must include an intercept in the first column.</dd>
<dt>verbose</dt>
<dd>a logical specifying whether to print the computational progress. By default, <code>FALSE</code>.</dd>
<dt>batch_size</dt>
<dd>the size of the mini batches.</dd>
<dt>initializer</dt>
<dd>the method of initialization. By default, <code>kmeans++</code>.</dd>
<dt>pool</dt>
<dd>a logical specifying whether to pool the null statistics across all clusters. By default, <code>TRUE</code>.</dd>
<dt>...</dt>
<dd>optional arguments to control the Mini Batch K-means clustering algorithm (see <code>ClusterR::MiniBatchKmeans</code>).</dd></dl>
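The role of <code>s</code> can be illustrated with a small base R sketch. It is only an illustration of the idea stated above (out of <code>m</code> variables, <code>s</code> are independently permuted), not the package's internal code, whose resampling details may differ. The names <code>null_rows</code> and <code>jackstraw_dat</code> are hypothetical and chosen for this example.

```r
# Illustrative sketch only: build one "jackstraw" data set by permuting
# s of the m variables (rows). Names here are hypothetical, not package API.
set.seed(42)
m <- 100   # number of variables (rows)
n <- 12    # number of observations (columns)
s <- 10    # number of synthetic null variables

dat <- matrix(rnorm(m * n), nrow = m, ncol = n)

# Pick s distinct rows at random to become synthetic nulls.
null_rows <- sample(m, s)

# Permute each chosen row independently across the n observations,
# which breaks any association with structure shared across rows.
jackstraw_dat <- dat
for (i in null_rows) {
  jackstraw_dat[i, ] <- sample(dat[i, ])
}
```

Repeating this step <code>B</code> times, re-clustering each time, and collecting the statistics of the permuted rows is one way to think about how an empirical null distribution could be accumulated.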

Arguments

Neo Christopher Chung <a href="mailto:nchchung@gmail.com">nchchung@gmail.com</a>

Author

Non-Parametric Jackstraw for Mini Batch K-means Clustering — jackstraw_MiniBatchKmeans




Examples
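A minimal usage sketch on simulated data, assuming the <code>ClusterR</code> and <code>jackstraw</code> packages are installed. The simulated matrix, the parameter values, and the object names <code>mbk</code> and <code>js</code> are choices made for this illustration. The element name <code>p.F</code> for the p-values is an assumption about the returned list and should be checked against the Value section of the installed version.

```r
# Simulate m variables (rows) by n observations (columns) with two
# row groups that follow opposite patterns across the columns.
set.seed(1)
m <- 200
n <- 20
pattern <- rep(c(1, -1), each = n / 2)
signal <- rbind(
  matrix(pattern,  nrow = m / 2, ncol = n, byrow = TRUE),
  matrix(-pattern, nrow = m / 2, ncol = n, byrow = TRUE)
)
dat <- signal + matrix(rnorm(m * n, sd = 0.5), nrow = m)

# Center the rows before clustering.
dat <- t(scale(t(dat), center = TRUE, scale = FALSE))

# Run only when both packages are available.
if (requireNamespace("ClusterR", quietly = TRUE) &&
    requireNamespace("jackstraw", quietly = TRUE)) {
  # Cluster the rows of dat into two groups.
  mbk <- ClusterR::MiniBatchKmeans(
    data = dat, clusters = 2, batch_size = 20, initializer = "kmeans++"
  )

  # Test cluster membership; s and B are illustrative values.
  js <- jackstraw::jackstraw_MiniBatchKmeans(
    dat, MiniBatchKmeans.output = mbk, s = 20, B = 50, batch_size = 20
  )

  # Assumed element name for the membership p-values.
  print(head(js$p.F))
}
```

Passing the fitted <code>MiniBatchKmeans.output</code> explicitly, as here, keeps the clustering settings under the caller's control, which is the purpose that argument's description gives.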