.prior_flowClust1d: Elicits data-driven priors from a flowSet object for a specified channel

Description

We elicit data-driven prior parameters from a flowSet object for a specified channel. For each sample in the flowSet object, we apply a kernel-density estimator (KDE) and identify its local maxima (peaks). We then aggregate these peaks to elicit prior parameters for each of the K mixture components.

Usage

.prior_flowClust1d(flow_set, channel, K = NULL, hclust_height = NULL,
  clust_method = c("kmeans", "hclust"), hclust_method = "complete",
  artificial = NULL, nu0 = 4, w0 = 10, adjust = 2, min = -200,
  max = NULL, vague = TRUE)

Arguments

flow_set

a flowSet object

channel

the channel in the flowSet from which we elicit the prior parameters for the Student's t mixture

K

the number of mixture components to identify. By default, this value is NULL and is determined automatically.

hclust_height

the height at which the hclust tree of peaks should be cut. By default, we use the median of the distances between adjacent peaks. If a value is specified, we pass it directly to cutree.

clust_method

the method used to cluster peaks together for prior elicitation. By default, kmeans is used. However, if K is not specified, hclust is used instead.

hclust_method

the agglomeration method used in the hierarchical clustering. This value is passed directly to hclust. Default is complete linkage.

artificial

a numeric vector containing prior means for artificial mixture components. The remaining prior parameters for the artificial components are copied directly from the most informative prior component elicited. If NULL (default), no artificial prior components are added.

nu0

prior degrees of freedom of the Student's t mixture components.

w0

the number of prior pseudocounts of the Student's t mixture components.

adjust

the bandwidth adjustment to use in the kernel density estimation (in density, the adjust argument scales the bandwidth). Default is 2. See density for more information.

min

a numeric value that sets the lower bound for data filtering. The default is -200. If NULL, no truncation is applied.

max

a numeric value that sets the upper bound for data filtering. If NULL (default), no truncation is applied.

vague

logical. Whether to elicit a vague prior. If TRUE, we first calculate the median of the standard deviations from all flowFrames. Then, we divide the overall standard deviation by the number of groups to scale the standard deviation.

Value

list of prior parameters

Details

Here, we outline the approach used for prior elicitation. First, we apply a KDE to each sample and extract all of its peaks (local maxima). Note that different samples may have different numbers of peaks. Our goal then is to align the peaks before aggregating the information across all samples. To do this, we use a technique similar to the peak probability contrasts (PPC) method from Tibshirani et al. (2004). Effectively, we apply hierarchical clustering to the peaks from all samples to find clusters of peaks. We compute the sample mean and variance of the peaks within each cluster to elicit the prior mean and its hyperprior variance, respectively, for a flowClust mixture component. We elicit the prior variance for each mixture component by first assigning the observations within each sample to the nearest prior mean. Then, we compute the variance of the observations within each cluster. Finally, we average the variances corresponding to each mixture component across all samples in the flowSet object.
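The steps above can be sketched in base R. This is only an illustration on simulated data, not the package's implementation: find_peaks is a hypothetical helper, the samples are toy two-component mixtures rather than flowFrames, and K = 2 is fixed by hand.

```r
# Hypothetical helper: locate local maxima of a 1-D kernel density estimate.
find_peaks <- function(x, adjust = 2) {
  dens <- density(x, adjust = adjust)
  # A grid point is a peak when the slope changes from positive to negative.
  idx <- which(diff(sign(diff(dens$y))) == -2) + 1
  dens$x[idx]
}

set.seed(42)
# Three simulated "samples" (toy data standing in for one channel of a flowSet).
samples <- lapply(1:3, function(i) {
  c(rnorm(500, mean = 0, sd = 1), rnorm(500, mean = 8, sd = 1))
})

# Step 1: extract the KDE peaks of each sample.
peaks <- lapply(samples, find_peaks)

# Step 2: pool the peaks and cluster them with complete linkage.
all_peaks <- unlist(peaks)
tree <- hclust(dist(all_peaks), method = "complete")
clusters <- cutree(tree, k = 2)

# Step 3: the mean and variance of the peaks in each cluster give the
# prior means and their hyperprior variances.
prior_means <- as.numeric(tapply(all_peaks, clusters, mean))
hyper_vars <- as.numeric(tapply(all_peaks, clusters, var))

# Step 4: within each sample, assign observations to the nearest prior mean,
# compute per-cluster variances, then average them across samples.
per_sample_vars <- sapply(samples, function(x) {
  nearest <- apply(abs(outer(x, prior_means, "-")), 1, which.min)
  tapply(x, factor(nearest, levels = seq_along(prior_means)), var)
})
prior_vars <- rowMeans(per_sample_vars)
```

In this sketch prior_means, hyper_vars and prior_vars each hold one value per mixture component.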

Following Tibshirani et al. (2004), we cluster the peaks from each sample using complete-linkage hierarchical clustering. The linkage type can be changed via the hclust_method argument. This argument is passed directly to hclust.

To cluster the peaks, we must cut the hierarchical tree, either by selecting a value for K or by providing a height at which to cut the tree. By default, we cut the tree at a height equal to the median of the distances between adjacent peaks within each sample. This value can be changed via the hclust_height argument and, if provided, will be passed to cutree. Also, by default, the number of mixture components K is NULL and is ignored. However, if K is provided, then it has priority over hclust_height and is passed directly to cutree instead.
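The cutting rule can be illustrated as follows. This is a sketch under assumptions: the peak locations are made up, cut_peaks is a hypothetical wrapper, and pooling the adjacent-peak distances across samples before taking the median is one possible reading of the default described above.

```r
# Toy peak locations for three samples (made-up values).
peaks <- list(c(0.1, 5.0, 9.8), c(-0.2, 5.3), c(0.3, 4.8, 10.1))

# Distances between adjacent peaks within each sample, pooled across
# samples (assumption), then summarized by their median.
adjacent <- unlist(lapply(peaks, function(p) diff(sort(p))))
height <- median(adjacent)

tree <- hclust(dist(unlist(peaks)), method = "complete")

# Hypothetical wrapper: when K is given it takes priority over the height.
cut_peaks <- function(tree, K = NULL, h = NULL) {
  if (!is.null(K)) cutree(tree, k = K) else cutree(tree, h = h)
}

by_height <- cut_peaks(tree, h = height)        # cut at the median distance
by_K <- cut_peaks(tree, K = 2, h = height)      # K overrides the height
```

Here by_height and by_K are integer vectors of cluster labels, one per pooled peak.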

To ensure that the KDEs are smooth, we recommend that the bandwidth set in the adjust argument be sufficiently large. The default is 2. If the bandwidth is not large enough, the KDE may contain numerous bumps, resulting in erroneous peaks.
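The effect of the bandwidth can be checked on simulated data. The sketch below is illustrative only: count_peaks is a hypothetical helper, and the data are a toy two-component mixture, so the true number of modes is 2.

```r
# Hypothetical helper: count the local maxima of a KDE for a given adjust.
count_peaks <- function(x, adjust) {
  dens <- density(x, adjust = adjust)
  # A peak is a grid point where the slope changes from positive to negative.
  sum(diff(sign(diff(dens$y))) == -2)
}

set.seed(1)
# Toy data with two well-separated modes.
x <- c(rnorm(500, mean = 0, sd = 1), rnorm(500, mean = 8, sd = 1))

n_small <- count_peaks(x, adjust = 0.2)  # narrow bandwidth: bumpy KDE
n_large <- count_peaks(x, adjust = 2)    # the default used here: smoother KDE
```

Comparing n_small with n_large shows how a narrow bandwidth can introduce extra peaks that do not correspond to real modes.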

References

Tibshirani, R. et al. (2004), "Sample classification from protein mass spectrometry, by 'peak probability contrasts'," Bioinformatics, 20(17), 3034-3044. http://bioinformatics.oxfordjournals.org/content/20/17/3034.