findThreshold:
Find appropriate threshold range
Description
This function performs a grid search over candidate clustering thresholds to identify the range of values that yield a valid clustering, and to inspect how the level of aggregation varies within that range.
Usage
findThreshold(mod, documents_raw=NULL, documents_matrix=NULL, range_min=.05, range_max=5, step=.05)
Arguments
mod
A fitted STM object, as returned by stm.
documents_raw
The raw documents used to fit the STM. A character vector in which each element is the full text of one document.
documents_matrix
Document-term matrix representation of the raw documents, as generated by the prepDocuments function.
range_min
Lower bound of the range to be searched.
range_max
Upper bound of the range to be searched.
step
Step size for the grid search.
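To gauge how many candidate thresholds a search covers, the grid can be reproduced directly from range_min, range_max, and step. This is a sketch that assumes the grid includes both bounds and advances by step, as seq(range_min, range_max, by = step) does; threshold_grid is a hypothetical helper, not part of the package, and findThreshold may construct its grid differently.

```r
# Hypothetical helper (not part of the package): reproduce the candidate
# thresholds, ASSUMING an inclusive grid built with seq().
threshold_grid <- function(range_min = .05, range_max = 5, step = .05) {
  # Guard against an empty or non-terminating grid.
  stopifnot(step > 0, range_min <= range_max)
  seq(range_min, range_max, by = step)
}

grid <- threshold_grid()
length(grid)  # 100 candidates with the default arguments
range(grid)   # 0.05 and 5
```

Under this assumption, the defaults evaluate 100 thresholds; narrowing range_min and range_max or enlarging step shrinks the grid accordingly.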
Value
A data frame containing the following columns:
- threshold: Threshold value.
- valid: Binary indicator; 1 if clustering succeeds with the given threshold, 0 otherwise.
- juncture_points: Number of juncture points in the resulting clustering tree; -1 if the run is unsuccessful. Lower thresholds yield more juncture points, corresponding to more binary splits and deeper trees. Higher thresholds yield fewer juncture points, corresponding to broader, shallower trees.
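In case the valid thresholds do not form a single contiguous block, a small helper can condense the returned data frame into runs of consecutive valid thresholds, together with the spread of juncture points in each run. This is a sketch: summarize_valid_ranges is a hypothetical helper, not part of the package, and it relies only on the documented threshold, valid, and juncture_points columns, assuming valid is coded 0/1 as described above.

```r
# Hypothetical helper (not part of the package): collapse a findThreshold()
# result into runs of consecutive valid thresholds.
# Assumes columns `threshold`, `valid` (0/1) and `juncture_points`.
summarize_valid_ranges <- function(res) {
  # Sort by threshold so that "consecutive" means adjacent grid points.
  res <- res[order(res$threshold), , drop = FALSE]

  # Run-length encode the valid flag to find contiguous blocks.
  runs <- rle(res$valid == 1)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  keep <- which(runs$values)  # indices of runs made of valid thresholds

  # Range of juncture points within one run (rows starts[i]..ends[i]).
  junctures_in_run <- function(i, f) f(res$juncture_points[starts[i]:ends[i]])

  data.frame(
    from = res$threshold[starts[keep]],
    to = res$threshold[ends[keep]],
    max_junctures = vapply(keep, junctures_in_run, numeric(1), f = max),
    min_junctures = vapply(keep, junctures_in_run, numeric(1), f = min)
  )
}
```

Applied to the data frame returned by findThreshold, each row of the result is one run of valid thresholds; because lower thresholds give deeper trees, max_junctures is expected to sit at the lower end of each run.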