Learn R Programming

stoRy (version 0.2.2)

get_story_clusters: Find clusters of similar stories

Description

[Maturing]

get_story_clusters classifies the stories in a collection according to thematic similarity.

Usage

get_story_clusters(
  collection = NULL,
  weights = list(choice = 3, major = 2, minor = 1),
  explicit = TRUE,
  min_freq = 1,
  min_size = 3,
  blacklist = NULL
)

Value

Returns a tibble with r rows (story clusters) and 4 columns:

cluster_id:Story cluster integer ID
stories:A tibble of stories comprising the cluster
themes:A tibble of themes common to the clustered stories
size:Number of stories in the cluster

Arguments

collection

A Collection() class object.

If NULL, the collection of all stories in the actively loaded LTO version is used.

weights

A list assigning nonnegative weights to choice, major, and minor theme levels. The default weighting list(choice = 3, major = 2, minor = 1) counts each choice usage three times, each major theme usage twice, and each minor theme usage once. Use the uniform weighting list(choice = 1, major = 1, minor = 1) weights theme usages equally regardless of level. At least one weight must be positive.

explicit

Set to FALSE to include ancestor themes of the explicit thematic annotations.

min_freq

Drop themes occurring less than this number of times from the analysis. The default min_freq=1 results in no themes are discarded.

min_size

Minimum cluster size. The default is min_size=3.

blacklist

A Themeset() class object. A themeset containing themes to be dropped from the analysis.

If NULL, no themes are filtered.

Details

The input collection of n stories, \(S[1], \ldots, S[n]\), is represented as a weighted bag-of-words, where each choice theme in story \(S[j] (j=1, \ldots, n)\) is counted weights$choice times, each major theme weights$major times, and each minor theme weights$choice times.

The function classifies the stories according to thematic similarity using the Iterative Signature Algorithm (ISA) biclustering algorithm as implemented in the isa2 R package. The clusters are "soft" meaning that a story can appear in multiple clusters.

Install isa2 package by running the command install.packages(\"isa2\") before calling this function.

References

Gábor Csárdi, Zoltán Kutalik, Sven Bergmann (2010). Modular analysis of gene expression data with R. Bioinformatics, 26, 1376-7.

Sven Bergmann, Jan Ihmels, Naama Barkai (2003). Iterative signature algorithm for the analysis of large-scale gene expression data. Physical Review E, 67, 031902.

Gábor Csárdi (2017). isa2: The Iterative Signature Algorithm. R package version 0.3.5. https://cran.r-project.org/package=isa2

Examples

Run this code
if (FALSE) {
# Cluster "The Twilight Zone" franchise stories according to thematic
# similarity:
library(dplyr)
set_lto("demo")
set.seed(123)
result_tbl <- get_story_clusters()
result_tbl

# Explore a cluster of stories related to traveling back in time:
cluster_id <- 3
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

# Explore a cluster of stories related to mass panics:
cluster_id <- 5
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

# Explore a cluster of stories related to executions:
cluster_id <- 7
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

# Explore a cluster of stories related to space aliens:
cluster_id <- 10
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

# Explore a cluster of stories related to old people wanting to be young:
cluster_id <- 11
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]

# Explore a cluster of stories related to wish making:
cluster_id <- 13
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
}

Run the code above in your browser using DataLab