LatentSemanticAnalysis: Latent Semantic Analysis model

Description

Creates a latent semantic analysis (LSA) model. See https://en.wikipedia.org/wiki/Latent_semantic_analysis for details.

Usage

LatentSemanticAnalysis
LSA

Format

R6Class object.

Usage

For usage details see Methods, Arguments and Examples sections.

lsa = LatentSemanticAnalysis$new(n_topics, method = c("randomized", "irlba"))
lsa$fit_transform(x, ...)
lsa$transform(x, ...)
lsa$components

Methods

$new(n_topics, method = c("randomized", "irlba")): create an LSA model with n_topics latent topics
$fit_transform(x, ...): fit the model to an input sparse matrix x (preferably in dgCMatrix format) and then transform x to the latent space
$transform(x, ...): transform new data x to the latent space
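
To make the fit/transform split concrete, here is a minimal base-R sketch of the idea behind LSA, a truncated SVD of the document-term matrix. It is not text2vec's implementation: the toy matrix, the variable names, and the scaling convention (documents = U D, components = t(V)) are assumptions chosen for illustration, and the package may scale its factors differently.

```r
# Base-R sketch of the LSA idea; NOT text2vec's actual code.
# Toy document-term matrix: 4 documents (rows) x 5 terms (columns).
x = matrix(c(2, 1, 0, 0, 0,
             1, 2, 0, 0, 1,
             0, 0, 3, 1, 0,
             0, 0, 1, 2, 0),
           nrow = 4, byrow = TRUE)
n_topics = 2

# "fit": truncated SVD, keeping the n_topics largest singular triplets
s = svd(x, nu = n_topics, nv = n_topics)
d = s$d[seq_len(n_topics)]

# Assumed convention: documents = U D, components = t(V)
doc_topics = s$u %*% diag(d, nrow = n_topics)  # 4 x 2, documents in latent space
components = t(s$v)                            # 2 x 5, topic-term loadings

# "transform": project a new document onto the fitted components
new_x = matrix(c(1, 1, 0, 0, 0), nrow = 1)
new_topics = new_x %*% t(components)           # 1 x 2
```

Under this convention, projecting the training matrix onto the components reproduces the fitted document embedding, which is why a single set of components can serve both the fit step and the later transform of unseen documents.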

Arguments

lsa: An LSA object.
x: An input document-term matrix, preferably in dgCMatrix format.
n_topics: integer, desired number of latent topics.
method: character, one of c("randomized", "irlba"). Defines the underlying SVD algorithm. For very large data "randomized" is usually faster and more accurate.
...: Arguments to internal functions. Notably useful for fit_transform(): these arguments are passed to the irlba or svdr functions, which are used as the backend for SVD.

Examples

# NOT RUN {
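# attach text2vec, which provides movie_review, itoken(), create_dtm() and LatentSemanticAnalysis
library(text2vec)
data("movie_review")
N = 100
# lowercase and tokenize the first N reviews
tokens = movie_review$review[1:N] %>% tolower %>% word_tokenizer
# build a document-term matrix with a hashing vectorizer
dtm = create_dtm(itoken(tokens), hash_vectorizer())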
n_topics = 10
lsa_1 = LatentSemanticAnalysis$new(n_topics)
d1 = lsa_1$fit_transform(dtm)
# the same, but wrapped with S3 methods
d2 = fit_transform(dtm, lsa_1)

# }