superml (version 0.1.0)

TfIdfVectorizer: TF-IDF (Term Frequency-Inverse Document Frequency) Vectorizer

Description

It provides a standardized way of creating TF-IDF features, similar to Python's sklearn library. It also offers fit and transform methods (as in sklearn) to make it easier for you to switch between R and Python.

Usage

TfIdfVectorizer

Arguments

Format

R6Class object.

Usage

For usage details see Methods, Arguments and Examples sections.

tf_object = TfIdfVectorizer$new(max_df, min_df, max_features, smooth_idf)
tf_object$fit(sentences)
tf_matrix = tf_object$transform(sentences)
tf_matrix = tf_object$fit_transform(sentences) ## alternate

Methods

$new()

Initialises an instance of the vectorizer.

$fit()

Learns the count-vectorizer encodings from the input sentences and stores them in the object; it does not return anything.

$transform()

Returns the TF-IDF matrix for the input sentences, based on the encodings learned in the fit method.

$fit_transform()

Fits on the input sentences and returns their TF-IDF matrix in a single call.
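
To make the fit/transform split concrete, here is a minimal base-R sketch of what a TF-IDF vectorizer conceptually does. It is not superml's implementation: the helper names (tokenize, tfidf_fit, tfidf_transform) are hypothetical, the tokenizer is a simplification, the smoothed IDF follows the formula used by sklearn, log((1 + n) / (1 + df)) + 1, and no row normalisation is applied. Whether superml matches these choices exactly is an assumption.

```r
# Hypothetical helpers for illustration only; they are not part of superml.

# Lower-case, replace anything other than letters, digits and underscores
# with a space, then split on whitespace.
tokenize <- function(sentences) {
  cleaned <- gsub("[^a-z0-9_]+", " ", tolower(sentences))
  lapply(strsplit(trimws(cleaned), " +"), function(tok) tok[nzchar(tok)])
}

# "fit": learn the vocabulary and one IDF weight per term.
tfidf_fit <- function(sentences, smooth_idf = TRUE) {
  tokens <- tokenize(sentences)
  vocab <- sort(unique(unlist(tokens)))
  n_docs <- length(tokens)
  # Document frequency: number of sentences that contain each term.
  doc_freq <- vapply(vocab, function(term) {
    sum(vapply(tokens, function(tok) term %in% tok, logical(1)))
  }, integer(1))
  idf <- if (smooth_idf) {
    log((1 + n_docs) / (1 + doc_freq)) + 1  # assumed sklearn-style smoothing
  } else {
    log(n_docs / doc_freq) + 1
  }
  list(vocab = vocab, idf = idf)
}

# "transform": count the learned terms in each sentence and weight by IDF.
# Terms that were not seen during fit are ignored.
tfidf_transform <- function(model, sentences) {
  tokens <- tokenize(sentences)
  counts <- do.call(rbind, lapply(tokens, function(tok) {
    as.numeric(table(factor(tok, levels = model$vocab)))
  }))
  colnames(counts) <- model$vocab
  sweep(counts, 2, model$idf, "*")
}

sents <- c("i am alone in dark.",
           "mother_mary a lot",
           "alone in the dark?",
           "many mothers in the lot....")

model <- tfidf_fit(sents)                      # learns state, like $fit()
tf_matrix <- tfidf_transform(model, sents)     # like $transform()
new_matrix <- tfidf_transform(model, "alone with unseen words")
```

The point of keeping the two steps separate is that the vocabulary and IDF weights learned from one set of sentences can be reused on new text, so both matrices share the same columns.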

Examples

# NOT RUN {
df <- data.table::data.table(sents = c('i am alone in dark.',
                           'mother_mary a lot',
                           'alone in the dark?',
                           'many mothers in the lot....'))
tf <- TfIdfVectorizer$new(smooth_idf = TRUE, min_df = 0.3)
tf_features <- tf$fit_transform(df$sents)
# }