lexRankr (version 0.5.2)

bind_lexrank_: Bind lexrank scores to a dataframe of text

Description

Bind lexrank scores to a dataframe of sentences or to a dataframe of tokens with sentence ids

Usage

bind_lexrank_(tbl, text, doc_id, sent_id = NULL, level = c("sentences",
  "tokens"), threshold = 0.2, usePageRank = TRUE, damping = 0.85,
  continuous = FALSE, ...)

bind_lexrank(tbl, text, doc_id, sent_id = NULL, level = c("sentences",
  "tokens"), threshold = 0.2, usePageRank = TRUE, damping = 0.85,
  continuous = FALSE, ...)

Arguments

tbl

dataframe containing a column of sentences (or tokens) to be lexranked

text

name of column containing sentences or tokens to be lexranked

doc_id

name of column containing document ids corresponding to text

sent_id

Only needed if level is "tokens". Name of the column containing sentence ids corresponding to text.

level

the parsed level of the text column to be lexranked, i.e. whether text is a column of "sentences" or "tokens". The "tokens" level is provided to allow users to implement custom tokenization. Note: even if the input level is "tokens", lexrank scores are assigned at the sentence level.

threshold

The minimum similarity value a sentence pair must have to be represented in the graph where lexRank is calculated.

usePageRank

TRUE or FALSE indicating whether or not to use the page rank algorithm for ranking sentences. If FALSE, a sentence's unweighted centrality will be used as the rank. Defaults to TRUE.

damping

The damping factor to be passed to the page rank algorithm. Ignored if usePageRank is FALSE.

continuous

TRUE or FALSE indicating whether or not to use continuous LexRank. Only applies if usePageRank is TRUE. If TRUE, threshold will be ignored and lexRank will be computed using a weighted graph representation of the sentences. Defaults to FALSE.

...

tokenizing options to be passed to lexRankr::tokenize. Ignored if level is "sentences".
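
The interaction between threshold, usePageRank, and continuous can be pictured with a small base-R sketch. It does not call lexRankr and is not the package's internal code; the similarity values, the object names (simil, adj_unweighted, adj_weighted, degree), and the use of >= for the threshold comparison are all assumptions made for illustration.

```r
# Hypothetical pairwise similarities between three sentences (made-up values,
# chosen to sit well away from the threshold so strict vs. non-strict
# comparison makes no difference here).
simil <- matrix(c(0.00, 0.35, 0.10,
                  0.35, 0.00, 0.25,
                  0.10, 0.25, 0.00),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("s", 1:3), paste0("s", 1:3)))

threshold <- 0.2

# Thresholded graph (the continuous = FALSE case): an edge exists only where
# the similarity reaches the threshold. Assumption: >= is used as the cutoff.
adj_unweighted <- (simil >= threshold) * 1
diag(adj_unweighted) <- 0

# Unweighted degree centrality: the number of edges per sentence. This is the
# kind of quantity the usePageRank = FALSE description refers to.
degree <- rowSums(adj_unweighted)

# Weighted graph (the continuous = TRUE case): every similarity is kept as an
# edge weight, so the threshold plays no role.
adj_weighted <- simil
```

In this toy graph the pair s1/s3 falls below the threshold and is dropped from the thresholded graph, while the weighted version keeps it as a weak edge.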

Value

A dataframe with an additional column of lexrank scores (the column is named lexrank)
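
Because the scores come back as an ordinary column, a common next step is to sort on it and keep the highest-scoring sentences. The sketch below uses base R on a hand-built data frame shaped like the described output; the scores are invented for illustration and are not real lexRankr results, and the names scored, ranked, top, and n_keep are hypothetical.

```r
# Hypothetical data frame shaped like the returned value: the original columns
# plus a lexrank column. The scores are made up, not computed by lexRankr.
scored <- data.frame(doc_id  = c(1, 1, 2, 3),
                     sents   = c("Testing the system.",
                                 "Second sentence for you.",
                                 "System testing the tidy documents df.",
                                 "Documents will be parsed and lexranked."),
                     lexrank = c(0.32, 0.14, 0.29, 0.25),
                     stringsAsFactors = FALSE)

# Sort sentences from highest to lowest score
ranked <- scored[order(scored$lexrank, decreasing = TRUE), ]

# Keep the n_keep highest-scoring sentences as a crude extractive summary
n_keep <- 2
top <- head(ranked, n_keep)
```

The same pattern works with dplyr verbs such as arrange and slice if the rest of the pipeline is already tidyverse-based.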

Examples

# NOT RUN {
library(lexRankr)
library(magrittr)

# One row per document; each text value may contain several sentences
df <- data.frame(doc_id = 1:3, 
                 text = c("Testing the system. Second sentence for you.", 
                          "System testing the tidy documents df.", 
                          "Documents will be parsed and lexranked."),
                 stringsAsFactors = FALSE)

df %>% 
  unnest_sentences(sents, text) %>% 
  bind_lexrank(sents, doc_id, level = "sentences")

df %>% 
  unnest_sentences(sents, text) %>% 
  bind_lexrank_("sents", "doc_id", level = "sentences")

df <- data.frame(doc_id  = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
                             2, 2, 2, 3, 3, 3, 3, 3, 3), 
                 sent_id = c(1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 
                             1, 1, 1, 1, 1, 1, 1, 1, 1), 
                 tokens = c("testing", "the", "system", "second", 
                            "sentence", "for", "you", "system", 
                            "testing", "the", "tidy", "documents", 
                            "df", "documents", "will", "be", "parsed", 
                            "and", "lexranked"),
                 stringsAsFactors = FALSE)

df %>% 
  bind_lexrank(tokens, doc_id, sent_id, level = "tokens")
# }