Computes cosine values between sets of sentences and/or documents
multidocs(x,y=x,chars=10,tvectors=tvectors,remove.punctuation=TRUE,
stopwords = NULL,method ="Add")
A list of three elements:
cosmat
A numeric matrix giving the cosines between the input sentences/documents
xdocs
A legend for the row.names of cosmat
ydocs
A legend for the col.names of cosmat
x
a character vector containing different sentences/documents
y
a character vector containing different sentences/documents (y = x by default)
chars
an integer specifying how many characters (starting from the first) of each sentence/document are to be printed in the row.names and col.names of the output matrix
tvectors
the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)
remove.punctuation
removes punctuation from x and y; TRUE by default
stopwords
a character vector defining a list of words that are not used to compute the document/sentence vector for x and y
method
the compositional model to compute the document vector from its word vectors. The default option method = "Add" computes the document vector as the vector sum. With method = "Multiply", the document vector is computed via element-wise multiplication (see compose).
Fritz Guenther
In the traditional LSA approach, the vector D for a document (or a sentence) consisting of the words (t1, ..., tn) is computed as
$$D = \sum\limits_{i=1}^n t_i$$
This is the default method (method="Add") for this function. Alternatively, this function provides the possibility of computing the document vector from its word vectors using element-wise multiplication (see Mitchell & Lapata, 2010 and compose).
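The two composition methods described above can be illustrated with a minimal base R sketch. It does not use the package itself: the toy semantic space, the helper names doc_vector and cosine_sim, and the simple tokenization (lowercasing and splitting on non-letters) are all assumptions made for illustration, and the actual preprocessing in multidocs may differ.

```r
# Toy semantic space (hypothetical values): every row is a word vector
toy_space <- matrix(
  c(1, 2, 0,
    0, 1, 3,
    2, 0, 1,
    1, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("alice", "queen", "hatter", "tea"), NULL)
)

# Hypothetical helper: compose a document vector from the word vectors
# of those words that are found in the space
doc_vector <- function(doc, space, method = "Add") {
  words <- strsplit(tolower(doc), "[^a-z]+")[[1]]   # assumed tokenization
  words <- words[words %in% rownames(space)]        # drop unknown words
  vecs  <- space[words, , drop = FALSE]
  if (method == "Add") {
    colSums(vecs)            # vector sum
  } else {
    apply(vecs, 2, prod)     # element-wise product
  }
}

# Cosine between two numeric vectors
cosine_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

d1 <- doc_vector("alice greeted the queen", toy_space)
d2 <- doc_vector("the hatter had tea", toy_space)
cosine_sim(d1, d2)

# The same first document composed multiplicatively
doc_vector("alice greeted the queen", toy_space, method = "Multiply")
```

Words without a row in the toy space ("greeted", "the", "had") are simply dropped here, so each document vector is built from two word vectors only.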
This function computes the cosines between two sets of documents (or sentences).
The format of x (or y) should be of the kind x <- c("this is the first text","here is another text") (or y <- c("this is a third text","and here is yet another text")).
A note will be displayed whenever not all words of one input string are found in the semantic space. Caution: In that case, the function will still produce a result, by omitting the words not found in the semantic space. Depending on the specific requirements of a task, this may compromise the results. Please check your input when you receive this message.
A warning message will be displayed whenever no word of one input string is found in the semantic space.
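To check the input before calling the function, one could list the words of each document that have no row in the semantic space. The following base R sketch is one way to do this; the helper name missing_words, the toy space, and the tokenization (lowercasing and splitting on non-letters) are assumptions for illustration and need not match what the function does internally.

```r
# Hypothetical helper: for each document, return the words that are
# not row names of the semantic space
missing_words <- function(docs, space) {
  lapply(docs, function(doc) {
    words <- unique(strsplit(tolower(doc), "[^a-z]+")[[1]])  # assumed tokenization
    words <- words[nzchar(words)]                            # drop empty tokens
    setdiff(words, rownames(space))
  })
}

# Toy semantic space (hypothetical): only the row names matter here
toy_space <- matrix(0, nrow = 3, ncol = 2,
                    dimnames = list(c("alice", "queen", "hatter"), NULL))

res <- missing_words(c("Alice met the queen.", "xyzzy"), toy_space)
res
```

A document for which every word is returned (the second one here) corresponds to the warning case, where no word of the input string is found in the semantic space.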
Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.
Dennis, S. (2007). How to use the LSA Web Site. In T. K. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of Latent Semantic Analysis (pp. 35-56). Mahwah, NJ: Erlbaum.
Mitchell, J., & Lapata, M. (2010). Composition in Distributional Models of Semantics. Cognitive Science, 34, 1388-1429.
cosine, Cosine, multicos, costring
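library(LSAfun)

# Load the example semantic space shipped with the package
data(wonderland)

# Cosines between every sentence in x and every sentence in y
multidocs(x = c("alice was beginning to get very tired.",
                "the red queen greeted alice."),
          y = c("the mad hatter and the march hare are having a party.",
                "the hatter sliced the cup of tea in half."),
          tvectors = wonderland)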