2D or 3D plot of mutual similarities within a given list of sentences/documents
plot_doclist(x,connect.lines="all",method="PCA",dims=3,
axes=F,box=F,cex=1,chars=10,legend=T, size = c(800,800),
alpha="graded",alpha.grade=1,col="rainbow",
tvectors=tvectors,remove.punctuation=TRUE,...)
See plot3d: this function is called for the side effect of drawing the plot; a vector of object IDs is returned.
plot_doclist further prints a list with two elements:
the coordinate vectors of the sentences/documents in the plot, as a data frame
a legend for the sentence/document labels in the plot and in the coordinates
x: a character vector of length(x) > 1 that contains multiple sentences/documents
dims: the dimensionality of the plot; set either dims = 2 or dims = 3
method: the method to be applied; either a Principal Component Analysis (method="PCA") or a Multidimensional Scaling (method="MDS")
connect.lines: (3d plot only) the number of closest associates each sentence/document is connected to by a line. Setting connect.lines="all" (default) will draw all connecting lines and will automatically apply alpha="graded"
axes: (3d plot only) whether axes shall be included in the plot
box: (3d plot only) whether a box shall be drawn around the plot
cex: (2d plot only) a numerical value giving the amount by which plotting text should be magnified relative to the default
chars: an integer specifying how many letters (starting from the first) of each sentence/document are to be printed in the plot
legend: (3d plot only) whether a legend shall be drawn illustrating the color scheme of the connect.lines. The legend is inserted as a background bitmap to the plot using bgplot3d. Therefore, it does not resize very gracefully (see the bgplot3d documentation for more information).
size: (3d plot only) a numeric vector with two elements, the first specifying the width and the second specifying the height of the plot device
tvectors: the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)
remove.punctuation: removes punctuation from x; TRUE by default
alpha: (3d plot only) a numeric vector specifying the luminance of the connect.lines. By setting alpha="graded", the luminance of every line will be adjusted to the cosine between the two sentences/documents it connects.
alpha.grade: (3d plot only) only relevant if alpha="graded". Specify a numeric value for alpha.grade to scale the luminance of all connect.lines up (alpha.grade > 1) or down (alpha.grade < 1) by that factor.
col: (3d plot only) a vector specifying the color of the connect.lines. With col="rainbow" (default), the color of every line will be adjusted to the cosine between the two sentences/documents it connects, according to the rainbow palette. Other available color palettes for this purpose are heat.colors, terrain.colors, topo.colors, and cm.colors (see rainbow). Additionally, you can define a custom color scale by providing more than one color (for example col = c("black","blue","red")).
...: additional arguments which will be passed to plot3d (in a three-dimensional plot only)
Fritz Guenther, Taylor Fedechko
Computes all pairwise similarities within a given list of sentences/documents. On this similarity matrix, a Principal Component Analysis (PCA) or a Multidimensional Scaling (MDS) is applied to get a two- or three-dimensional solution that best captures the similarity structure. This solution is then plotted.
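The reduction step described above can be illustrated with base R alone. The following is a minimal sketch, not necessarily what plot_doclist does internally: the similarity matrix sim is made up for illustration, and the use of cmdscale on 1 - similarity and of prcomp on the rows of the matrix are assumptions about one reasonable way to obtain two-dimensional coordinates.

```r
# Toy cosine similarity matrix for four documents
# (made-up values for illustration; symmetric with a unit diagonal)
sim <- matrix(c(1.0, 0.8, 0.2, 0.1,
                0.8, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.7,
                0.1, 0.2, 0.7, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("doc", 1:4), paste0("doc", 1:4)))

# MDS: convert similarities to dissimilarities, then apply
# classical multidimensional scaling with two dimensions
coords_mds <- cmdscale(1 - sim, k = 2)

# PCA: treat each row of the similarity matrix as one observation
# and keep the scores on the first two principal components
coords_pca <- prcomp(sim)$x[, 1:2]

# Plot the MDS solution with the document labels
plot(coords_mds, type = "n", xlab = "x", ylab = "y")
text(coords_mds, labels = rownames(sim))
```

In this toy matrix, doc1 and doc2 are highly similar to each other, as are doc3 and doc4, so the two pairs should end up as two separate groups in the plot.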
In the traditional LSA approach, the vector D for a document (or a sentence) consisting of the words (t1, ..., tn) is computed as $$D = \sum\limits_{i=1}^n t_i$$ This function then computes the cosines between all pairs of documents (or sentences) in x.
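As a minimal sketch of the formula above, the following base R code sums word vectors into document vectors and computes all pairwise cosines. The semantic space toy_space and the helper doc_vector are hypothetical and exist only for this illustration; they are not part of the package, and the numbers are made up.

```r
# Toy semantic space: every row is a word vector
# (made-up numbers, not the wonderland data)
toy_space <- rbind(
  alice  = c(0.9, 0.1, 0.0),
  queen  = c(0.7, 0.3, 0.1),
  hatter = c(0.1, 0.8, 0.3),
  tea    = c(0.0, 0.6, 0.6)
)

# Hypothetical helper: split a document into words, keep the words
# that occur in the space, and sum their vectors (D = sum of t_i)
doc_vector <- function(doc, space) {
  words <- strsplit(tolower(doc), "[^a-z]+")[[1]]
  words <- words[words %in% rownames(space)]
  colSums(space[words, , drop = FALSE])
}

docs <- c("alice queen", "hatter tea", "alice tea")

# One document vector per row
D <- t(sapply(docs, doc_vector, space = toy_space))

# Pairwise cosines between all document vectors
norms <- sqrt(rowSums(D^2))
cos_matrix <- (D %*% t(D)) / (norms %o% norms)
round(cos_matrix, 2)
```

The resulting cos_matrix is symmetric with ones on the diagonal; a matrix of this kind is what the PCA or MDS step is then applied to.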
The format of x should be of the kind x <- c("this is the first text","here is another text")
To create plots that best show the similarity structure within this list of sentences/documents, set connect.lines="all" and col="rainbow"
Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.
Mardia, K.V., Kent, J.T., & Bibby, J.M. (1979). Multivariate Analysis, London: Academic Press.
library(LSAfun)
data(wonderland)
## Standard Plot
docs <- c("alice was beginning to get very tired.",
          "the red queen greeted alice.",
          "the mad hatter and the march hare are having a party.",
          "the hatter sliced the cup of tea in half.")
## 2D plot of the MDS solution, computed in the wonderland space
plot_doclist(docs,tvectors=wonderland,method="MDS",dims=2)