createWordcloud: Create Word Cloud Visualization.

Description

Wrapper around the wordcloud function that optionally saves the graphic to a file in one of the supported formats.

Usage

createWordcloud(words, freq, title = "Wordcloud", scale = c(8, 0.2), minFreq = 10, maxWords = 40, filename, format = c("png", "bmp", "jpeg", "tiff", "pdf"), width = 480, height = 480, units = "px", palette = brewer.pal(8, "Dark2"), titleFactor = 1)

Arguments

words

the words

freq

their frequencies

title

plot title

scale

a vector of length 2 giving the range of word sizes, largest first (default c(8, 0.2))

minFreq

words with frequency below minFreq will not be displayed

maxWords

Maximum number of words to be plotted (least frequent terms dropped).

filename

name of the file in which to save the graphic

format

graphics device format used to save the word cloud image

width

the width of the output graphics device

height

the height of the output graphics device

units

the units in which height and width are given. Can be px (pixels, the default), in (inches), cm or mm.

palette

colors for the words, ordered from least to most frequent

titleFactor

numeric title character expansion factor; multiplied by par("cex") yields the final title character size. NULL and NA are equivalent to a factor of 1.

Value

nothing

Details

Uses base graphics and the wordcloud package to create a word cloud (tag cloud) visual representation of text data. The function takes 2 vectors of equal length: one contains the words and the other their frequencies.

The resulting graphic is saved to a file in one of the available graphical formats (png, bmp, jpeg, tiff, or pdf).
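
The format argument lists every supported device name, which is the usual R idiom for picking one choice with match.arg. The sketch below shows one way a wrapper could map such a name to a graphics device. It is an illustration only, not the package's implementation: openDevice is a hypothetical helper, and the pdf branch assumes sizes in inches because grDevices::pdf has no units argument.

```r
# Hypothetical helper, not part of toaster: opens a graphics device chosen by name.
openDevice <- function(filename,
                       format = c("png", "bmp", "jpeg", "tiff", "pdf"),
                       width = 480, height = 480, units = "px") {
  format <- match.arg(format)  # with no explicit value this resolves to "png"
  if (format == "pdf") {
    # grDevices::pdf() takes its size in inches and has no units argument.
    grDevices::pdf(filename, width = width, height = height)
  } else {
    # png(), bmp(), jpeg() and tiff() all accept filename, width, height and units.
    device <- getExportedValue("grDevices", format)
    device(filename, width = width, height = height, units = units)
  }
  invisible(format)
}

# Usage: write a placeholder plot to a temporary pdf file.
out <- tempfile(fileext = ".pdf")
used <- openDevice(out, format = "pdf", width = 7, height = 7)
plot(1:10, main = "Placeholder plot")
dev.off()
```

Whatever is drawn between opening the device and dev.off() ends up in the file, which is why a wrapper of this kind only needs to open the device before plotting and close it afterwards.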

Word cloud visuals apply to any concept that satisfies the following conditions:
* each data point (artifact) can be expressed as a distinct, self-explanatory word or compact text, and
* each artifact is assigned a scalar non-negative metric.
Given these two conditions, a word cloud can visualize the top, bottom or all artifacts in a single visual.
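
As a minimal sketch of the two inputs described above, the following base R code turns a small, made-up vector of labels into matching words and freq vectors, then mimics the documented minFreq and maxWords filters. The data and variable names are assumptions for illustration, and the final call is guarded because it needs the toaster package and an interactive session; it also assumes that omitting filename draws on the current device.

```r
# Hypothetical data: each artifact is a distinct label; its count is the metric.
offenses <- c("theft", "burglary", "theft", "assault", "theft",
              "burglary", "vandalism", "theft", "assault", "burglary")

# Count occurrences: the names are the words, the values are the frequencies.
counts <- sort(table(offenses), decreasing = TRUE)
words <- names(counts)
freq  <- as.vector(counts)

# The two vectors must line up element by element.
stopifnot(length(words) == length(freq), all(freq >= 0))

# Mimic the documented filters: drop words below minFreq,
# then keep at most maxWords of the most frequent ones.
minFreq  <- 2
maxWords <- 2
keep <- freq >= minFreq
words_kept <- head(words[keep], maxWords)
freq_kept  <- head(freq[keep], maxWords)

print(data.frame(word = words_kept, freq = freq_kept))

# Only attempt the plot when toaster is installed and a display is available.
if (interactive() && requireNamespace("toaster", quietly = TRUE)) {
  toaster::createWordcloud(words, freq, title = "Offense types",
                           minFreq = minFreq, maxWords = maxWords)
}
```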

Examples

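if(interactive()){
library(toaster)       # createWordcloud, computeTfIdf, nGram
library(RODBC)         # odbcDriverConnect
library(RColorBrewer)  # brewer.pal

# initialize connection to Dallas database in Aster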
conn = odbcDriverConnect(connection="driver={Aster ODBC Driver};
                         server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>")

stopwords = c("a", "an", "the", "with")

# 2-gram tf-idf on offense table
daypart_tfidf_2gram = computeTfIdf(conn, "public.dallaspoliceall", 
                                   docId="extract('hour' from offensestarttime)::int/6",  
                                   textColumns=c('offensedescription','offensenarrative'),
                                   parser=nGram(2, delimiter='[  \\t\\b\\f\\r:\"]+'),
                                   stopwords=stopwords)

# map a slice code (gender, or 6-hour day part 0-3) to a readable label
toRace <- function(ch) {
  switch(as.character(ch),
         "M" = "Male",
         "F" = "Female",
         "0" = "Night",
         "1" = "Morning",
         "2" = "Day",
         "3" = "Evening",
         "C" = "C",
         "Unknown")
}
                                  
createDallasWordcloud <- function(tf_df, metric, slice, n, maxWords=25, size=750) {
  words=with(tf_df$rs, tf_df$rs[docid==slice,])
  
  ## palette 
  pal = rev(brewer.pal(8, "Set1"))[c(-3,-1)]
  
  # label the plot and the file with the slice being drawn;
  # the 'wordclouds/' directory must already exist
  createWordcloud(words$term, words[, metric], maxWords=maxWords, scale=c(4, 0.5), palette=pal, 
                  title=paste("Top", metric, "Offense", paste0(n, "-grams"), "for", toRace(slice)),
                  filename=paste0('wordclouds/',metric,'_offense_',n,'gram_',toRace(slice),'.png'), 
                  width=size, height=size)
}

createDallasWordcloud(daypart_tfidf_2gram, 'tf_idf', 0, n=2, maxWords=200, size=1300)

}